Sample records for accurate phylogenetic classification

  1. Accurate phylogenetic classification of DNA fragments based on sequence composition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
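
The core idea of composition-based classification is that each fragment is turned into a fixed-length vector of oligonucleotide (k-mer) frequencies, which a trained model then maps to a clade. The sketch below is illustrative only and is not PhyloPythia itself: the default k = 4, the helper names, and the nearest-centroid rule (a hypothetical stand-in for whatever trained classifier is actually used) are all assumptions.

```python
from __future__ import annotations

from itertools import product


def kmer_composition(seq: str, k: int = 4) -> list[float]:
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic order.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def nearest_clade(fragment: str, centroids: dict[str, list[float]], k: int = 4) -> str:
    """Assign a fragment to the clade whose mean composition vector is closest.

    Hypothetical stand-in for a trained classifier such as an SVM.
    """
    v = kmer_composition(fragment, k)

    def squared_distance(c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, c))

    return min(centroids, key=lambda name: squared_distance(centroids[name]))
```

In practice the centroids (or model weights) would be learned from the composition vectors of reference genomes of each clade; k-mer vectors grow as 4^k, which is one reason short k values are common.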

  2. Phylogenetic classification of bony fishes.

    PubMed

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, like those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny (www.deepfin.org). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases where morphological support exists for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution

  3. Accurate Arabic Script Language/Dialect Classification

    DTIC Science & Technology

    2014-01-01

    Army Research Laboratory technical report ARL-TR-6761, Accurate Arabic Script Language/Dialect Classification, by Stephen C. Tratz (Computational and Information Sciences...), January 2014.

  4. Concepts of Classification and Taxonomy Phylogenetic Classification

    NASA Astrophysics Data System (ADS)

    Fraix-Burnet, D.

    2016-05-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians, but these techniques also have applications in other fields, in particular linguistics. Their main characteristic is that they search for relationships between the objects or species under study, instead of grouping them by similarity. They are therefore well suited to any kind of object that evolves. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects such as galaxies or globular clusters. In this lesson we will learn how it works.
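
Maximum parsimony prefers the tree that requires the fewest character-state changes. A minimal sketch of the scoring half of that procedure, Fitch's small-parsimony algorithm for a fixed rooted binary tree, is shown below; the tree encoding (nested tuples), the character matrix, and the assumption that observables such as galaxy properties have already been discretized into states are all illustrative choices. Searching over tree topologies, which real parsimony software does, is not shown.

```python
from __future__ import annotations


def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    states: observed character state for each leaf.
    Returns the candidate state set at this node and the minimum number of changes.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, matrix: dict[str, str]) -> int:
    """Total Fitch cost over all characters of a taxon -> state-string matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: s[i] for t, s in matrix.items()})[1] for i in range(n_chars))
```

Comparing `parsimony_score` across candidate topologies and keeping the lowest is the essence of the parsimony criterion; exhaustive comparison is only feasible for a handful of objects.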

  5. A Functional-Phylogenetic Classification System for Transmembrane Solute Transporters

    PubMed Central

    Saier, Milton H.

    2000-01-01

    A comprehensive classification system for transmembrane molecular transporters has been developed and recently approved by the transport panel of the nomenclature committee of the International Union of Biochemistry and Molecular Biology. This system is based on (i) transporter class and subclass (mode of transport and energy coupling mechanism), (ii) protein phylogenetic family and subfamily, and (iii) substrate specificity. Almost all of the more than 250 identified families of transporters include members that function exclusively in transport. Channels (115 families), secondary active transporters (uniporters, symporters, and antiporters) (78 families), primary active transporters (23 families), group translocators (6 families), and transport proteins of ill-defined function or of unknown mechanism (51 families) constitute distinct categories. Transport mode and energy coupling prove to be relatively immutable characteristics and therefore provide primary bases for classification. Phylogenetic grouping reflects structure, function, mechanism, and often substrate specificity and therefore provides a reliable secondary basis for classification. Substrate specificity and polarity of transport prove to be more readily altered during evolutionary history and therefore provide a tertiary basis for classification. With very few exceptions, a phylogenetic family of transporters includes members that function by a single transport mode and energy coupling mechanism, although a variety of substrates may be transported, sometimes with either inwardly or outwardly directed polarity. In this review, I provide cross-referencing of well-characterized constituent transporters according to (i) transport mode, (ii) energy coupling mechanism, (iii) phylogenetic grouping, and (iv) substrates transported. The structural features and distribution of recognized family members throughout the living world are also evaluated. The tabulations should facilitate familial and functional
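
The hierarchy described above (class and subclass, then phylogenetic family and subfamily) is what makes transporter identifiers comparable by prefix. The sketch below assumes identifiers of the five-part form used by TCDB, e.g. `2.A.1.1.1` (class number, subclass letter, family, subfamily, individual system); that format is not spelled out in this abstract, so treat it and the helper names as assumptions.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed five-part identifier: class.subclass.family.subfamily.system
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


class TCNumber(NamedTuple):
    tc_class: int   # mode of transport / energy coupling class
    subclass: str   # subclass letter
    family: int     # phylogenetic family
    subfamily: int  # phylogenetic subfamily
    system: int     # individual transport system


def parse_tc(tc_id: str) -> TCNumber:
    """Split an identifier into its hierarchical levels, rejecting malformed input."""
    m = TC_PATTERN.match(tc_id.strip())
    if m is None:
        raise ValueError(f"not a five-part TC identifier: {tc_id!r}")
    c, sub, fam, subfam, system = m.groups()
    return TCNumber(int(c), sub, int(fam), int(subfam), int(system))


def same_family(a: str, b: str) -> bool:
    """True if two transporters share class, subclass and phylogenetic family."""
    return parse_tc(a)[:3] == parse_tc(b)[:3]
```

Because the primary and secondary bases of classification come first in the identifier, a shared prefix directly expresses a shared transport mode and phylogenetic family.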

  6. Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

    PubMed Central

    Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel

    2014-01-01

    Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A 'quartet' is an unrooted tree over four taxa; hence quartet-based supertree methods combine many four-taxon unrooted trees into a single, coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have received considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses) without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474
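
Quartet-based methods take as input one unrooted topology per four-taxon subset. One common way to obtain those quartets from a distance matrix is the four-point condition: for tree-like (additive) distances, the correct split is the pairing with the smallest sum of within-pair distances. The sketch below shows only that input step; it is an assumption about how quartets might be generated, not the heuristic from this paper, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def quartet_topology(dist, a, b, c, d):
    """Return the split ({w, x}, {y, z}) favored by the four-point condition.

    dist: symmetric nested mapping, dist[u][v] = distance between taxa u and v.
    Ties are broken by the order of the three candidate pairings.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def within_pair_sum(p):
        (w, x), (y, z) = p
        return dist[w][x] + dist[y][z]

    (w, x), (y, z) = min(pairings, key=within_pair_sum)
    return frozenset({w, x}), frozenset({y, z})


def all_quartets(dist):
    """One inferred quartet topology for every four-taxon subset of the taxa in dist."""
    return [quartet_topology(dist, *q) for q in combinations(sorted(dist), 4)]
```

With n taxa there are n-choose-4 quartets, so the amalgamation step that turns them into one tree is where efficient heuristics matter.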

  7. Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

    PubMed

    Hernandez, Troy; Yang, Jie

    2016-10-01

    The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
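
The combination described above, k-mer counts plus descriptive statistics of where each k-mer occurs, followed by k-nearest-neighbor classification, can be sketched as follows. This is a simplified illustration, not the authors' generalized vector: it uses raw counts, mean start position and plain (unnormalized) variance per k-mer, applies no feature scaling, and all function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict
from itertools import product
from statistics import mean, pvariance


def kmer_position_features(seq: str, k: int = 2) -> dict[str, tuple[int, float, float]]:
    """For every k-mer present: (count, mean start position, variance of start positions)."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: (len(p), mean(p), pvariance(p)) for kmer, p in positions.items()}


def feature_vector(seq: str, k: int = 2) -> list[float]:
    """Fixed-order vector: (count, mean, variance) for each of the 4**k k-mers."""
    feats = kmer_position_features(seq.upper(), k)
    vec: list[float] = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        count, mu, var = feats.get(kmer, (0, 0.0, 0.0))  # absent k-mers contribute zeros
        vec.extend([count, mu, var])
    return vec


def knn_label(query: str, references: list[tuple[str, str]], n_neighbors: int = 1, k: int = 2) -> str:
    """Majority label among the n_neighbors references closest in Euclidean distance."""
    q = feature_vector(query, k)
    ranked = sorted((math.dist(q, feature_vector(s, k)), label) for s, label in references)
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]
```

Because positional means and variances grow with genome length, a real implementation would likely normalize them; that choice is left out here for brevity.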

  8. Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers.

    PubMed

    Deragon, Jean-Marc; Zhang, Xiaoyu

    2006-12-01

    Short interspersed elements (SINEs) are a class of dispersed mobile sequences that use RNA as an intermediate in an amplification process called retroposition. The presence-absence of a SINE at a given locus has been used as a meaningful classification criterion to evaluate phylogenetic relations among species. We review here recent developments in the characterisation of plant SINEs and their use as molecular markers to retrace phylogenetic relations among wild and cultivated Oryza and Brassica species. In Brassicaceae, further use of SINE markers is limited by our partial knowledge of endogenous SINE families (their origin and evolution histories) and by the absence of a clear classification. To solve this problem, phylogenetic relations among all known Brassicaceae SINEs were analyzed and a new classification, grouping SINEs in 15 different families, is proposed. The relative age and size of each Brassicaceae SINE family was evaluated and new phylogenetically supported subfamilies were described. We also present evidence suggesting that new potentially active SINEs recently emerged in Brassica oleracea from the shuffling of preexisting SINE portions. Finally, the comparative evolution history of SINE families present in Arabidopsis thaliana and Brassica oleracea revealed that SINEs were in general more active in the Brassica lineage. The importance of these new data for the use of Brassicaceae SINEs as molecular markers in future applications is discussed.
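
When SINE presence-absence is used as a phylogenetic marker, each insertion locus defines the group of species that carry it. Assuming absence is the ancestral state and insertions are neither lost nor gained independently (an idealization), two loci can fit the same rooted tree only if their groups are nested or disjoint. The sketch below applies that check to a hypothetical 0/1 presence matrix; the data layout and function names are illustrative.

```python
from __future__ import annotations

from itertools import combinations


def insertion_groups(matrix: dict[str, list[int]]) -> list[frozenset[str]]:
    """For each locus, the set of taxa in which the SINE is present (1)."""
    n_loci = len(next(iter(matrix.values())))
    return [frozenset(t for t, row in matrix.items() if row[i] == 1) for i in range(n_loci)]


def compatible(g1: frozenset[str], g2: frozenset[str]) -> bool:
    """Two insertion-defined groups fit one rooted tree if nested or disjoint."""
    return g1 <= g2 or g2 <= g1 or not (g1 & g2)


def conflicts(matrix: dict[str, list[int]]) -> list[tuple[int, int]]:
    """Pairs of locus indices whose insertion patterns cannot both be clades."""
    groups = insertion_groups(matrix)
    return [
        (i, j)
        for i, j in combinations(range(len(groups)), 2)
        if not compatible(groups[i], groups[j])
    ]
```

A conflicting pair signals homoplasy, incomplete lineage sorting, or a scoring error, which is why loci are usually inspected before being used as markers.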

  9. Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach

    NASA Astrophysics Data System (ADS)

    Kotaru, Appala Raju; Joshi, Ramesh C.

    Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era because of the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and even biochemicals such as biofuels. Recently, numerous high-throughput experimental procedures have been devised to investigate the mechanisms underlying a protein's function, and the phylogenetic profile is one of them. A phylogenetic profile is a representation of a protein that encodes its evolutionary history. In this paper we propose a method for classifying phylogenetic profiles using a supervised machine learning method, support vector machine (SVM) classification with a radial basis function (RBF) kernel, to identify functionally linked proteins. We experimentally evaluated the performance of the classifier with linear and polynomial kernels and compared the results with the existing tree kernel. In our study we used proteins of the budding yeast Saccharomyces cerevisiae genome. We generated the phylogenetic profiles of 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the RBF kernel performs similarly to the polynomial kernel in some functional classes, that both are better than the linear and tree kernels, and that overall the RBF kernel outperformed the polynomial, linear, and tree kernels. Analyzing these results, we show that it is feasible to use an SVM classifier with an RBF kernel to predict gene function from phylogenetic profiles.
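
The three standard kernels compared above can be written directly over phylogenetic profiles, here assumed to be 0/1 vectors marking whether a homolog of the protein is found in each reference genome. The parameter defaults (`gamma`, `degree`, `coef0`) are illustrative assumptions, and the tree kernel is not shown.

```python
from __future__ import annotations

import math


def linear_kernel(x: list[float], y: list[float]) -> float:
    """Dot product: counts genomes in which both proteins have homologs (for 0/1 profiles)."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: list[float], y: list[float], degree: int = 2, coef0: float = 1.0) -> float:
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: list[float], y: list[float], gamma: float = 0.5) -> float:
    """exp(-gamma * ||x - y||^2); for 0/1 profiles ||x - y||^2 is the Hamming distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(profiles: list[list[float]], kernel) -> list[list[float]]:
    """Pairwise kernel values, the input an SVM solver works with."""
    return [[kernel(p, q) for q in profiles] for p in profiles]
```

In practice one would hand the profiles to an existing SVM implementation (for example scikit-learn's `SVC` with `kernel="rbf"`, or `kernel="precomputed"` with a Gram matrix like the one above) rather than writing a solver.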

  10. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    PubMed

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting have become increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed phylogenetic footprinting (PF) method is implemented in Java and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
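
To make "trees with attached substitution probabilities" concrete, the sketch below computes the likelihood of one aligned column of orthologous bases under a hypothetical star tree: an ancestral base is drawn from `root_dist` (for a motif model, one column of a position weight matrix), and each branch keeps it with probability 1 - p or substitutes it by each of the three other bases with probability p/3. This single-parameter model is an assumption for illustration, not the model implemented in PhyFoo.

```python
from __future__ import annotations


def site_likelihood(column: str, p: float, root_dist: dict[str, float] | None = None) -> float:
    """P(observed leaf bases) under a star tree with substitution probability p per branch.

    Non-ACGT characters (gaps, N) are treated as missing data (factor 1).
    """
    if root_dist is None:
        root_dist = {b: 0.25 for b in "ACGT"}
    total = 0.0
    for ancestor, prior in root_dist.items():
        term = prior
        for base in column.upper():
            if base not in "ACGT":
                continue  # missing data contributes a factor of 1
            term *= (1.0 - p) if base == ancestor else p / 3.0
        total += term
    return total
```

A motif score for a column could then be the log ratio of `site_likelihood` under a motif `root_dist` versus a background one. Under this particular model, p = 3/4 makes every leaf independent of the ancestral base, so varying p changes how much the tree couples the aligned species.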

  11. Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes

    PubMed Central

    Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.

    2012-01-01

    Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
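
A naive Bayesian classifier for rRNA fragments scores each candidate taxon by multiplying per-word probabilities over the overlapping k-mers ("words") of the query. The sketch below is a simplified illustration in that spirit: the word length, the smoothing constants (a word prior based on how many training sequences contain the word), and the class and method names are assumptions, and the bootstrap confidence estimate used by production classifiers is omitted.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def words(seq: str, k: int) -> set[str]:
    """Distinct overlapping k-mers ('words') of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class WordNaiveBayes:
    """Naive Bayes over word presence; the smoothing scheme is an assumption."""

    def __init__(self, k: int = 8) -> None:
        self.k = k

    def fit(self, seqs: list[str], labels: list[str]) -> "WordNaiveBayes":
        word_sets = [words(s, self.k) for s in seqs]
        n = len(seqs)
        doc_freq = Counter(w for ws in word_sets for w in ws)
        # Smoothed fraction of all training sequences that contain each word.
        self.prior = {w: (c + 0.5) / (n + 1) for w, c in doc_freq.items()}
        self.unseen_prior = 0.5 / (n + 1)
        self.label_size = Counter(labels)
        self.label_words: dict[str, Counter] = defaultdict(Counter)
        for ws, label in zip(word_sets, labels):
            self.label_words[label].update(ws)
        return self

    def log_score(self, seq: str, label: str) -> float:
        """Sum of log P(word | label) over the query's words."""
        m = self.label_size[label]
        counts = self.label_words[label]
        total = 0.0
        for w in words(seq, self.k):
            p_w = self.prior.get(w, self.unseen_prior)
            total += math.log((counts[w] + p_w) / (m + 1))
        return total

    def predict(self, seq: str) -> str:
        return max(self.label_size, key=lambda label: self.log_score(seq, label))
```

Scoring needs only a dictionary lookup per query word, with no alignment, which is consistent with the large speed advantage over BLASTN reported above.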

  12. Phylogenetic classification of Aureobasidium pullulans strains for production of pullulan and xylanase

    USDA-ARS's Scientific Manuscript database

    This study tests the hypothesis that phylogenetic classification can predict whether A. pullulans strains will produce useful levels of the commercial polysaccharide, pullulan, or the valuable enzyme, xylanase. To test this hypothesis, 19 strains of A. pullulans with previously described phenotypes...

  13. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data are used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial
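
"Phylogenetic learning" attaches one binary classifier to each internal node of a tree inferred from 16S rRNA data and trains it on FAME profiles to decide which subtree a sample belongs to; prediction descends from the root to a leaf. In the sketch below, a nearest-centroid rule is swapped in for the Random Forest models described above so the example stays dependency-free; the `Node` structure and function names are hypothetical, and every species is assumed to have at least one training profile.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Node:
    """Binary tree node: leaves carry a species label, internal nodes a split model."""
    label: str | None = None
    left: Node | None = None
    right: Node | None = None
    left_centroid: list[float] | None = None
    right_centroid: list[float] | None = None

    def species(self) -> set[str]:
        if self.label is not None:
            return {self.label}
        return self.left.species() | self.right.species()


def centroid(rows: list[list[float]]) -> list[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(node: Node, X: list[list[float]], y: list[str]) -> None:
    """Fit one 'left vs right subtree' model (here: two centroids) at every internal node."""
    if node.label is not None:
        return
    left_sp, right_sp = node.left.species(), node.right.species()
    node.left_centroid = centroid([x for x, s in zip(X, y) if s in left_sp])
    node.right_centroid = centroid([x for x, s in zip(X, y) if s in right_sp])
    train(node.left, X, y)
    train(node.right, X, y)


def predict(node: Node, x: list[float]) -> str:
    """Descend from the root, choosing a subtree at each split, until a leaf is reached."""
    while node.label is None:
        go_left = math.dist(x, node.left_centroid) <= math.dist(x, node.right_centroid)
        node = node.left if go_left else node.right
    return node.label
```

Because each split has its own model, per-node accuracy can be inspected directly, which is the kind of resolution analysis the hierarchical structure is said to make easy.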

  14. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data are used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for

  15. Phylogenetic Analysis and Classification of the Fungal bHLH Domain

    PubMed Central

    Sailsbery, Joshua K.; Atchley, William R.; Dean, Ralph A.

    2012-01-01

    The basic Helix-Loop-Helix (bHLH) domain is an essential, highly conserved DNA-binding domain found in many transcription factors in all eukaryotic organisms. The bHLH domain has been well studied in the Animal and Plant Kingdoms but has yet to be characterized within Fungi. Herein, we obtained and evaluated the phylogenetic relationship of 490 fungal-specific bHLH containing proteins from 55 whole genome projects composed of 49 Ascomycota and 6 Basidiomycota organisms. We identified 12 major groupings within Fungi (F1–F12), along with conserved motifs and functions specific to each group. Several classification models were built to distinguish the 12 groups and elucidate the most discerning sites in the domain. Performance testing on these models, for correct group classification, resulted in a maximum sensitivity and specificity of 98.5% and 99.8%, respectively. We identified 12 highly discerning sites and incorporated those into a set of rules (simplified model) to classify sequences into the correct group. Conservation of amino acid sites and phylogenetic analyses established that like plant bHLH proteins, fungal bHLH–containing proteins are most closely related to animal Group B. The models used in these analyses were incorporated into a software package, the source code for which is available at www.fungalgenomics.ncsu.edu. PMID:22114358

  16. Phylogenetic classification of Cordyceps and the clavicipitaceous fungi

    PubMed Central

    Sung, Gi-Ho; Hywel-Jones, Nigel L.; Sung, Jae-Mo; Luangsa-ard, J. Jennifer; Shrestha, Bhushan; Spatafora, Joseph W.

    2007-01-01

    Cordyceps, comprising over 400 species, was historically classified in the Clavicipitaceae, based on cylindrical asci, thickened ascus apices and filiform ascospores, which often disarticulate into part-spores. Cordyceps was characterized by the production of well-developed, often stipitate stromata and an ecology as a pathogen of arthropods and Elaphomyces, with infrageneric classifications emphasizing arrangement of perithecia, ascospore morphology and host affiliation. To refine the classification of Cordyceps and the Clavicipitaceae, the phylogenetic relationships of 162 taxa were estimated based on analyses consisting of five to seven loci, including the nuclear ribosomal small and large subunits (nrSSU and nrLSU), the elongation factor 1α (tef1), the largest and the second largest subunits of RNA polymerase II (rpb1 and rpb2), β-tubulin (tub), and mitochondrial ATP6 (atp6). Our results strongly support the existence of three clavicipitaceous clades and reject the monophyly of both Cordyceps and Clavicipitaceae. Most diagnostic characters used in current classifications of Cordyceps (e.g., arrangement of perithecia, ascospore fragmentation, etc.) were not supported as being phylogenetically informative; the characters that were most consistent with the phylogeny were texture, pigmentation and morphology of stromata. Therefore, we revise the taxonomy of Cordyceps and the Clavicipitaceae to be consistent with the multi-gene phylogeny. The family Cordycipitaceae is validated based on the type of Cordyceps, C. militaris, and includes most Cordyceps species that possess brightly coloured, fleshy stromata. The new family Ophiocordycipitaceae is proposed based on Ophiocordyceps Petch, which we emend. The majority of species in this family produce darkly pigmented, tough to pliant stromata that often possess aperithecial apices. The new genus Elaphocordyceps is proposed for a subclade of the Ophiocordycipitaceae, which includes all species of Cordyceps that parasitize

  17. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    PubMed

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different methods: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (above all, the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is an ideal testing ground for the appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
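
UPGMA, one of the distance-based methods listed above, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the old distances. The sketch below returns only the topology as a Newick string (branch lengths omitted); the input is assumed to be a symmetric matrix of lexical distances, for example the proportion of the 110 wordlist items that are not cognate between two lects, which is an illustrative assumption.

```python
from __future__ import annotations


def upgma(labels: list[str], dist: list[list[float]]) -> str:
    """Newick topology (no branch lengths) built by UPGMA from a symmetric distance matrix."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # cluster id -> (newick, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}  # keys: (smaller id, larger id)
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Size-weighted average distance from the merged cluster to cluster k.
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        del d[(i, j)]
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

UPGMA assumes a roughly constant rate of lexical replacement across lineages, which is one reason results are usually cross-checked against neighbor joining and character-based methods, as in the study above.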

  18. Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

    PubMed Central

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different methods: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (above all, the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is an ideal testing ground for the appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456

  19. SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

    PubMed

    Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal

    2012-01-01

    Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of

  20. The Transporter Classification Database: recent advances.

    PubMed

    Saier, Milton H; Yen, Ming Ren; Noto, Keith; Tamang, Dorjee G; Elkan, Charles

    2009-01-01

    The Transporter Classification Database (TCDB), freely accessible at http://www.tcdb.org, is a relational database containing sequence, structural, functional and evolutionary information about transport systems from a variety of living organisms, based on the International Union of Biochemistry and Molecular Biology-approved transporter classification (TC) system. It is a curated repository for factual information compiled largely from published references. It uses a functional/phylogenetic system of classification, and currently encompasses about 5000 representative transporters and putative transporters in more than 500 families. We here describe novel software designed to support and extend the usefulness of TCDB. Our recent efforts make it more user-friendly, incorporate machine learning to input novel data in a semiautomatic fashion, and allow analyses that are more accurate and less time-consuming. The availability of these tools has resulted in recognition of distant phylogenetic relationships and tremendous expansion of the information available to TCDB users.

  1. Phylogenetic classification of the world's tropical forests.

    PubMed

    Slik, J W Ferry; Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; Alves, Luciana F; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C, Gerardo A; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L; Bastin, Jean-François; Bellingham, Peter J; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brambach, Fabian; Brearley, Francis Q; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H S; Davidar, Priya; DeWalt, Saara J; Din, Hazimah; Drake, Donald R; Duque, Alvaro; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M; Grogan, James; Hakeem, Khalid Rehman; Harris, David J; Harrison, Rhett D; Hector, Andy; Hemp, Andreas; Homeier, Jürgen; Hussain, M Shah; Ibarra-Manríquez, Guillermo; Hanum, I Faridah; Imai, Nobuo; Jansen, Patrick A; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L; Kessler, Michael; Killeen, Timothy J; Kooyman, Robert M; Laumonier, Yves; Laurance, Susan G; Laurance, William F; Lawes, Michael J; Letcher, Susan G; Lindsell, Jeremy; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R; Martin, Emanuel H; Calderado Leal Matos, Darley; Meave, Jorge A; Melo, Felipe P L; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D; Munguía-Rosas, Miguel A; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R; Powers, Jennifer S; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S B; Rolim, Samir; Rovero, Francesco; Rozak, Andes; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd Said, Mohd Nizam; Saiter, Felipe Z; Saner, Philippe; Santos, Braulio; Dos Santos, João Roberto; Sarker, Swapan Kumar; Schmitt, Christine B; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sultana, Aisha; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H S; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V J; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A; Webb, Campbell O; Webb, Edward L; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C Yves; Yap, Sandra L; Zahawi, Rakan A; Zakaria, Rahmad; Zang, Runguo

    2018-02-20

    Knowledge about the biogeographic affinities of the world's tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world's tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: (i) Indo-Pacific, (ii) Subtropical, (iii) African, (iv) American, and (v) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. Copyright © 2018 the Author(s). Published by PNAS.

  2. Accurate crop classification using hierarchical genetic fuzzy rule-based systems

    NASA Astrophysics Data System (ADS)

    Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.

    2014-10-01

    This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules that are easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimal user interaction, since the learning parameters that most affect classification accuracy are determined automatically by the learning algorithm. HiRLiC is applied to a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis shows that HiRLiC compares favorably to other interpretable classifiers in the literature, in terms of both structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as support vector machines (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC generalizes better, providing more homogeneous classifications than its competitors. Moreover, the runtime required to produce the thematic map was orders of magnitude lower than that of the competitors.

  3. The COG database: new developments in phylogenetic classification of proteins from complete genomes

    PubMed Central

    Tatusov, Roman L.; Natale, Darren A.; Garkavtsev, Igor V.; Tatusova, Tatiana A.; Shankavaram, Uma T.; Rao, Bachoti S.; Kiryutin, Boris; Galperin, Michael Y.; Fedorova, Natalie D.; Koonin, Eugene V.

    2001-01-01

    The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis. PMID:11125040

  4. Phylogenetic classification of the world’s tropical forests

    PubMed Central

    Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C., Gerardo A.; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L.; Bastin, Jean-François; Bellingham, Peter J.; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brearley, Francis Q.; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L.; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M.; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H. S.; Davidar, Priya; DeWalt, Saara J.; Din, Hazimah; Drake, Donald R.; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J.; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W.; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M.; Grogan, James; Hakeem, Khalid Rehman; Harris, David J.; Harrison, Rhett D.; Hector, Andy; Hemp, Andreas; Hussain, M. Shah; Ibarra-Manríquez, Guillermo; Hanum, I. Faridah; Imai, Nobuo; Jansen, Patrick A.; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L.; Kessler, Michael; Killeen, Timothy J.; Kooyman, Robert M.; Laumonier, Yves; Laurance, William F.; Lawes, Michael J.; Letcher, Susan G.; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R.; Martin, Emanuel H.; Calderado Leal Matos, Darley; Meave, Jorge A.; Melo, Felipe P. L.; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P.; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D.; Munguía-Rosas, Miguel A.; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N.; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A.; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F.; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R.; Powers, Jennifer S.; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S. B.; Rolim, Samir; Rovero, Francesco; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd. Said, Mohd. Nizam; Saiter, Felipe Z.; Saner, Philippe; Santos, Braulio; dos Santos, João Roberto; Sarker, Swapan Kumar; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F.; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H. S.; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V. J.; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H.; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A.; Webb, Campbell O.; Webb, Edward L.; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C. Yves; Yap, Sandra L.; Zahawi, Rakan A.; Zakaria, Rahmad; Zang, Runguo

    2018-01-01

    Knowledge about the biogeographic affinities of the world’s tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world’s tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: (i) Indo-Pacific, (ii) Subtropical, (iii) African, (iv) American, and (v) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. PMID:29432167

  5. Towards a phylogenetic classification of Leptothecata (Cnidaria, Hydrozoa)

    PubMed Central

    Maronna, Maximiliano M.; Miranda, Thaís P.; Peña Cantero, Álvaro L.; Barbeitos, Marcos S.; Marques, Antonio C.

    2016-01-01

    Leptothecata are hydrozoans whose hydranths are covered by perisarc and gonophores and whose medusae bear gonads on their radial canals. They develop complex polypoid colonies and exhibit considerable morphological variation among species with respect to growth, defensive structures and mode of development. For instance, several lineages within this order have lost the medusa stage. Depending on the author, traditional taxonomy in hydrozoans may be either polyp- or medusa-oriented. Therefore, the absence of the latter stage in some lineages may lead to very different classification schemes. Molecular data have proved useful in elucidating this taxonomic challenge. We analyzed a supermatrix of new and published rRNA gene sequences (16S, 18S and 28S), employing newly proposed methods to measure branch support and improve phylogenetic signal. Our analysis recovered new clades not recognized by traditional taxonomy and corroborated some recently proposed taxa. We offer a thorough taxonomic revision of the Leptothecata, erecting new orders, suborders, infraorders and families. We also discuss the origination and diversification dynamics of the group from a macroevolutionary perspective. PMID:26821567

  6. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background: Previous methods for detecting the taxonomic origins of arbitrary sequence collections, a task with significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and classifies protein sequences according to their species of origin with high accuracy, enabling taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
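    The general idea of grouping proteins by the ranking order of their phylogenetic profiles can be illustrated with a small sketch. It assumes that each protein's profile is a list of best-hit scores against the same ordered set of reference genomes, compares profiles by Spearman rank correlation, and links proteins whose correlation reaches a hypothetical threshold. This is not the authors' rank-BLAST statistics or clustering procedure; the function names, the threshold and the scores are all invented for illustration.

```python
from itertools import combinations


def ranks(values):
    """Rank scores from highest (rank 1) to lowest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no ranking information
    return cov / (var_a * var_b) ** 0.5


def cluster_profiles(profiles, threshold=0.8):
    """Hypothetical grouping: single linkage on rank correlation >= threshold."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if spearman(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented best-hit scores of four query proteins against five reference genomes.
example_profiles = {
    "p1": [900, 700, 300, 100, 50],
    "p2": [850, 650, 320, 90, 40],
    "p3": [100, 200, 400, 800, 950],
    "p4": [120, 180, 390, 820, 900],
}

print(cluster_profiles(example_profiles))  # [['p1', 'p2'], ['p3', 'p4']]
```

    Comparing ranks rather than raw scores makes the grouping insensitive to how strongly a protein is conserved overall; only the order in which reference genomes are hit matters.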

  7. Evolutionary lineages of marine snails identified using molecular phylogenetics and geometric morphometric analysis of shells.

    PubMed

    Vaux, Felix; Trewick, Steven A; Crampton, James S; Marshall, Bruce A; Beu, Alan G; Hills, Simon F K; Morgan-Richards, Mary

    2018-06-15

    The relationship between morphology and inheritance is of perennial interest in evolutionary biology and palaeontology. Using three marine snail genera Penion, Antarctoneptunea and Kelletia, we investigate whether systematics based on shell morphology accurately reflect evolutionary lineages indicated by molecular phylogenetics. Members of these gastropod genera have been a taxonomic challenge due to substantial variation in shell morphology, conservative radular and soft tissue morphology, few known ecological differences, and geographical overlap between numerous species. Sampling all sixteen putative taxa identified across the three genera, we infer mitochondrial and nuclear ribosomal DNA phylogenetic relationships within the group, and compare this to variation in adult shell shape and size. Results of phylogenetic analysis indicate that each genus is monophyletic, although the status of some phylogenetically derived and likely more recently evolved taxa within Penion is uncertain. The recently described species P. lineatus is supported by genetic evidence. Morphology, captured using geometric morphometric analysis, distinguishes the genera and matches the molecular phylogeny, although using the same dataset, species and phylogenetic subclades are not identified with high accuracy. Overall, despite abundant variation, we find that shell morphology accurately reflects genus-level classification and the corresponding deep phylogenetic splits identified in this group of marine snails. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Phylogeny and phylogenetic classification of the antbirds, ovenbirds, woodcreepers, and allies (Aves: Passeriformes: Infraorder Furnariides)

    USGS Publications Warehouse

    Moyle, R.G.; Chesser, R.T.; Brumfield, R.T.; Tello, J.G.; Marchese, D.J.; Cracraft, J.

    2009-01-01

    The infraorder Furnariides is a diverse group of suboscine passerine birds comprising a substantial component of the Neotropical avifauna. The included species encompass a broad array of morphologies and behaviours, making them appealing for evolutionary studies, but the size of the group (ca. 600 species) has limited well-sampled higher-level phylogenetic studies. Using DNA sequence data from the nuclear RAG-1 and RAG-2 exons, we undertook a phylogenetic analysis of the Furnariides sampling 124 (more than 88%) of the genera. Basal relationships among family-level taxa differed depending on phylogenetic method, but all topologies had little nodal support, mirroring the results from earlier studies in which discerning relationships at the base of the radiation was also difficult. In contrast, branch support for family-rank taxa and for many relationships within those clades was generally high. Our results support the Melanopareidae and Grallariidae as distinct from the Rhinocryptidae and Formicariidae, respectively. Within the Furnariides our data contradict some recent phylogenetic hypotheses and suggest that further study is needed to resolve these discrepancies. Of the few genera represented by multiple species, several were not monophyletic, indicating that additional systematic work remains within furnariine families and must include dense taxon sampling. We use this study as a basis for proposing a new phylogenetic classification for the group and in the process erect new family-group names for clades having high branch support across methods. © 2009 The Willi Hennig Society.

  9. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    PubMed

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumorigenesis is a mutation accumulation process that is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is a major genetic marker of the genome, involving many genes, disease loci, and functional elements. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly across the stages of tumor formation. A classification experiment shows that tree-based features are better than data-based features at distinguishing tumors. The constructed phylogenetic trees characterize the tumor development process well, and BDEP outperforms other similar algorithms.

  10. Classification and phylogenetic analysis of Chinese hawthorn assessed by plant and pollen morphology.

    PubMed

    Ma, S L Y; Lu, Y M

    2016-09-19

    The Chinese hawthorn (Crataegus pinnatifida Bge. var. major N.E.Br.) originated uniquely in northern China. The ecological and horticultural importance of Chinese hawthorn is considerable, and some varieties are valued for their fruit or medicinal extracts. Its taxonomy and phylogeny remain poorly understood. Apart from general plant morphological traits, pollen is an important trait for the classification of plants and for tracing their evolutionary origin. However, few studies have investigated the pollen of Chinese hawthorn. Here, an analysis of plant and pollen morphological characteristics was conducted in 57 cultivars from the Shenyang region. Thirty plant morphological characters and nine pollen grain characters were investigated. The plant morphological analysis revealed that the coefficient of variation for 13 traits was >20%, which indicates a high degree of variability. We also found that the pollen grains varied greatly in size, shape (from prolate to perprolate), and exine pattern (predominantly striate-perforate). The number of apertures was typically three. Based on these findings, we suggest that pollen morphology, together with plant morphological traits, can be used for classification and phylogenetic analysis of Chinese hawthorn cultivars. In sum, our results provide new insights and constitute a scientific basis for future studies on the classification and evolution of Chinese hawthorn.
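    The variability criterion above is the coefficient of variation (CV): the standard deviation of a trait divided by its mean, expressed as a percentage. A minimal sketch, assuming the sample standard deviation is used; the trait names and measurements are invented and are not the study's data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values) * 100


def highly_variable(traits, cutoff=20.0):
    """Names of traits whose CV exceeds `cutoff` percent, sorted alphabetically."""
    return sorted(
        name for name, values in traits.items()
        if coefficient_of_variation(values) > cutoff
    )


# Invented measurements of two hypothetical traits across three cultivars.
example_traits = {
    "fruit_weight": [10, 20, 30],  # mean 20, SD 10, CV = 50%
    "leaf_length": [9, 10, 11],    # mean 10, SD 1, CV = 10%
}

print(highly_variable(example_traits))  # ['fruit_weight']
```

    Because the CV is dimensionless, it lets traits measured in different units (grams, millimetres, counts) be compared on the same >20% scale.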

  11. Cloning, in Vitro expression, and novel phylogenetic classification of a channel catfish estrogen receptor

    USGS Publications Warehouse

    Xia, Z.; Patino, R.; Gale, W.L.; Maule, A.G.; Densmore, L.D.

    1999-01-01

    We obtained two channel catfish estrogen receptor (ccER) cDNA fragments from the liver of female fish using RT–PCR. The two fragments were identical in sequence except that the smaller one had an out-of-frame deletion in the E domain, suggesting the existence of ccER splice variants. The larger fragment was used to screen a cDNA library from liver of a prepubescent female. A cDNA was obtained that encoded a 581-amino-acid ER with a deduced molecular weight of 63.8 kDa. Extracts of COS-7 cells transfected with ccER cDNA bound estrogen with high affinity (Kd = 4.7 nM) and specificity. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of ccER on the basis of 18 full-length ER sequences. The tree suggested the existence of two major ER branches. One branch contained two clearly divergent clades which included all piscine ER (except Japanese eel ER) and all tetrapod ERα, respectively. The second major branch contained the eel ER and the mammalian ERβ. The high degree of divergence between the eel ER and mammalian ERβ suggested that they also represent distinct piscine and tetrapod ER. These data suggest that ERα and ERβ are present throughout vertebrates and that these two major ER types evolved by duplication of an ancestral ER gene. Sequence alignments with other members of the nuclear hormone receptor superfamily indicated the presence of 8 amino acids in the E domain that align exclusively among ER. Four of these amino acids have not received prior research attention and their function is unknown. The novel finding of putative ER splice variants in a nonmammalian vertebrate and the novel phylogenetic classification of ER offer new perspectives in understanding the diversification and function of ER.

  12. Phylogenetic relationships and classification of the Holarctic family Leuciscidae (Cypriniformes: Cyprinoidei).

    PubMed

    Schönhuth, Susana; Vukic, Jasna; Sanda, Radek; Yang, Lei; Mayden, Richard L

    2018-06-15

    The phylogenetic relationships and classification of the freshwater fish order Cypriniformes, like those of many other species-rich groups of vertebrates, have evolved over time, with both consistencies and inconsistencies in relationships across studies. Within Cypriniformes, the Holarctic family Leuciscidae is one of the most widely distributed and highly diverse monophyletic groups of cyprinoids. Despite several studies conducted on this group, alternative hypotheses exist as to the composition and relationships within Leuciscidae. Here we assess the extent, composition, phylogenetic relationships, and taxonomy of this highly diverse group of fishes, using multiple mitochondrial and nuclear loci and a comprehensive and dense taxonomic sampling. Analyses of 418 specimens (410 species) resolve a well-supported Leuciscidae including 362 specimens (358 taxa) in six well-supported subfamilies/major clades: Pseudaspininae/Far East Asian clade (FEA); Laviniinae/North American Western clade (WC); Plagopterinae/North American Creek Chub-Plagopterin clade (CC-P); Leuciscinae/Eurasian Old World clade (OW) (minus Phoxinus) plus North American Notemigonus; Phoxininae/Eurasian Phoxinus clade (PHX); and Pogonichthyinae/North American clade (NA) including all remaining leuciscids. Within Leuciscidae, neither the traditional phoxinins (Phoxinus, FEA, Nearctic genera) nor all Nearctic genera (minus Notemigonus) are resolved as monophyletic, whereas the WC and CC-P form two independent lineages from the remaining North American cyprinoids. A close relationship exists between Eurasian Phoxinus, NA, and OW clades, while FEA is the sister group to all remaining Leuciscidae. Major lineages resolved within these six subfamilies are mostly congruent with some previous studies. Our results suggest a complex evolutionary history of this diverse and widespread group of fishes. Copyright © 2018. Published by Elsevier Inc.

  13. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    PubMed Central

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-01-01

    Background: Uncovering disease subtypes from microarray samples has important clinical implications, such as the survival time of individual patients and their sensitivity to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of high-dimensional microarray clusters, which limits their performance. Results: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods, including a recently developed ensemble clustering algorithm, in tests with five simulated and eight real gene-expression data sets. Conclusion: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
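    The core idea of MULTI-K, running k-means many times over a range of cluster numbers and keeping groupings whose co-memberships are robust, can be sketched with a co-association matrix: entry (i, j) is the fraction of runs in which samples i and j land in the same cluster. This is a simplified illustration, not the authors' C++ implementation: the fixed co-membership threshold below stands in for their entropy-plot procedure, and all function names and parameters are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on tuples of numbers; returns one label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an emptied center where it was
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def coassociation(points, k_values, runs_per_k=5, seed=0):
    """Fraction of k-means runs, pooled over all k in k_values, that co-cluster i and j."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in k_values:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, rng)
            total += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        counts[i][j] += 1
    return [[c / total for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs co-clustered in > threshold of runs."""
    n = len(matrix)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and matrix[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


# Two invented, well-separated groups of 2-D "samples".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matrix = coassociation(points, k_values=range(2, 5), runs_per_k=5, seed=1)
print(consensus_clusters(matrix))
```

    Pooling runs over several values of k is what distinguishes this from a single k-means call: pairs that stay together even when k is larger than the true number of groups receive high co-association, while pairs separated in most runs are split apart.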

  14. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  15. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  16. Classification of HCV and HIV-1 Sequences with the Branching Index

    PubMed Central

    Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas

    2009-01-01

    SUMMARY Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If the BI is above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
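    The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming BI values have already been computed (the abstract does not give the BI formula, so none is implemented here). The function names and the toy BI values are hypothetical. The sketch picks, among observed BI values, the threshold where false positive and false negative rates are closest, mirroring the "equal false positive and false negative rates" criterion reported for HCV.

```python
from typing import Sequence, Tuple


def classify(bi: float, threshold: float) -> bool:
    """Reject the null hypothesis of no subtype support when BI >= threshold."""
    return bi >= threshold


def error_rates(pos: Sequence[float], neg: Sequence[float], t: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at threshold t.

    pos: BI values of queries that truly belong to the clade they cluster with.
    neg: BI values of queries that do not.
    """
    fpr = sum(classify(x, t) for x in neg) / len(neg)
    fnr = sum(not classify(x, t) for x in pos) / len(pos)
    return fpr, fnr


def agreement(pos: Sequence[float], neg: Sequence[float], t: float) -> float:
    """Fraction of all queries whose call matches the known answer."""
    correct = sum(classify(x, t) for x in pos) + sum(not classify(x, t) for x in neg)
    return correct / (len(pos) + len(neg))


def equal_error_threshold(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Among observed BI values, pick the threshold where FPR and FNR are closest."""
    best_t, best_gap = None, float("inf")
    for t in sorted(set(pos) | set(neg)):
        fpr, fnr = error_rates(pos, neg, t)
        if abs(fpr - fnr) < best_gap:
            best_t, best_gap = t, abs(fpr - fnr)
    return best_t


# Hypothetical BI values, not data from the study.
positives = [0.9, 0.8, 0.75, 0.6]
negatives = [0.3, 0.5, 0.65, 0.7]
t = equal_error_threshold(positives, negatives)
print(t, error_rates(positives, negatives, t), agreement(positives, negatives, t))
```

    Raising the threshold above the equal-error point trades more false negatives for fewer false positives, which corresponds to the abstract's note that higher thresholds suit settings needing lower false positive rates.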

  17. Accurate Detection of Dysmorphic Nuclei Using Dynamic Programming and Supervised Classification.

    PubMed

    Verschuuren, Marlies; De Vylder, Jonas; Catrysse, Hannes; Robijns, Joke; Philips, Wilfried; De Vos, Winnok H

    2017-01-01

    A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND), which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows.
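    The abstract mentions refining contours with an optimal path finding algorithm based on dynamic programming. A common way to frame such a step, assumed here rather than taken from the BleND paper, is to unwrap the region around an initial contour into a cost image indexed by ray angle and radius, then find the cheapest path that visits one radius per ray while moving at most one radius step between neighbouring rays. The cost values and function name below are hypothetical, and this sketch does not force the path to close on itself (first and last rays equal), which a real contour refiner would need to handle.

```python
from typing import List, Tuple


def min_cost_boundary(cost: List[List[float]]) -> Tuple[List[int], float]:
    """Dynamic-programming search for a low-cost boundary in an unwrapped image.

    cost[a][r] is the cost of placing the nuclear boundary at radius index r
    along ray (angle index) a. Consecutive rays may differ by at most one
    radius step, which keeps the refined contour smooth.
    """
    n_rays, n_radii = len(cost), len(cost[0])
    acc = [list(cost[0])]          # acc[a][r]: best total cost ending at (a, r)
    back: List[List[int]] = []     # back[a-1][r]: best predecessor radius on ray a-1
    for a in range(1, n_rays):
        row, ptr = [], []
        for r in range(n_radii):
            candidates = range(max(0, r - 1), min(n_radii, r + 2))
            prev = min(candidates, key=lambda p: acc[a - 1][p])
            row.append(cost[a][r] + acc[a - 1][prev])
            ptr.append(prev)
        acc.append(row)
        back.append(ptr)
    # Backtrack from the cheapest end point on the last ray.
    r = min(range(n_radii), key=lambda p: acc[-1][p])
    total = acc[-1][r]
    path = [r]
    for a in range(n_rays - 1, 0, -1):
        r = back[a - 1][r]
        path.append(r)
    path.reverse()
    return path, total


# Toy 3-ray x 3-radius cost image (hypothetical values).
print(min_cost_boundary([[9, 1, 9], [9, 9, 1], [9, 1, 9]]))
```

    The smoothness constraint (at most one radius step per ray) is what lets the search follow subtle protrusions such as blebs without jumping to unrelated bright or dark pixels.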

  18. Accurate Detection of Dysmorphic Nuclei Using Dynamic Programming and Supervised Classification

    PubMed Central

    Verschuuren, Marlies; De Vylder, Jonas; Catrysse, Hannes; Robijns, Joke; Philips, Wilfried

    2017-01-01

    A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND), which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows. PMID:28125723

  19. Taxonomic update on proposed nomenclature and classification changes for bacteria of medical importance, 2015.

    PubMed

    Janda, J Michael

    2016-10-01

    A key aspect of medical, public health, and diagnostic microbiology laboratories is the accurate and rapid reporting and communication regarding infectious agents of clinical significance. Microbial taxonomy in the age of molecular diagnostics and phylogenetics creates changes in taxonomy at a rapid rate further complicating this process. This update focuses on the description of new species and classification changes proposed in 2015. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Phylogenetic classification of Aureobasidium pullulans strains for production of feruloyl esterase

    USDA-ARS's Scientific Manuscript database

    The objective was to phylogenetically classify diverse strains of A. pullulans and determine their production of feruloyl esterase. Seventeen strains from the A. pullulans literature were phylogenetically classified. Phenotypic traits of color variation and endo-β-1,4-xylanase overproduction were as...

  1. Photometric brown-dwarf classification. I. A method to identify and accurately classify large samples of brown dwarfs without spectroscopy

    NASA Astrophysics Data System (ADS)

    Skrzypek, N.; Warren, S. J.; Faherty, J. K.; Mortlock, D. J.; Burgasser, A. J.; Hewett, P. C.

    2015-02-01

    Aims: We present a method, named photo-type, to identify and accurately classify L and T dwarfs onto the standard spectral classification system using photometry alone. This enables the creation of large and deep homogeneous samples of these objects efficiently, without the need for spectroscopy. Methods: We created a catalogue of point sources with photometry in 8 bands, ranging from 0.75 to 4.6 μm, selected from an area of 3344 deg², by combining SDSS, UKIDSS LAS, and WISE data. Sources with 13.0 < J < 17.5, and Y − J > 0.8, were then classified by comparison against template colours of quasars, stars, and brown dwarfs. The L and T templates, spectral types L0 to T8, were created by identifying previously known sources with spectroscopic classifications, and fitting polynomial relations between colour and spectral type. Results: Of the 192 known L and T dwarfs with reliable photometry in the surveyed area and magnitude range, 189 are recovered by our selection and classification method. We have quantified the accuracy of the classification method both externally, with spectroscopy, and internally, by creating synthetic catalogues and accounting for the uncertainties. We find that, brighter than J = 17.5, photo-type classifications are accurate to one spectral sub-type, and are therefore competitive with spectroscopic classifications. The resultant catalogue of 1157 L and T dwarfs will be presented in a companion paper.
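    Classifying a source "by comparison against template colours" can be illustrated with a chi-squared template match: compute, for each template, the error-weighted squared difference between observed and template colours, and keep the best-fitting template. This is a simplified sketch under stated assumptions, not the photo-type implementation. The template names, the two colour indices, and all numbers below are hypothetical (the paper's templates were fitted from spectroscopically typed sources across eight bands).

```python
from typing import Dict, Sequence

# Hypothetical template colours for two arbitrary colour indices; NOT the
# paper's templates, which were fitted from spectroscopically typed sources.
TEMPLATES: Dict[str, Sequence[float]] = {
    "L0": (1.2, 0.6),
    "L5": (1.6, 0.9),
    "T5": (1.0, -0.3),
}


def chi2(observed: Sequence[float], errors: Sequence[float], template: Sequence[float]) -> float:
    """Chi-squared distance between observed colours and one template."""
    return sum(((o - t) / e) ** 2 for o, e, t in zip(observed, errors, template))


def classify(observed: Sequence[float], errors: Sequence[float],
             templates: Dict[str, Sequence[float]] = TEMPLATES) -> str:
    """Return the template name with the smallest chi-squared."""
    return min(templates, key=lambda name: chi2(observed, errors, templates[name]))


print(classify((1.55, 0.85), (0.1, 0.1)))  # -> L5
```

    Adding quasar and star templates to the same dictionary turns the same function into a contaminant filter, which is roughly the role the abstract describes for those templates.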

  2. The Impact of Media, Phylogenetic Classification, and E. coli Pathotypes on Biofilm Formation in Extraintestinal and Commensal E. coli From Humans and Animals.

    PubMed

    Nielsen, Daniel W; Klimavicz, James S; Cavender, Tia; Wannemuehler, Yvonne; Barbieri, Nicolle L; Nolan, Lisa K; Logue, Catherine M

    2018-01-01

    Extraintestinal pathogenic Escherichia coli (ExPEC) include avian pathogenic E. coli (APEC), neonatal meningitis E. coli (NMEC), and uropathogenic E. coli (UPEC) and are responsible for significant animal and human morbidity and mortality. This study sought to investigate whether biofilm formation by ExPEC likely contributes to these losses, since biofilms are associated with recurrent urinary tract infections, antibiotic resistance, and bacterial exchange of genetic material. Therefore, the goal of this study was to examine differences in biofilm formation among a collection of ExPEC and to ascertain whether there is a relationship between their ability to produce biofilms and their assignment to phylogenetic groups in three media types: M63, diluted TSB, and BHI. Our results suggest that ExPEC differ in their levels of biofilm formation across the media tested: APEC (70.4%, p = 0.0064) and NMEC (84.4%, p = 0.0093) isolates were poor biofilm formers in minimal medium M63, while UPEC isolates produced significantly higher ODs under nutrient-limited conditions, with 25% of strains producing strong biofilms in diluted TSB (p = 0.0204). Additionally, E. coli phylogenetic assignment using Clermont's original and revised typing schemes demonstrated significant differences among the phylogenetic groups in the different media. When isolates previously typed as group D under the original scheme were retyped under the revised scheme and examined, they showed substantial variation in their ability to form biofilms, which may explain the significant values for revised phylogenetic groups E and F in M63 (p = 0.0291, p = 0.0024). Our data indicate that biofilm formation is correlated with phylogenetic classification and subpathotype or commensal grouping of E. coli strains.

  3. Effects of Phylogenetic Tree Style on Student Comprehension

    NASA Astrophysics Data System (ADS)

    Dees, Jonathan Andrew

    Phylogenetic trees are powerful tools of evolutionary biology that have become prominent across the life sciences. Consequently, learning to interpret and reason from phylogenetic trees is now an essential component of biology education. However, students often struggle to understand these diagrams, even after explicit instruction. One factor that has been observed to affect student understanding of phylogenetic trees is style (i.e., diagonal or bracket). The goal of this dissertation research was to systematically explore effects of style on student interpretations and construction of phylogenetic trees in the context of an introductory biology course. Before instruction, students were significantly more accurate with bracket phylogenetic trees for a variety of interpretation and construction tasks. Explicit instruction that balanced the use of diagonal and bracket phylogenetic trees mitigated some, but not all, style effects. After instruction, students were significantly more accurate for interpretation tasks involving taxa relatedness and construction exercises when using the bracket style. Based on this dissertation research and prior studies on style effects, I advocate for introductory biology instructors to use only the bracket style. Future research should examine causes of style effects and variables other than style to inform the development of research-based instruction that best supports student understanding of phylogenetic trees.

  4. Phylogenetic relationships among arecoid palms (Arecaceae: Arecoideae)

    PubMed Central

    Baker, William J.; Norup, Maria V.; Clarkson, James J.; Couvreur, Thomas L. P.; Dowe, John L.; Lewis, Carl E.; Pintaud, Jean-Christophe; Savolainen, Vincent; Wilmot, Tomas; Chase, Mark W.

    2011-01-01

    Background and Aims The Arecoideae is the largest and most diverse of the five subfamilies of palms (Arecaceae/Palmae), containing >50 % of the species in the family. Despite its importance, phylogenetic relationships among Arecoideae are poorly understood. Here the most densely sampled phylogenetic analysis of Arecoideae available to date is presented. The results are used to test the current classification of the subfamily and to identify priority areas for future research. Methods DNA sequence data for the low-copy nuclear genes PRK and RPB2 were collected from 190 palm species, covering 103 (96 %) genera of Arecoideae. The data were analysed using the parsimony ratchet, maximum likelihood, and both likelihood and parsimony bootstrapping. Key Results and Conclusions Despite the recovery of paralogues and pseudogenes in a small number of taxa, PRK and RPB2 were both highly informative, producing well-resolved phylogenetic trees with many nodes well supported by bootstrap analyses. Simultaneous analyses of the combined data sets provided additional resolution and support. Two areas of incongruence between PRK and RPB2, relating to the placement of tribes Chamaedoreeae, Iriarteeae and Reinhardtieae, were strongly supported by the bootstrap; the causes of this incongruence remain uncertain. The current classification within Arecoideae was strongly supported by the present data. Of the 14 tribes and 14 sub-tribes in the classification, only five sub-tribes from tribe Areceae (Basseliniinae, Linospadicinae, Oncospermatinae, Rhopalostylidinae and Verschaffeltiinae) failed to receive support. Three major higher level clades were strongly supported: (1) the RRC clade (Roystoneeae, Reinhardtieae and Cocoseae), (2) the POS clade (Podococceae, Oranieae and Sclerospermeae) and (3) the core arecoid clade (Areceae, Euterpeae, Geonomateae, Leopoldinieae, Manicarieae and Pelagodoxeae). However, new data sources are required to elucidate ambiguities that remain in phylogenetic

  5. High-resolution phylogenetic microbial community profiling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin

    Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.

  6. High-resolution phylogenetic microbial community profiling

    DOE PAGES

    Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin; ...

    2016-02-09

    Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.

  7. Phylogenetic system and zoogeography of the Plecoptera.

    PubMed

    Zwick, P

    2000-01-01

    Information about the phylogenetic relationships of Plecoptera is summarized. The few characters supporting monophyly of the order are outlined. Several characters of possible significance for the search for the closest relatives of the stoneflies are discussed, but the sister-group of the order remains unknown. Numerous characters supporting the presently recognized phylogenetic system of Plecoptera are presented, alternative classifications are discussed, and suggestions for future studies are made. Notes on zoogeography are appended. The order as such is old (Permian fossils), but phylogenetic relationships and global distribution patterns suggest that evolution of the extant suborders started with the breakup of Pangaea. There is evidence of extensive recent speciation in all parts of the world.

  8. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  9. Phylogenetic versus functional signals in the evolution of form-function relationships in terrestrial vision.

    PubMed

    Motani, Ryosuke; Schmitz, Lars

    2011-08-01

    Phylogeny is deeply pertinent to evolutionary studies. Traits that perform a body function are expected to be strongly influenced by physical "requirements" of the function. We investigated whether such traits exhibit phylogenetic signals and, if so, how phylogenetic noise biases quantification of form-function relationships. A form-function system that is strongly influenced by physics, namely the relationship between eye morphology and visual optics in amniotes, was used. We quantified the correlation between form (i.e., eye morphology) and function (i.e., ocular optics) while varying the level of phylogenetic bias removal through adjusting Pagel's λ. Ocular soft-tissue dimensions exhibited the highest correlation with ocular optics when 1% of phylogenetic bias expected from Brownian motion was removed (i.e., λ = 0.01); the corresponding value for hard-tissue data was 8%. A small degree of phylogenetic bias therefore exists in morphology despite the stringent functional constraints. We also devised a phylogenetically informed discriminant analysis and recorded the effects of phylogenetic bias on this method using the same data. Use of proper λ values during phylogenetic bias removal improved misidentification rates in resulting classifications when prior probabilities were assumed to be equal. Even a small degree of phylogenetic bias affected the classification resulting from phylogenetically informed discriminant analysis. © 2011 The Author(s). Evolution © 2011 The Society for the Study of Evolution.
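    Pagel's λ, mentioned above, is conventionally applied by scaling the off-diagonal entries of the Brownian-motion phylogenetic covariance matrix (the branch length two taxa share from the root) by λ while leaving the diagonal unchanged. The sketch below shows only that standard transform, not the authors' full analysis or their phylogenetically informed discriminant analysis. The 3-taxon matrix is hypothetical, and the sketch restricts λ to [0, 1] for simplicity.

```python
from typing import List

Matrix = List[List[float]]


def pagel_lambda(cov: Matrix, lam: float) -> Matrix:
    """Apply Pagel's lambda to a Brownian-motion phylogenetic covariance matrix.

    Off-diagonal entries (shared branch length between two taxa) are multiplied
    by lam; diagonal entries (root-to-tip length) are left unchanged.
    lam = 1 keeps the Brownian-motion structure; lam = 0 removes all
    phylogenetic covariance (a star phylogeny).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("this sketch only accepts 0 <= lambda <= 1")
    n = len(cov)
    return [[cov[i][j] if i == j else lam * cov[i][j] for j in range(n)] for i in range(n)]


# Hypothetical 3-taxon tree: taxa 0 and 1 share 0.6 units of history from the root.
C = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(pagel_lambda(C, 0.5))
```

    The transformed matrix would then feed a generalized least squares or discriminant step; sweeping λ and comparing results is one way to vary how much phylogenetic structure is accounted for.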

  10. Phylogenetic convolutional neural networks in metagenomics.

    PubMed

    Fioravanti, Diego; Giarratano, Ylenia; Maggio, Valerio; Agostinelli, Claudio; Chierici, Marco; Jurman, Giuseppe; Furlanello, Cesare

    2018-03-08

    Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, i.e., the Multi-Layer Perceptron. Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
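    The patristic distance used above as the proximity measure is the sum of branch lengths on the path between two tips through their most recent common ancestor (MRCA). The sketch below computes it on a small rooted tree and builds, for each tip, a list of the other tips ranked by patristic distance, analogous to the ranked neighbour lists the abstract says are passed to the convolutional layer. The tree, its encoding as a child-to-parent map, and the function names are assumptions for illustration; the sparsified MDS embedding and the Keras layer are not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical rooted tree: child -> (parent, branch length).
#   root -- n1 (1.0) -- A (1.0)
#                    \-- B (2.0)
#   root -- C (3.0)
TREE: Dict[str, Tuple[str, float]] = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 1.0),
    "C": ("root", 3.0),
}


def depths_to_ancestors(node: str, parent: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Map the node and each of its ancestors to the path length from the node."""
    out, total = {node: 0.0}, 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        out[node] = total
    return out


def patristic(a: str, b: str, parent: Dict[str, Tuple[str, float]] = TREE) -> float:
    """Sum of branch lengths on the path a -> MRCA -> b."""
    up_a, up_b = depths_to_ancestors(a, parent), depths_to_ancestors(b, parent)
    # The MRCA is the shared ancestor with the smallest combined distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def ranked_neighbours(leaves: List[str], parent: Dict[str, Tuple[str, float]] = TREE) -> Dict[str, List[str]]:
    """For each leaf, the other leaves ordered by increasing patristic distance."""
    return {x: sorted((y for y in leaves if y != x), key=lambda y: (patristic(x, y, parent), y))
            for x in leaves}


print(patristic("A", "B"), patristic("A", "C"))  # -> 3.0 5.0
print(ranked_neighbours(["A", "B", "C"]))
```

    Ties in distance are broken alphabetically here only to make the ordering deterministic; that is a choice of this sketch, not something the abstract specifies.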

  11. Obtaining Accurate Probabilities Using Classifier Calibration

    ERIC Educational Resources Information Center

    Pakdaman Naeini, Mahdi

    2016-01-01

    Learning probabilistic classification and prediction models that generate accurate probabilities is essential in many prediction and decision-making tasks in machine learning and data mining. One way to achieve this goal is to post-process the output of classification models to obtain more accurate probabilities. These post-processing methods are…
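    One standard example of the post-processing idea described above is histogram binning: split raw classifier scores into equal-width bins and replace each score with the fraction of positives observed in its bin on held-out data. This is a generic sketch of that textbook method, not necessarily any of the calibration methods developed in the thesis. The bin count, the fallback for empty bins, and the toy data are assumptions.

```python
from typing import List, Sequence


def fit_histogram_binning(scores: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> List[float]:
    """Learn, for equal-width bins over [0, 1], the observed fraction of positives."""
    counts, positives = [0] * n_bins, [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # score 1.0 goes into the last bin
        counts[b] += 1
        positives[b] += y
    # Empty bins fall back to the bin midpoint, an arbitrary choice for this sketch.
    return [positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrate(score: float, bin_probs: Sequence[float]) -> float:
    """Replace a raw classifier score by the calibrated probability of its bin."""
    n = len(bin_probs)
    return bin_probs[min(int(score * n), n - 1)]


probs = fit_histogram_binning([0.1, 0.15, 0.9, 0.95, 0.85, 0.5], [0, 1, 1, 1, 0, 1], n_bins=2)
print(probs, calibrate(0.99, probs))  # -> [0.5, 0.75] 0.75
```

    The calibration data should be separate from the data used to train the classifier; otherwise the bin frequencies inherit the classifier's overconfidence.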

  12. Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

    PubMed

    Tian, Yuan; Kubatko, Laura

    2017-12-19

    Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root (the most recent common ancestor of all sampled organisms) is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream applications of phylogenies, such as species classification or study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species trees under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

  13. Revisiting the phylogeny of Bombacoideae (Malvaceae): Novel relationships, morphologically cohesive clades, and a new tribal classification based on multilocus phylogenetic analyses.

    PubMed

    Carvalho-Sobrinho, Jefferson G; Alverson, William S; Alcantara, Suzana; Queiroz, Luciano P; Mota, Aline C; Baum, David A

    2016-08-01

    Bombacoideae (Malvaceae) is a clade of deciduous trees with a marked dominance in many forests, especially in the Neotropics. The historical lack of a well-resolved phylogenetic framework for Bombacoideae hinders studies in this ecologically important group. We reexamined phylogenetic relationships in this clade based on a matrix of 6465 nuclear (ETS, ITS) and plastid (matK, trnL-trnF, trnS-trnG) DNA characters. We used maximum parsimony, maximum likelihood, and Bayesian inference to infer relationships among 108 species (∼70% of the total number of known species). We analyzed the evolution of selected morphological traits: trunk or branch prickles, calyx shape, endocarp type, seed shape, and seed number per fruit, using ML reconstructions of their ancestral states to identify possible synapomorphies for major clades. Novel phylogenetic relationships emerged from our analyses, including three major lineages marked by fruit or seed traits: the winged-seed clade (Bernoullia, Gyranthera, and Huberodendron), the spongy endocarp clade (Adansonia, Aguiaria, Catostemma, Cavanillesia, and Scleronema), and the Kapok clade (Bombax, Ceiba, Eriotheca, Neobuchia, Pachira, Pseudobombax, Rhodognaphalon, and Spirotheca). The Kapok clade, the most diverse lineage of the subfamily, includes sister relationships (i) between Pseudobombax and "Pochota fendleri", a historically incertae sedis taxon, and (ii) between the Paleotropical genera Bombax and Rhodognaphalon, implying just two bombacoid dispersals to the Old World, the other one involving Adansonia. This new phylogenetic framework offers new insights and a promising avenue for further evolutionary studies. In view of this information, we present a new tribal classification of the subfamily, accompanied by an identification key. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Cyber infrastructure for Fusarium: three integrated platforms supporting strain identification, phylogenetics, comparative genomics and knowledge sharing.

    PubMed

    Park, Bongsoo; Park, Jongsun; Cheong, Kyeong-Chae; Choi, Jaeyoung; Jung, Kyongyong; Kim, Donghan; Lee, Yong-Hwan; Ward, Todd J; O'Donnell, Kerry; Geiser, David M; Kang, Seogchan

    2011-01-01

    The fungal genus Fusarium includes many plant and/or animal pathogenic species and produces diverse toxins. Although accurate species identification is critical for managing such threats, it is difficult to identify Fusarium morphologically. Fortunately, extensive molecular phylogenetic studies, founded on well-preserved culture collections, have established a robust foundation for Fusarium classification. Genomes of four Fusarium species have been published with more being currently sequenced. The Cyber infrastructure for Fusarium (CiF; http://www.fusariumdb.org/) was built to support archiving and utilization of rapidly increasing data and knowledge and consists of Fusarium-ID, Fusarium Comparative Genomics Platform (FCGP) and Fusarium Community Platform (FCP). The Fusarium-ID archives phylogenetic marker sequences from most known species along with information associated with characterized isolates and supports strain identification and phylogenetic analyses. The FCGP currently archives five genomes from four species. Besides supporting genome browsing and analysis, the FCGP presents computed characteristics of multiple gene families and functional groups. The Cart/Favorite function allows users to collect sequences from Fusarium-ID and the FCGP and analyze them later using multiple tools without requiring repeated copying-and-pasting of sequences. The FCP is designed to serve as an online community forum for sharing and preserving accumulated experience and knowledge to support future research and education.

  15. Cyber infrastructure for Fusarium: three integrated platforms supporting strain identification, phylogenetics, comparative genomics and knowledge sharing

    PubMed Central

    Park, Bongsoo; Park, Jongsun; Cheong, Kyeong-Chae; Choi, Jaeyoung; Jung, Kyongyong; Kim, Donghan; Lee, Yong-Hwan; Ward, Todd J.; O'Donnell, Kerry; Geiser, David M.; Kang, Seogchan

    2011-01-01

    The fungal genus Fusarium includes many plant and/or animal pathogenic species and produces diverse toxins. Although accurate species identification is critical for managing such threats, it is difficult to identify Fusarium morphologically. Fortunately, extensive molecular phylogenetic studies, founded on well-preserved culture collections, have established a robust foundation for Fusarium classification. Genomes of four Fusarium species have been published with more being currently sequenced. The Cyber infrastructure for Fusarium (CiF; http://www.fusariumdb.org/) was built to support archiving and utilization of rapidly increasing data and knowledge and consists of Fusarium-ID, Fusarium Comparative Genomics Platform (FCGP) and Fusarium Community Platform (FCP). The Fusarium-ID archives phylogenetic marker sequences from most known species along with information associated with characterized isolates and supports strain identification and phylogenetic analyses. The FCGP currently archives five genomes from four species. Besides supporting genome browsing and analysis, the FCGP presents computed characteristics of multiple gene families and functional groups. The Cart/Favorite function allows users to collect sequences from Fusarium-ID and the FCGP and analyze them later using multiple tools without requiring repeated copying-and-pasting of sequences. The FCP is designed to serve as an online community forum for sharing and preserving accumulated experience and knowledge to support future research and education. PMID:21087991

  16. Phylogenetic classification of yeasts and related taxa within Pucciniomycotina.

    PubMed

    Wang, Q-M; Yurkov, A M; Göker, M; Lumbsch, H T; Leavitt, S D; Groenewald, M; Theelen, B; Liu, X-Z; Boekhout, T; Bai, F-Y

    2015-06-01

    Most small genera containing yeast species in the Pucciniomycotina (Basidiomycota, Fungi) are monophyletic, whereas larger genera including Bensingtonia, Rhodosporidium, Rhodotorula, Sporidiobolus and Sporobolomyces are polyphyletic. With the implementation of the "One Fungus = One Name" nomenclatural principle these polyphyletic genera were revised. Nine genera, namely Bannoa, Cystobasidiopsis, Colacogloea, Kondoa, Erythrobasidium, Rhodotorula, Sporobolomyces, Sakaguchia and Sterigmatomyces, were emended to include anamorphic and teleomorphic species based on the results obtained by a multi-gene phylogenetic analysis, phylogenetic network analyses, branch length-based methods, as well as morphological, physiological and biochemical comparisons. A new class Spiculogloeomycetes is proposed to accommodate the order Spiculogloeales. The new families Buckleyzymaceae with Buckleyzyma gen. nov., Chrysozymaceae with Chrysozyma gen. nov., Microsporomycetaceae with Microsporomyces gen. nov., Ruineniaceae with Ruinenia gen. nov., Symmetrosporaceae with Symmetrospora gen. nov., Colacogloeaceae and Sakaguchiaceae are proposed. The new genera Bannozyma, Buckleyzyma, Fellozyma, Hamamotoa, Hasegawazyma, Jianyunia, Rhodosporidiobolus, Oberwinklerozyma, Phenoliferia, Pseudobensingtonia, Pseudohyphozyma, Sampaiozyma, Slooffia, Spencerozyma, Trigonosporomyces, Udeniozyma, Vonarxula, Yamadamyces and Yunzhangia are proposed to accommodate species segregated from the genera Bensingtonia, Rhodosporidium, Rhodotorula, Sporidiobolus and Sporobolomyces. Ballistosporomyces is emended and reintroduced to include three Sporobolomyces species of the sasicola clade. A total of 111 new combinations are proposed in this study.

  17. Associations of Leaf Spectra with Genetic and Phylogenetic Variation in Oaks: Prospects for Remote Detection of Biodiversity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cavender-Bares, Jeannine; Meireles, Jose; Couture, John

    Species and phylogenetic lineages have evolved to differ in the way that they acquire and deploy resources, with consequences for their physiological, chemical and structural attributes, many of which can be detected using spectral reflectance from leaves. Recent technological advances for assessing optical properties of plants offer opportunities to detect functional traits of organisms and differentiate levels of biological organization across the tree of life. We connect leaf-level full-range spectral data (400–2400 nm) to the hierarchical organization of plant diversity within the oak genus (Quercus) using field and greenhouse experiments in which environmental factors and plant age are controlled. We show that spectral data significantly differentiate populations within a species and that spectral similarity is significantly associated with phylogenetic similarity among species. Furthermore, we show that hyperspectral information allows more accurate classification of taxa than spectrally derived traits, which by definition are of lower dimensionality. Finally, model accuracy increases at higher levels in the hierarchical organization of plant diversity, such that we are able to better distinguish clades than species or populations. This pattern supports an evolutionary explanation for the degree of optical differentiation among plants and demonstrates potential for remote detection of genetic and phylogenetic diversity.

  18. Associations of Leaf Spectra with Genetic and Phylogenetic Variation in Oaks: Prospects for Remote Detection of Biodiversity

    DOE PAGES

    Cavender-Bares, Jeannine; Meireles, Jose; Couture, John; ...

    2016-03-09

    Species and phylogenetic lineages have evolved to differ in the way that they acquire and deploy resources, with consequences for their physiological, chemical and structural attributes, many of which can be detected using spectral reflectance from leaves. Recent technological advances for assessing optical properties of plants offer opportunities to detect functional traits of organisms and differentiate levels of biological organization across the tree of life. We connect leaf-level full-range spectral data (400–2400 nm) to the hierarchical organization of plant diversity within the oak genus (Quercus) using field and greenhouse experiments in which environmental factors and plant age are controlled. We show that spectral data significantly differentiate populations within a species and that spectral similarity is significantly associated with phylogenetic similarity among species. Furthermore, we show that hyperspectral information allows more accurate classification of taxa than spectrally derived traits, which by definition are of lower dimensionality. Finally, model accuracy increases at higher levels in the hierarchical organization of plant diversity, such that we are able to better distinguish clades than species or populations. This pattern supports an evolutionary explanation for the degree of optical differentiation among plants and demonstrates potential for remote detection of genetic and phylogenetic diversity.

  19. Phylogenetic classification of yeasts and related taxa within Pucciniomycotina

    PubMed Central

    Wang, Q.-M.; Yurkov, A.M.; Göker, M.; Lumbsch, H.T.; Leavitt, S.D.; Groenewald, M.; Theelen, B.; Liu, X.-Z.; Boekhout, T.; Bai, F.-Y.

    2016-01-01

    Most small genera containing yeast species in the Pucciniomycotina (Basidiomycota, Fungi) are monophyletic, whereas larger genera including Bensingtonia, Rhodosporidium, Rhodotorula, Sporidiobolus and Sporobolomyces are polyphyletic. With the implementation of the “One Fungus = One Name” nomenclatural principle these polyphyletic genera were revised. Nine genera, namely Bannoa, Cystobasidiopsis, Colacogloea, Kondoa, Erythrobasidium, Rhodotorula, Sporobolomyces, Sakaguchia and Sterigmatomyces, were emended to include anamorphic and teleomorphic species based on the results obtained by a multi-gene phylogenetic analysis, phylogenetic network analyses, branch length-based methods, as well as morphological, physiological and biochemical comparisons. A new class Spiculogloeomycetes is proposed to accommodate the order Spiculogloeales. The new families Buckleyzymaceae with Buckleyzyma gen. nov., Chrysozymaceae with Chrysozyma gen. nov., Microsporomycetaceae with Microsporomyces gen. nov., Ruineniaceae with Ruinenia gen. nov., Symmetrosporaceae with Symmetrospora gen. nov., Colacogloeaceae and Sakaguchiaceae are proposed. The new genera Bannozyma, Buckleyzyma, Fellozyma, Hamamotoa, Hasegawazyma, Jianyunia, Rhodosporidiobolus, Oberwinklerozyma, Phenoliferia, Pseudobensingtonia, Pseudohyphozyma, Sampaiozyma, Slooffia, Spencerozyma, Trigonosporomyces, Udeniozyma, Vonarxula, Yamadamyces and Yunzhangia are proposed to accommodate species segregated from the genera Bensingtonia, Rhodosporidium, Rhodotorula, Sporidiobolus and Sporobolomyces. Ballistosporomyces is emended and reintroduced to include three Sporobolomyces species of the sasicola clade. A total of 111 new combinations are proposed in this study. PMID:26951631

  20. How Accurate and Robust Are the Phylogenetic Estimates of Austronesian Language Relationships?

    PubMed Central

    Greenhill, Simon J.; Drummond, Alexei J.; Gray, Russell D.

    2010-01-01

    We recently used computational phylogenetic methods on lexical data to test between two scenarios for the peopling of the Pacific. Our analyses of lexical data supported a pulse-pause scenario of Pacific settlement in which the Austronesian speakers originated in Taiwan around 5,200 years ago and rapidly spread through the Pacific in a series of expansion pulses and settlement pauses. We claimed that there was high congruence between traditional language subgroups and those observed in the language phylogenies, and that the estimated age of the Austronesian expansion at 5,200 years ago was consistent with the archaeological evidence. However, the congruence between the language phylogenies and the evidence from historical linguistics was not quantitatively assessed using tree comparison metrics. The robustness of the divergence time estimates to different calibration points was also not investigated exhaustively. Here we address these limitations by using a systematic tree comparison metric to calculate the similarity between the Bayesian phylogenetic trees and the subgroups proposed by historical linguistics, and by re-estimating the age of the Austronesian expansion using only the most robust calibrations. The results show that the Austronesian language phylogenies are highly congruent with the traditional subgroupings, and the date estimates are robust even when calculated using a restricted set of historical calibrations. PMID:20224774

  1. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  2. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

    PubMed

    Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine

    2009-11-10

    DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Among other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was the molecular diversity of the data. Adding genetically independent loci (nuclear genes) improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best adapted to the configuration of their sample or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
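
    The "one nearest neighbour" rule that this study found most reliable is simple enough to sketch in a few lines. The minimal Python sketch below assumes the barcodes are already aligned to equal length and uses the uncorrected p-distance (proportion of differing sites) as the dissimilarity; the study's own distance measure and data are not reproduced, and the function names and toy sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def one_nearest_neighbour(query: str, reference: list[tuple[str, str]]) -> str:
    """Return the species label of the reference barcode closest to the query.

    `reference` holds (species_label, aligned_sequence) pairs. Ties are broken
    in favour of the first reference sequence encountered.
    """
    if not reference:
        raise ValueError("reference set is empty")
    best_label, best_distance = reference[0][0], float("inf")
    for label, sequence in reference:
        d = p_distance(query, sequence)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label


# Hypothetical toy reference set (not data from the study).
reference = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "TCGAACGGAC"),
]
print(one_nearest_neighbour("ACGTACGAAC", reference))
```

    Because the rule has no parameters to tune, its behaviour changes little as sample size or diversity changes, which is consistent with the robustness the abstract reports.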

  3. Further Effects of Phylogenetic Tree Style on Student Comprehension in an Introductory Biology Course.

    PubMed

    Dees, Jonathan; Bussard, Caitlin; Momsen, Jennifer L

    2018-06-01

    Phylogenetic trees have become increasingly important across the life sciences, and as a result, learning to interpret and reason from these diagrams is now an essential component of biology education. Unfortunately, students often struggle to understand phylogenetic trees. Style (i.e., diagonal or bracket) is one factor that has been observed to impact how students interpret phylogenetic trees, and one goal of this research was to investigate these style effects across an introductory biology course. In addition, we investigated the impact of instruction that integrated diagonal and bracket phylogenetic trees equally. Before instruction, students were significantly more accurate with the bracket style for a variety of interpretation and construction tasks. After instruction, however, students were significantly more accurate only for construction tasks and interpretations involving taxa relatedness when using the bracket style. Thus, instruction that used both styles equally mitigated some, but not all, style effects. These results inform the development of research-based instruction that best supports student understanding of phylogenetic trees.

  4. A revision of infrageneric classification in Astelia Banks & Sol. ex R.Br. (Asteliaceae)

    PubMed Central

    Birch, Joanne L.

    2015-01-01

    Systematic investigations and phylogenetic analyses have indicated that Astelia, as currently circumscribed, is paraphyletic, with Collospermum nested within it. Further, Astelia subgenus Astelia is polyphyletic, and Astelia subgenera Asteliopsis and Tricella are paraphyletic, as currently circumscribed. Revision of the subgeneric classification of Astelia is warranted to ensure that the classification accurately reflects the evolutionary history of these taxa. Collospermum is relegated to synonymy within Astelia. Astelia is dioecious or polygamodioecious, with a superior ovary, anthers dorsi- or basifixed, pistillodes or pistils that have a single short or poorly defined style, a 3-lobed stigma, and fleshy uni- or trilocular fruit with funicular hairs that are poorly to well developed. Astelia subgenus Collospermum (Skottsb.) Birch is described. A key to Astelia sections is provided. Astelia hastata Colenso, Astelia montana Seem., and Astelia microsperma Colenso pro parte are resurrected and the new combination Astelia samoense (Skottsb.) Birch, comb. nov. is made. PMID:26312037

  5. Accurate mobile malware detection and classification in the cloud.

    PubMed

    Wang, Xiaolei; Yang, Yuexiang; Zeng, Yingzhi

    2015-01-01

    As the dominant smartphone operating system, Android has attracted the attention of malware authors and researchers alike. The number of types of Android malware is increasing rapidly despite the considerable number of malware analysis systems that have been proposed. In this paper, taking advantage of the low false-positive rate of misuse detection and the ability of anomaly detection to detect zero-day malware, we propose a novel hybrid detection system based on the new open-source framework CuckooDroid, which enables the use of Cuckoo Sandbox's features to analyze Android malware through dynamic and static analysis. The proposed system consists of two main parts: an anomaly detection engine that detects abnormal apps through dynamic analysis, and a signature detection engine that detects and classifies known malware by combining static and dynamic analysis. We evaluate our system using 5560 malware samples and 6000 benign samples. Experiments show that the anomaly detection engine with dynamic analysis can detect zero-day malware with a low false-negative rate (1.16%) and an acceptable false-positive rate (1.30%); notably, the signature detection engine with hybrid analysis classifies malware samples accurately, with an average positive rate of 98.94%. Because static and dynamic analysis require intensive computing resources, the proposed detection system should be deployed off-device, for example in the cloud. App store markets and ordinary users can then access the system for malware detection through a cloud service.

  6. Host specificity and phylogenetic relationships of chicken and turkey parvoviruses

    USDA-ARS's Scientific Manuscript database

    Previous reports indicate that the newly discovered chicken parvoviruses (ChPV) and turkey parvoviruses (TuPV) are very similar to each other, yet they represent different species within a new genus of Parvoviridae. Currently, strain classification is based on the phylogenetic analysis of a 561 bas...

  7. Phylogenetic comparative methods complement discriminant function analysis in ecomorphology.

    PubMed

    Barr, W Andrew; Scott, Robert S

    2014-04-01

    In ecomorphology, Discriminant Function Analysis (DFA) has been used as evidence for the presence of functional links between morphometric variables and ecological categories. Here we conduct simulations of characters containing phylogenetic signal to explore the performance of DFA under a variety of conditions. Characters were simulated using a phylogeny of extant antelope species from known habitats. Characters were modeled with no biomechanical relationship to the habitat category; the only sources of variation were body mass, phylogenetic signal, or random "noise." DFA on the discriminability of habitat categories was performed using subsets of the simulated characters, and Phylogenetic Generalized Least Squares (PGLS) was performed for each character. Analyses were repeated with randomized habitat assignments. When simulated characters lacked phylogenetic signal and/or habitat assignments were random, <5.6% of DFAs and <8.26% of PGLS analyses were significant. When characters contained phylogenetic signal and actual habitats were used, 33.27 to 45.07% of DFAs and <13.09% of PGLS analyses were significant. False Discovery Rate (FDR) corrections for multiple PGLS analyses reduced the rate of significance to <4.64%. In all cases using actual habitats and characters with phylogenetic signal, correct classification rates of DFAs exceeded random chance. In simulations involving phylogenetic signal in both predictor variables and predicted categories, PGLS with FDR was rarely significant, while DFA often was. In short, DFA offered no indication that differences between categories might be explained by phylogenetic signal, while PGLS did. As such, PGLS provides a valuable tool for testing the functional hypotheses at the heart of ecomorphology. Copyright © 2013 Wiley Periodicals, Inc.

  8. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    PubMed

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data, we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly used distance-based methods, though not as accurate as maximum likelihood methods applied to good-quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together, these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  9. Ebolavirus Classification Based on Natural Vectors

    PubMed Central

    Zheng, Hui; Yin, Changchuan; Hoang, Tung; He, Rong Lucy; Yang, Jie

    2015-01-01

    According to the WHO, ebolaviruses had caused 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from that relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viruses as well as 99 viruses from Sierra Leone. For the viruses of the family Filoviridae, both genus-label and species-label classification achieve an accuracy rate of 100%. We represented the relationships among Filoviridae viruses by Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenetic trees and found that the filoviruses separate well into three genera. We performed a phylogenetic analysis of the relationships among different species of Ebolavirus using their coding-complete genomes and seven viral protein genes (glycoprotein [GP], nucleoprotein [NP], VP24, VP30, VP35, VP40, and RNA polymerase [L]). The topology of the phylogenetic tree based on the viral protein VP24 is consistent with the variation in virulence among ebolaviruses. The result suggests that VP24 may be a pharmaceutical target for treating or preventing ebolavirus infection. PMID:25803489
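
    To make "alignment-free natural vector" concrete, the Python sketch below computes the 12-dimensional nucleotide natural vector as it is commonly defined: for each nucleotide k, its count n_k, its mean 1-based position mu_k, and the normalized second central moment D2_k = sum((p - mu_k)^2) / (n_k * n), where n is the sequence length. Sequences are then compared by Euclidean distance, with no alignment step. This is a sketch under those assumptions; the exact vector dimension and normalization used in this study (for example for protein sequences) may differ, and the zero convention for absent nucleotides and the function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq: str) -> list[float]:
    """Return [n_A..n_T, mu_A..mu_T, D2_A..D2_T] for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)  # total length, including any non-ACGT characters
    counts, means, moments = [], [], []
    for k in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == k]  # 1-based
        n_k = len(positions)
        if n_k == 0:
            # Assumed convention: an absent nucleotide contributes zeros.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def nv_distance(a: str, b: str) -> float:
    """Alignment-free dissimilarity: Euclidean distance between natural vectors."""
    return math.dist(natural_vector(a), natural_vector(b))


print(natural_vector("ACGT"))
print(nv_distance("ACGTTGCA", "ACGTTGCC"))
```

    A pairwise matrix of such distances is the kind of input that a clustering method such as UPGMA can turn into a tree.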

  10. Functional Basis of Microorganism Classification.

    PubMed

    Zhu, Chengsheng; Delmont, Tom O; Vogel, Timothy M; Bromberg, Yana

    2015-08-01

    Correctly identifying nearest "neighbors" of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with

  11. Phylogenetic diversity and biodiversity indices on phylogenetic networks.

    PubMed

    Wicke, Kristina; Fischer, Mareike

    2018-04-01

    In biodiversity conservation it is often necessary to prioritize the species to conserve. Existing approaches to prioritization, e.g. the Fair Proportion Index and the Shapley Value, are based on phylogenetic trees and rank species according to their contribution to overall phylogenetic diversity. However, in many cases evolution is not treelike and thus, phylogenetic networks have been developed as a generalization of phylogenetic trees, allowing for the representation of non-treelike evolutionary events, such as hybridization. Here, we extend the concepts of phylogenetic diversity and phylogenetic diversity indices from phylogenetic trees to phylogenetic networks. On the one hand, we consider the treelike content of a phylogenetic network, e.g. the (multi)set of phylogenetic trees displayed by a network and the so-called lowest stable ancestor tree associated with it. On the other hand, we derive the phylogenetic diversity of subsets of taxa and biodiversity indices directly from the internal structure of the network. We consider both approaches that are independent of so-called inheritance probabilities as well as approaches that explicitly incorporate these probabilities. Furthermore, we introduce our software package NetDiversity, which is implemented in Perl and allows for the calculation of all generalized measures of phylogenetic diversity and generalized phylogenetic diversity indices established in this note that are independent of inheritance probabilities. We apply our methods to a phylogenetic network representing the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), a group of species characterized by widespread hybridization. Copyright © 2018 Elsevier Inc. All rights reserved.
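
    For reference, the tree-based Fair Proportion Index that this work generalizes gives each leaf the sum, over the edges on its path from the root, of the edge length divided by the number of leaves below that edge. The minimal Python sketch below covers rooted trees only, not the network generalization or the NetDiversity package; the dictionary tree encoding, node names and function name are hypothetical.

```python
def fair_proportion(tree: dict[str, list[tuple[str, float]]], root: str) -> dict[str, float]:
    """Fair Proportion index of every leaf in a rooted tree.

    `tree` maps each internal node to its (child, branch_length) pairs;
    nodes that are absent from the mapping (or have no children) are leaves.
    """
    leaves_below: dict[str, int] = {}

    def count_leaves(node: str) -> int:
        children = tree.get(node, [])
        total = 1 if not children else sum(count_leaves(c) for c, _ in children)
        leaves_below[node] = total
        return total

    count_leaves(root)
    scores: dict[str, float] = {}

    def accumulate(node: str, share: float) -> None:
        children = tree.get(node, [])
        if not children:
            scores[node] = share
            return
        for child, length in children:
            # Each edge's length is split equally among the leaves it subtends.
            accumulate(child, share + length / leaves_below[child])

    accumulate(root, 0.0)
    return scores


# Hypothetical tree: root -> (x: 1.0, c: 3.0); x -> (a: 2.0, b: 2.0).
example_tree = {"root": [("x", 1.0), ("c", 3.0)], "x": [("a", 2.0), ("b", 2.0)]}
print(fair_proportion(example_tree, "root"))
```

    By construction the scores sum to the total branch length of the tree, that is, to the phylogenetic diversity of the full taxon set, which is why the index is read as each species' share of overall phylogenetic diversity.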

  12. A subgeneric classification of Selaginella (Selaginellaceae).

    PubMed

    Weststrand, Stina; Korall, Petra

    2016-12-01

    The lycophyte family Selaginellaceae includes approximately 750 herbaceous species worldwide, with the main species richness in the tropics and subtropics. We recently presented a phylogenetic analysis of Selaginellaceae based on DNA sequence data and, with the phylogeny as a framework, the study discussed the character evolution of the group focusing on gross morphology. Here we translate these findings into a new classification. To present a robust and useful classification, we identified well-supported monophyletic groups from our previous phylogenetic analysis of 223 species, which together represent the diversity of the family with respect to morphology, taxonomy, and geographical distribution. Care was taken to choose groups with supporting morphology. In this classification, we recognize a single genus Selaginella and seven subgenera: Selaginella, Rupestrae, Lepidophyllae, Gymnogynum, Exaltatae, Ericetorum, and Stachygynandrum. The subgenera are all well supported based on analysis of DNA sequence data and morphology. A key to the subgenera is presented. Our new classification is based on a well-founded hypothesis of the evolutionary relationships of Selaginella, and each subgenus can be identified by a suite of morphological features, most of them possible to study in the field. Our intention is that the classification will be useful not only to experts in the field, but also to a broader audience. © 2016 Weststrand and Korall. Published by the Botanical Society of America. This work is licensed under a Creative Commons Attribution License (CC-BY 4.0).

  13. Classification of malignant and benign lung nodules using taxonomic diversity index and phylogenetic distance.

    PubMed

    de Sousa Costa, Robherson Wector; da Silva, Giovanni Lucca França; de Carvalho Filho, Antonio Oseas; Silva, Aristófanes Corrêa; de Paiva, Anselmo Cardoso; Gattass, Marcelo

    2018-05-23

    Lung cancer is the leading cause of cancer death worldwide and has one of the lowest survival rates after diagnosis. This study therefore proposes a methodology for classifying lung nodules as benign or malignant based on image processing and pattern recognition techniques. Mean phylogenetic distance (MPD) and the taxonomic diversity index (Δ) were used as texture descriptors. Finally, a genetic algorithm in conjunction with a support vector machine was applied to select the best training model. The proposed methodology was tested on computed tomography (CT) images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), achieving a best sensitivity of 93.42%, specificity of 91.21%, accuracy of 91.81%, and area under the ROC curve of 0.94. The results demonstrate the promising performance of texture extraction techniques that combine mean phylogenetic distance and the taxonomic diversity index with phylogenetic trees. (Graphical abstract: stages of the proposed methodology.)
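
    The two descriptors named here come from community ecology. As commonly defined there, mean phylogenetic distance is the average distance over all unordered pairs of distinct taxa present, and the taxonomic diversity index of Warwick and Clarke is Δ = Σ_{i<j} ω_ij x_i x_j / [N(N-1)/2], where ω_ij is the path length between taxa i and j, x_i is the abundance of taxon i and N = Σ x_i. The Python sketch below implements only these generic definitions. The abstract does not describe how image texture (for example, gray levels within a nodule region) is mapped onto "taxa", distances and abundances, so all inputs and function names are hypothetical.

```python
from itertools import combinations


def mean_phylogenetic_distance(dist: list[list[float]], present: set[int]) -> float:
    """Average of dist[i][j] over all unordered pairs of distinct taxa in `present`."""
    pairs = list(combinations(sorted(present), 2))
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def taxonomic_diversity(weights: list[list[float]], abundances: list[int]) -> float:
    """Delta = sum_{i<j} w_ij * x_i * x_j / (N * (N - 1) / 2), with N = sum(x).

    Pairs of individuals from the same taxon add nothing to the numerator
    but still count in the denominator.
    """
    n_total = sum(abundances)
    if n_total < 2:
        return 0.0
    numerator = sum(
        weights[i][j] * abundances[i] * abundances[j]
        for i, j in combinations(range(len(abundances)), 2)
    )
    return numerator / (n_total * (n_total - 1) / 2)


# Hypothetical symmetric distance matrix for three taxa.
dist = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(mean_phylogenetic_distance(dist, {0, 1, 2}))
print(taxonomic_diversity(dist, [2, 1, 1]))
```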

  14. Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis.

    PubMed

    Bào, Yīmíng; Kuhn, Jens H

    2018-01-01

    During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can even be classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous, as experts are required to establish phylogenetic trees depicting the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, yet PASC results often agree with those of phylogenetic analyses. Because it is computationally inexpensive, PASC can easily be performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
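
    The core idea behind pairwise sequence comparison is to compute an identity value for every pair of genomes and then look at where those values fall. The Python sketch below illustrates that idea with a textbook Needleman-Wunsch global alignment. NCBI's PASC tool uses its own alignment methods and parameters, so the scoring scheme here (match +1, mismatch -1, linear gap -1), the identity definition (identical columns divided by all alignment columns, gaps included) and the function names are assumptions for illustration only.

```python
from itertools import combinations


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identity (0..1) of one optimal Needleman-Wunsch global alignment of a and b."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal alignment, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns


def pairwise_identities(genomes: dict[str, str]) -> dict[tuple[str, str], float]:
    """Identity for every unordered pair of named sequences."""
    return {
        (x, y): global_identity(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Hypothetical short sequences standing in for viral genomes.
print(pairwise_identities({"virus_1": "ACGTACGT", "virus_2": "ACGTTCGT"}))
```

    The quadratic-memory table is fine for an illustration but impractical for full genomes, which is one reason a dedicated tool is preferable in practice.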

  15. On the nature of global classification

    NASA Technical Reports Server (NTRS)

    Wheelis, M. L.; Kandler, O.; Woese, C. R.

    1992-01-01

    Molecular sequencing technology has brought biology into the era of global (universal) classification. Methodologically and philosophically, global classification differs significantly from traditional, local classification. The need for uniformity requires that higher-level taxa be defined at the molecular level in terms of universally homologous functions. A global classification should reflect both principal dimensions of the evolutionary process: genealogical relationship, and the quality and extent of divergence within a group. The ultimate purpose of a global classification is not simply information storage and retrieval; such a system should also function as a heuristic representation of the evolutionary paradigm that exerts a directing influence on the course of biology. The global system envisioned allows paraphyletic taxa. To retain maximal phylogenetic information in these cases, minor notational amendments to existing taxonomic conventions should be adopted.

  16. Bilateral weighted radiographs are required for accurate classification of acromioclavicular separation: an observational study of 59 cases.

    PubMed

    Ibrahim, E F; Forrest, N P; Forester, A

    2015-10-01

    Misinterpretation of the Rockwood classification system for acromioclavicular joint (ACJ) separations has resulted in a trend towards using unilateral radiographs for grading. Further, the use of weighted views to 'unmask' a grade III injury has fallen out of favour. Recent evidence suggests that many radiographic grade III injuries represent only a partial injury to the stabilising ligaments. This study aimed to determine (1) whether accurate classification is possible on unilateral radiographs and (2) the efficacy of weighted bilateral radiographs in unmasking higher-grade injuries. Complete bilateral non-weighted and weighted sets of radiographs for patients presenting with an acromioclavicular separation over a 10-year period were analysed retrospectively, and they were graded I-VI according to Rockwood's criteria. Comparison was made between grading based on (1) a single antero-posterior (AP) view of the injured side, (2) bilateral non-weighted views and (3) bilateral weighted views. Radiographic measurements for cases that changed grade after weighted views were statistically compared to see if this could have been predicted beforehand. Fifty-nine sets of radiographs on 59 patients (48 male, mean age of 33 years) were included. Compared with unilateral radiographs, non-weighted bilateral comparison films resulted in a grade change for 44 patients (74.5%). Twenty-eight of 56 patients initially graded as I, II or III were upgraded to grade V and two of three initial grade V patients were downgraded to grade III. The addition of a weighted view further upgraded 10 patients to grade V. No grade II injury was changed to grade III and no injury of any severity was downgraded by a weighted view. Grade III injuries upgraded on weighted views had a significantly greater baseline median percentage coracoclavicular distance increase than those that were not upgraded (80.7% vs. 55.4%, p=0.015). However, no cut-off point for this value could be identified to predict an

  17. Functional Basis of Microorganism Classification

    PubMed Central

    Zhu, Chengsheng; Delmont, Tom O.; Vogel, Timothy M.; Bromberg, Yana

    2015-01-01

    Correctly identifying nearest “neighbors” of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned

  18. Phylogenetic tree construction using trinucleotide usage profile (TUP).

    PubMed

    Chen, Si; Deng, Lih-Yuan; Bowman, Dale; Shiau, Jyh-Jen Horng; Wong, Tit-Yee; Madahian, Behrouz; Lu, Henry Horng-Shing

    2016-10-06

    …overlapping words, which is the average (or mixture) of the frequency distributions of the three possible reading frames. Consequently, we show (from the entropy viewpoint) that the FFP procedure could dilute important gene information and therefore provide less accurate classification.
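
    The mixture argument can be checked directly. The sketch below assumes that a trinucleotide usage profile counts non-overlapping triplets read in one frame, while an FFP-style profile with word length 3 counts every overlapping triplet; the function names are hypothetical. Because every overlapping word starts in exactly one of the three frames, the overlapping counts are the sum of the three frame-specific counts, so the overlapping frequency profile is a weighted average of the three frame profiles (weights proportional to the number of words in each frame).

```python
from collections import Counter


def frame_counts(seq: str, frame: int) -> Counter:
    """Non-overlapping trinucleotide counts read in a single frame (0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def overlapping_counts(seq: str) -> Counter:
    """All overlapping trinucleotides, as in an FFP profile with word length 3."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def frequencies(counts: Counter) -> dict:
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def mixture_of_frames(seq: str) -> dict:
    """Weighted average of the three frame profiles, weights = share of words per frame."""
    per_frame = [frame_counts(seq, f) for f in range(3)]
    total = sum(sum(c.values()) for c in per_frame)
    mix = Counter()
    for counts in per_frame:
        if not counts:  # very short sequences can leave a frame empty
            continue
        weight = sum(counts.values()) / total
        for word, p in frequencies(counts).items():
            mix[word] += weight * p
    return dict(mix)
```

    For any sequence, `frequencies(overlapping_counts(seq))` and `mixture_of_frames(seq)` agree up to floating-point rounding, which is the sense in which the overlapping profile averages away frame-specific (codon) signal.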

  19. Absolute Pitch in Boreal Chickadees and Humans: Exceptions that Test a Phylogenetic Rule

    ERIC Educational Resources Information Center

    Weisman, Ronald G.; Balkwill, Laura-Lee; Hoeschele, Marisa; Moscicki, Michele K.; Bloomfield, Laurie L.; Sturdy, Christopher B.

    2010-01-01

    This research examined the generality of the phylogenetic rule that birds discriminate frequency ranges more accurately than mammals. Human absolute pitch chroma possessors accurately tracked transitions between frequency ranges. Independent tests showed that they used note naming (pitch chroma) to remap the tones into ranges; neither possessors nor…

  20. On the quirks of maximum parsimony and likelihood on phylogenetic networks.

    PubMed

    Bryant, Christopher; Fischer, Mareike; Linz, Simone; Semple, Charles

    2017-03-21

    Maximum parsimony is one of the most frequently discussed tree reconstruction methods in phylogenetic estimation. However, in recent years it has become increasingly apparent that phylogenetic trees are often not sufficient to describe evolution accurately. For instance, processes like hybridization or lateral gene transfer, which are commonplace in many groups of organisms and result in mosaic patterns of relationships, cannot be represented by a single phylogenetic tree. This is why phylogenetic networks, which can display such events, are of increasing interest in phylogenetic research. It is therefore necessary to extend concepts like maximum parsimony from phylogenetic trees to networks. Several suggestions for possible extensions can be found in recent literature, for instance the softwired and the hardwired parsimony concepts. In this paper, we analyze the so-called big parsimony problem under these two concepts, i.e. we investigate maximum parsimony networks and analyze their properties. In particular, we show that finding a softwired maximum parsimony network is possible in polynomial time. We also show that the set of maximum parsimony networks for the hardwired definition always contains at least one phylogenetic tree. Lastly, we investigate some parallels of parsimony to different likelihood concepts on phylogenetic networks. Copyright © 2017 Elsevier Ltd. All rights reserved.
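
    For readers new to the tree case that these network concepts extend, a minimal sketch of Fitch's small-parsimony algorithm on a rooted binary tree follows (trees as nested tuples of taxon names; all names are illustrative). It is not the authors' network algorithm. The last function assumes the usual definition of softwired parsimony, in which each character is scored on whichever tree displayed by the network fits it best; it takes the displayed trees as given rather than enumerating them from a network.

```python
def fitch(tree, states):
    """Fitch's small parsimony for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a 2-tuple of subtrees; states: taxon -> state.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and pay one change.
    return left_set | right_set, left_cost + right_cost + 1


def site_states(alignment, i):
    """Character states of column i, as a taxon -> state map."""
    return {taxon: seq[i] for taxon, seq in alignment.items()}


def parsimony_score(tree, alignment):
    """Sum of Fitch scores over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, site_states(alignment, i))[1] for i in range(length))


def softwired_score(displayed_trees, alignment):
    """Per column, score on the best-fitting displayed tree, then sum over columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        min(fitch(t, site_states(alignment, i))[1] for t in displayed_trees)
        for i in range(length)
    )
```

    For example, with the alignment {A: AA, B: AC, C: GA, D: GC}, the tree ((A,B),(C,D)) scores 3, while a network displaying both that tree and ((A,C),(B,D)) scores 2 under the softwired definition, because each column can use the tree that suits it.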

  1. Phylogenetic analysis of a spontaneous cocoa bean fermentation metagenome reveals new insights into its bacterial and fungal community diversity.

    PubMed

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results, and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques.

  2. Evaluating Support for the Current Classification of Eukaryotic Diversity

    PubMed Central

    Parfrey, Laura Wegener; Barbero, Erika; Lasser, Elyse; Dunthorn, Micah; Bhattacharya, Debashish; Patterson, David J; Katz, Laura A

    2006-01-01

    Perspectives on the classification of eukaryotic diversity have changed rapidly in recent years, as the four eukaryotic groups within the five-kingdom classification—plants, animals, fungi, and protists—have been transformed through numerous permutations into the current system of six “supergroups.” The intent of the supergroup classification system is to unite microbial and macroscopic eukaryotes based on phylogenetic inference. This supergroup approach is increasing in popularity in the literature and is appearing in introductory biology textbooks. We evaluate the stability and support for the current six-supergroup classification of eukaryotes based on molecular genealogies. We assess three aspects of each supergroup: (1) the stability of its taxonomy, (2) the support for monophyly (single evolutionary origin) in molecular analyses targeting a supergroup, and (3) the support for monophyly when a supergroup is included as an out-group in phylogenetic studies targeting other taxa. Our analysis demonstrates that supergroup taxonomies are unstable and that support for groups varies tremendously, indicating that the current classification scheme of eukaryotes is likely premature. We highlight several trends contributing to the instability and discuss the requirements for establishing robust clades within the eukaryotic tree of life. PMID:17194223

  3. Genome-Based Taxonomic Classification of Bacteroidetes

    PubMed Central

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, lifestyles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected, whole-genome phylogenies are much better resolved. PMID:28066339
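
    A G+C content "directly calculated from genome sequences" is a simple ratio. A minimal sketch, assuming the genome is supplied as FASTA text and that only unambiguous A, C, G and T are counted (so N and other IUPAC ambiguity codes are ignored, which is one of several reasonable conventions); the function names are illustrative:

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) in `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genome_gc_percent(fasta_text: str) -> float:
    """G+C percent pooled over all records (e.g. contigs) of a FASTA string."""
    seq_lines = [
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ]
    return gc_percent("".join(seq_lines))
```

    Pooling all contigs, rather than averaging per-contig values, weights every base equally, so short contigs cannot skew the genome-wide figure.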

  4. Phylogenetic Quantification of Intra-tumour Heterogeneity

    PubMed Central

    Schwarz, Roland F.; Trinh, Anne; Sipos, Botond; Brenton, James D.; Goldman, Nick; Markowetz, Florian

    2014-01-01

    Intra-tumour genetic heterogeneity is the result of ongoing evolutionary change within each cancer. The expansion of genetically distinct sub-clonal populations may explain the emergence of drug resistance, and if so, would have prognostic and predictive utility. However, methods for objectively quantifying tumour heterogeneity have been missing and are particularly difficult to establish in cancers where predominant copy number variation prevents accurate phylogenetic reconstruction owing to horizontal dependencies caused by long and cascading genomic rearrangements. To address these challenges, we present MEDICC, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons. Using a transducer-based pairwise comparison function, we determine optimal phasing of major and minor alleles, as well as evolutionary distances between samples, and are able to reconstruct ancestral genomes. Rigorous simulations and an extensive clinical study show the power of our method, which outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity. Accurate quantification and evolutionary inference are essential to understand the functional consequences of tumour heterogeneity. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data. PMID:24743184
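
    MEDICC compares allele-specific (major and minor) profiles through a transducer-based pairwise comparison, and its event model is richer than anything shown here. To convey only the idea of a minimum event distance, the following sketch assumes a single total copy-number profile per sample and events that add or remove one copy over any contiguous run of segments. Under that simplified model the minimum number of events equals half the total absolute jump in the zero-padded difference profile, because each event changes that jump profile at exactly two positions (one up, one down). The function name is hypothetical.

```python
def min_segment_events(source, target):
    """Minimum number of +1/-1 copy-number events over contiguous segments
    that turn `source` into `target` (simplified model, not MEDICC's).

    Each event shifts the difference profile by one copy over a contiguous run,
    which alters the padded jump profile at exactly two positions.
    """
    if len(source) != len(target):
        raise ValueError("profiles must cover the same segments")
    delta = [t - s for s, t in zip(source, target)]
    padded = [0] + delta + [0]
    jumps = sum(abs(padded[k] - padded[k - 1]) for k in range(1, len(padded)))
    return jumps // 2  # jumps is always even because the padded profile starts and ends at 0
```

    The distance is symmetric under this model, so a matrix of pairwise values over tumour samples could be handed to a distance-based tree method; MEDICC's own reconstruction is more involved.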

  5. Iteratively Refined Guide Trees Help Improving Alignment and Phylogenetic Inference in the Mushroom Family Bolbitiaceae

    PubMed Central

    Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.

    2013-01-01

    Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species, and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family, by comparing the outcome of two iterative alignment refining approaches (an automated and a manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, previously referred to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found to be monophyletic; however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526

  6. A curated database of cyanobacterial strains relevant for modern taxonomy and phylogenetic studies.

    PubMed

    Ramos, Vitor; Morais, João; Vasconcelos, Vitor M

    2017-04-25

    The dataset herein described lays the groundwork for an online database of relevant cyanobacterial strains, named CyanoType (http://lege.ciimar.up.pt/cyanotype). It is a database that includes categorized cyanobacterial strains useful for taxonomic, phylogenetic or genomic purposes, with associated information obtained by means of a literature-based curation. The dataset lists 371 strains and represents the first version of the database (CyanoType v.1). Information for each strain includes strain synonymy and/or co-identity, strain categorization, habitat, accession numbers for molecular data, taxonomy and nomenclature notes according to three different classification schemes, hierarchical automatic classification, phylogenetic placement according to a selection of relevant studies (including this one), and important bibliographic references. The database will be updated periodically, namely by adding new strains meeting the criteria for inclusion and by revising and adding up-to-date metadata for strains already listed. A global 16S rDNA-based phylogeny is provided to assist users in choosing the appropriate strains for their studies.

  7. Is geography an accurate predictor of evolutionary history in the millipede family Xystodesmidae?

    PubMed Central

    Marek, Paul E.

    2017-01-01

    For the past several centuries, millipede taxonomists have used the morphology of male copulatory structures (modified legs called gonopods), which are strongly variable and suggestive of species-level differences, as a source to understand taxon relationships. Millipedes in the family Xystodesmidae are blind, dispersal-limited and have narrow habitat requirements. Therefore, geographical proximity may instead be a better predictor of evolutionary relationship than morphology, especially since gonopodal anatomy is extremely divergent and similarities may be masked by evolutionary convergence. Here we provide a phylogenetics-based test of the power of morphological versus geographical character sets for resolving phylogenetic relationships in xystodesmid millipedes. Molecular data from 90 species-group taxa in the family were included in a six-gene phylogenetic analysis to provide the basis for comparing trees generated from these alternative character sets. The molecular phylogeny was compared to topologies representing three hypotheses: (1) a prior classification formulated using morphological and geographical data, (2) hierarchical groupings derived from Euclidean geographical distance, and (3) one based solely on morphological data. Euclidean geographical distance was not found to be a better predictor of evolutionary relationship than the prior classification, the latter of which was the most similar to the molecular topology. However, all three of the alternative topologies were highly divergent (Bayes factor >10) from the molecular topology, with the tree inferred exclusively from morphology being the most divergent. The results of this analysis show that a high degree of morphological convergence from substantial gonopod shape divergence generated spurious phylogenetic relationships. These results indicate the impact that a high degree of morphological homoplasy may have had on prior treatments of the family. Using the results of our phylogenetic analysis

  8. Can glenoid wear be accurately assessed using x-ray imaging? Evaluating agreement of x-ray and magnetic resonance imaging (MRI) Walch classification.

    PubMed

    Kopka, Michaela; Fourman, Mitchell; Soni, Ashish; Cordle, Andrew C; Lin, Albert

    2017-09-01

    The Walch classification is the most recognized means of assessing glenoid wear in preoperative planning for shoulder arthroplasty. This classification relies on advanced imaging, which is more expensive and less practical than plain radiographs. The purpose of this study was to determine whether the Walch classification could be accurately applied to x-ray images compared with magnetic resonance imaging (MRI) as the gold standard. We hypothesized that x-ray images cannot adequately replace advanced imaging in the evaluation of glenoid wear. Preoperative axillary x-ray images and MRI scans of 50 patients assessed for shoulder arthroplasty were independently reviewed by 5 raters. Glenoid wear was individually classified according to the Walch classification using each imaging modality. The raters then collectively reviewed the MRI scans and assigned a consensus classification to serve as the gold standard. The κ coefficient was used to determine interobserver agreement for x-ray images and independent MRI reads, as well as the agreement between x-ray images and consensus MRI. The inter-rater agreement for x-ray images and MRIs was "moderate" (κ = 0.42 and κ = 0.47, respectively) for the 5-category Walch classification (A1, A2, B1, B2, C) and "moderate" (κ = 0.54 and κ = 0.59, respectively) for the 3-category Walch classification (A, B, C). The agreement between x-ray images and consensus MRI was much lower: "fair-to-moderate" (κ = 0.21-0.51) for the 5-category and "moderate" (κ = 0.36-0.60) for the 3-category Walch classification. The inter-rater agreement between x-ray images and consensus MRI is "fair-to-moderate." This is lower than the previously reported reliability of the Walch classification using computed tomography scans. Accordingly, x-ray images are inferior to advanced imaging when assessing glenoid wear. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights
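
    Agreement between x-ray reads and the consensus MRI is a two-rater comparison, for which Cohen's κ = (p_o - p_e) / (1 - p_e) is the standard statistic: p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater's category frequencies. The abstract does not state which κ variant was used for the five-rater agreement (Fleiss' κ is one common choice), so the sketch below covers only the two-rater case; the function names are illustrative. Collapsing to the 3-category scheme simply keeps the letter of each Walch type.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same category.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def to_three_category(walch_type):
    """Map the 5-category Walch types to 3: A1/A2 -> A, B1/B2 -> B, C -> C."""
    return walch_type[0]
```

    With x-ray reads A1, A2, B1, B2, C against MRI reads A2, A1, B2, B1, C, κ is 0 on the 5-category scale; after mapping both lists through `to_three_category`, the within-letter disagreements disappear and κ becomes 1.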

  9. Accurate vehicle classification including motorcycles using piezoelectric sensors.

    DOT National Transportation Integrated Search

    2013-03-01

    State and federal departments of transportation are charged with classifying vehicles and monitoring mileage traveled. Accurate data reporting enables suitable roadway design for safety and capacity. Vehicle classifiers currently employ inductive loo...

  10. Threat Diversity Will Erode Mammalian Phylogenetic Diversity in the Near Future

    PubMed Central

    Jono, Clémentine M. A.; Pavoine, Sandrine

    2012-01-01

    To reduce the accelerating rate of phylogenetic diversity loss, many studies have searched for mechanisms that could explain why certain species are at risk, whereas others are not. In particular, it has been demonstrated that species might be affected by both extrinsic threat factors and intrinsic biological traits that could render a species more sensitive to extinction; here, we focus on extrinsic factors. Recently, the International Union for Conservation of Nature developed a new classification of threat types, including climate change, urbanization, pollution, agriculture and aquaculture, and harvesting/hunting. We have used this new classification to analyze two main factors that could explain the expected future loss of mammalian phylogenetic diversity: 1. differences in the type of threats that affect mammals and 2. differences in the number of major threats that accumulate for a single species. Our results showed that Cetartiodactyla, Diprotodontia, Monotremata, Perissodactyla, Primates, and Proboscidea could lose a high proportion of their current phylogenetic diversity in the coming decades. In contrast, Chiroptera, Didelphimorphia, and Rodentia could lose less phylogenetic diversity than expected if extinctions were random. Some mammalian clades, including Marsupialia, Chiroptera, and a subclade of Primates, are affected by particular threat types, most likely due solely to their geographic locations and associations with particular habitats. However, regardless of the geography, habitat, and taxon considered, it is not the threat type, but the threat diversity that determines the extinction risk for species and clades. Thus, some mammals might be randomly located in areas subjected to a large diversity of threats; they might also accumulate detrimental traits that render them sensitive to different threats, which is a characteristic that could be associated with large body size. Any action reducing threat diversity is expected to have a
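
    The abstract does not define its diversity measure; a common choice is Faith's phylogenetic diversity (PD), the total branch length connecting a set of taxa to the root. Assuming that measure, a minimal sketch of PD and of the proportion lost when some species go extinct follows; the parent-map tree encoding and the function names are illustrative.

```python
def faith_pd(parent, taxa):
    """Rooted Faith's PD: summed length of all branches on paths from `taxa` to the root.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    branches = set()  # each branch is identified by its child node
    for taxon in taxa:
        node = taxon
        # Stop early once we reach a branch already counted: its ancestors are counted too.
        while node in parent and node not in branches:
            branches.add(node)
            node = parent[node][0]
    return sum(parent[child][1] for child in branches)


def proportion_pd_lost(parent, all_taxa, extinct):
    """Fraction of the PD of `all_taxa` that disappears if `extinct` are lost."""
    survivors = set(all_taxa) - set(extinct)
    return 1.0 - faith_pd(parent, survivors) / faith_pd(parent, all_taxa)
```

    On a tree where A and B share an internal branch of length 1 (each then has its own branch of length 2) and C hangs directly off the root on a branch of length 3, losing A removes 25% of PD, while losing C removes 37.5%: PD loss depends on where in the tree extinctions fall, not only on how many species are lost.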

  11. Molecular phylogenetics and character evolution of morphologically diverse groups, Dendrobium section Dendrobium and allies

    PubMed Central

    Takamiya, Tomoko; Wongsawad, Pheravut; Sathapattayanon, Apirada; Tajima, Natsuko; Suzuki, Shunichiro; Kitamura, Saki; Shioda, Nao; Handa, Takashi; Kitanaka, Susumu; Iijima, Hiroshi; Yukawa, Tomohisa

    2014-01-01

    It is always difficult to construct coherent classification systems for plant lineages having diverse morphological characters. The genus Dendrobium, one of the largest genera in the Orchidaceae, includes ∼1100 species, and enormous morphological diversification has hindered the establishment of consistent classification systems covering all major groups of this genus. Given the particular importance of species in Dendrobium section Dendrobium and allied groups as floriculture and crude drug genetic resources, there is an urgent need to establish a stable classification system. To clarify phylogenetic relationships in Dendrobium section Dendrobium and allied groups, we analysed the macromolecular characters of the group. Phylogenetic analyses of 210 taxa of Dendrobium were conducted on DNA sequences of internal transcribed spacer (ITS) regions of 18S–26S nuclear ribosomal DNA and the maturase-coding gene (matK) located in an intron of the plastid gene trnK using maximum parsimony and Bayesian methods. The parsimony and Bayesian analyses revealed 13 distinct clades in the group comprising section Dendrobium and its allied groups. Results also showed paraphyly or polyphyly of sections Amblyanthus, Aporum, Breviflores, Calcarifera, Crumenata, Dendrobium, Densiflora, Distichophyllae, Dolichocentrum, Holochrysa, Oxyglossum and Pedilonum. On the other hand, the monophyly of section Stachyobium was well supported. It was found that many of the morphological characters that have been believed to reflect phylogenetic relationships are, in fact, the result of convergence. As such, many of the sections that have been recognized up to this point were found to not be monophyletic, so recircumscription of sections is required. PMID:25107672

  12. FSR: feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number.

    PubMed

    Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam

    2012-01-15

    Feature selection is a key concept in machine learning for microarray datasets, where the number of features (probesets) is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions with regard to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance compared with that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.

  13. Visualizing phylogenetic tree landscapes.

    PubMed

    Wilgenbusch, James C; Huang, Wen; Gallivan, Kyle A

    2017-02-02

    Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets using current consensus tree methods discards valuable information and can disguise potential methodological problems. Discovery of efficient and accurate dimensionality reduction methods used to display at once in 2- or 3- dimensions the relationship among these competing phylogenies will help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize in 2- and 3-dimensions the relationship among competing phylogenies obtained from gene partitions found in three mid- to large-size mitochondrial genome alignments. We test the performance of these dimensionality reduction methods by applying several goodness-of-fit measures. The intrinsic dimensionality of each data set is also estimated to determine whether projections in 2- and 3-dimensions can be expected to reveal meaningful relationships among trees from different data partitions. Several new approaches to aid in the comparison of different phylogenetic landscapes are presented. Curvilinear Components Analysis (CCA) and a stochastic gradient descent (SGD) optimization method give the best representation of the original tree-to-tree distance matrix for each of the three mitochondrial genome alignments and greatly outperformed the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as previously applied methods for visualizing tree landscapes. We demonstrate for all three mtDNA alignments that 3D
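
    The abstract does not specify which tree-to-tree distance the authors used; the Robinson-Foulds (RF) distance, the number of bipartitions (splits) found in one tree but not the other, is a common choice. A minimal sketch, assuming binary trees written as nested tuples over the same taxa, treated as unrooted, that builds the distance matrix a dimensionality reduction method would take as input (names are illustrative):

```python
def leaf_set(tree):
    """All taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial splits of the tree, ignoring the root.

    Each split is stored as the side that does not contain a fixed anchor taxon,
    so the same bipartition always gets the same representation.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        # Clusters of size n-1 or n correspond to trivial splits.
        if 1 < len(cluster) < len(taxa) - 1:
            found.add(cluster if anchor not in cluster else taxa - cluster)
        return cluster

    walk(tree)
    return frozenset(found)


def rf_distance(t1, t2):
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same taxa")
    return len(splits(t1) ^ splits(t2))


def rf_matrix(trees):
    """Symmetric matrix of RF distances between all pairs of trees."""
    s = [splits(t) for t in trees]
    return [[len(a ^ b) for b in s] for a in s]
```

    Rerooting does not change the splits, so (((A,B),C),(D,E)) and ((A,B),(C,(D,E))) are at distance 0. The matrix could then be embedded in two or three dimensions, for example with classical multidimensional scaling, or with the CCA and SGD approach the authors report as best.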

  14. A non-contact method based on multiple signal classification algorithm to reduce the measurement time for accurately heart rate detection

    NASA Astrophysics Data System (ADS)

    Bechet, P.; Mitran, R.; Munteanu, M.

    2013-08-01

    Non-contact methods for the assessment of vital signs are of great interest to specialists due to the benefits obtained in both medical and special applications, such as those for surveillance, monitoring, and search and rescue. This paper investigates the possibility of implementing a digital processing algorithm based on MUSIC (Multiple Signal Classification) parametric spectral estimation in order to reduce the observation time needed to accurately measure the heart rate. It demonstrates that, by properly dimensioning the signal subspace, the MUSIC algorithm can be optimized to accurately assess the heart rate during an 8-28 s time interval. The validation of the processing algorithm performance was achieved by minimizing the mean error of the heart rate after performing simultaneous comparative measurements on several subjects. To calculate the error, the reference heart rate was measured using a classic direct-contact measurement system.
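
    A compact sketch of a MUSIC pseudospectrum for estimating a dominant periodicity in a sampled signal follows, using NumPy. It assumes the heart-beat component behaves roughly like a single real sinusoid, so the signal subspace has dimension 2 (one pair of complex exponentials); `order` is that dimension, the parameter the abstract says must be dimensioned properly. The snapshot length, the 40-180 bpm search band and the function name are illustrative assumptions, and real sensor data would need preprocessing this sketch omits.

```python
import numpy as np


def music_heart_rate(x, fs, order=2, window=None, bpm_range=(40.0, 180.0), step_bpm=0.1):
    """Estimate the dominant rate (beats per minute) of `x` with a MUSIC pseudospectrum.

    x: 1-D samples; fs: sampling rate in Hz; order: signal-subspace dimension
    (2 per real sinusoid); window: snapshot length (defaults to len(x) // 3).
    Returns (best_bpm, bpm_grid, pseudospectrum).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove DC so it does not occupy a signal dimension
    m = len(x) // 3 if window is None else window
    if not 0 < order < m <= len(x):
        raise ValueError("need 0 < order < window <= len(x)")
    # Columns are overlapping length-m snapshots of the signal.
    snapshots = np.lib.stride_tricks.sliding_window_view(x, m).T
    cov = snapshots @ snapshots.T / snapshots.shape[1]
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    noise = eigvecs[:, : m - order]  # eigenvectors spanning the noise subspace
    bpm = np.arange(bpm_range[0], bpm_range[1] + step_bpm / 2, step_bpm)
    freqs_hz = bpm / 60.0
    lags = np.arange(m)
    steering = np.exp(2j * np.pi * np.outer(lags, freqs_hz) / fs)  # shape (m, n_freqs)
    # The pseudospectrum peaks where a steering vector is nearly orthogonal to the noise subspace.
    proj = noise.T @ steering
    pseudo = 1.0 / np.sum(np.abs(proj) ** 2, axis=0)
    return float(bpm[np.argmax(pseudo)]), bpm, pseudo
```

    For instance, a 10 s recording sampled at 20 Hz containing a 1.2 Hz sinusoid plus mild noise should yield an estimate close to 72 bpm; choosing `order` too small or too large for the actual number of spectral components is what degrades the estimate.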

  15. Making Mosquito Taxonomy Useful: A Stable Classification of Tribe Aedini that Balances Utility with Current Knowledge of Evolutionary Relationships.

    PubMed

    Wilkerson, Richard C; Linton, Yvonne-Marie; Fonseca, Dina M; Schultz, Ted R; Price, Dana C; Strickman, Daniel A

    2015-01-01

    The tribe Aedini (Family Culicidae) contains approximately one-quarter of the known species of mosquitoes, including vectors of deadly or debilitating disease agents. This tribe contains the genus Aedes, which is one of the three most familiar genera of mosquitoes. During the past decade, Aedini has been the focus of a series of extensive morphology-based phylogenetic studies published by Reinert, Harbach, and Kitching (RH&K). Those authors created 74 new, elevated or resurrected genera from what had been the single genus Aedes, almost tripling the number of genera in the entire family Culicidae. The proposed classification is based on subjective assessments of the "number and nature of the characters that support the branches" subtending particular monophyletic groups in the results of cladistic analyses of a large set of morphological characters of representative species. To gauge the stability of RH&K's generic groupings we reanalyzed their data with unweighted parsimony jackknife and maximum-parsimony analyses, with and without ordering 14 of the characters as in RH&K. We found that their phylogeny was largely weakly supported and their taxonomic rankings failed priority and other useful taxon-naming criteria. Consequently, we propose simplified aedine generic designations that 1) restore a classification system that is useful for the operational community; 2) enhance the ability of taxonomists to accurately place new species into genera; 3) maintain the progress toward a natural classification based on monophyletic groups of species; and 4) correct the current classification system that is subject to instability as new species are described and existing species more thoroughly defined. We do not challenge the phylogenetic hypotheses generated by the above-mentioned series of morphological studies. However, we reduce the ranks of the genera and subgenera of RH&K to subgenera or informal species groups, respectively, to preserve stability as new data become

  16. Making Mosquito Taxonomy Useful: A Stable Classification of Tribe Aedini that Balances Utility with Current Knowledge of Evolutionary Relationships

    PubMed Central

    Wilkerson, Richard C.; Linton, Yvonne-Marie; Fonseca, Dina M.; Schultz, Ted R.; Price, Dana C.; Strickman, Daniel A.

    2015-01-01

    The tribe Aedini (Family Culicidae) contains approximately one-quarter of the known species of mosquitoes, including vectors of deadly or debilitating disease agents. This tribe contains the genus Aedes, which is one of the three most familiar genera of mosquitoes. During the past decade, Aedini has been the focus of a series of extensive morphology-based phylogenetic studies published by Reinert, Harbach, and Kitching (RH&K). Those authors created 74 new, elevated or resurrected genera from what had been the single genus Aedes, almost tripling the number of genera in the entire family Culicidae. The proposed classification is based on subjective assessments of the “number and nature of the characters that support the branches” subtending particular monophyletic groups in the results of cladistic analyses of a large set of morphological characters of representative species. To gauge the stability of RH&K’s generic groupings we reanalyzed their data with unweighted parsimony jackknife and maximum-parsimony analyses, with and without ordering 14 of the characters as in RH&K. We found that their phylogeny was largely weakly supported and their taxonomic rankings failed priority and other useful taxon-naming criteria. Consequently, we propose simplified aedine generic designations that 1) restore a classification system that is useful for the operational community; 2) enhance the ability of taxonomists to accurately place new species into genera; 3) maintain the progress toward a natural classification based on monophyletic groups of species; and 4) correct the current classification system that is subject to instability as new species are described and existing species more thoroughly defined. We do not challenge the phylogenetic hypotheses generated by the above-mentioned series of morphological studies. However, we reduce the ranks of the genera and subgenera of RH&K to subgenera or informal species groups, respectively, to preserve stability as new data

  17. Spatial phylogenetics of the vascular flora of Chile.

    PubMed

    Scherson, Rosa A; Thornhill, Andrew H; Urbina-Casanova, Rafael; Freyman, William A; Pliscoff, Patricio A; Mishler, Brent D

    2017-07-01

    Current geographic patterns of biodiversity are a consequence of the evolutionary history of the lineages that comprise them. This study was aimed at exploring how evolutionary features of the vascular flora of Chile are distributed across the landscape. Using a phylogeny at the genus level for 87% of the Chilean vascular flora, and a geographic database of sample localities, we calculated phylogenetic diversity (PD), phylogenetic endemism (PE), relative PD (RPD), and relative PE (RPE). Categorical Analyses of Neo- and Paleo-Endemism (CANAPE) were also performed, using a spatial randomization to assess statistical significance. A cluster analysis using range-weighted phylogenetic turnover was used to compare among grid cells, and with known Chilean bioclimates. PD patterns were concordant with known centers of high taxon richness and the Chilean biodiversity hotspot. In addition, several other interesting areas of concentration of evolutionary history were revealed as potential conservation targets. The south of the country shows areas of significantly high RPD and a concentration of paleo-endemism, and the north shows areas of significantly low PD and RPD, and a concentration of neo-endemism. Range-weighted phylogenetic turnover shows high congruence with the main macrobioclimates of Chile. Even though the study was done at the genus level, the outcome provides an accurate outline of phylogenetic patterns that can be filled in as more fine-scaled information becomes available. Copyright © 2017 Elsevier Inc. All rights reserved.
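
    Phylogenetic diversity (PD) sums the lengths of the branches connecting the taxa present in a grid cell, and phylogenetic endemism (PE) divides each of those branch lengths by the number of cells in which the branch's descendant taxa occur. A minimal sketch, assuming a toy tree whose branches are stored alongside their descendant genera, PD counted back to the root, and made-up cells; RPD, RPE, and the CANAPE randomization used in the study are not shown, and the names below are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Branch:
    length: float
    tips: frozenset  # genera descending from this branch

def phylogenetic_diversity(branches, cell_taxa):
    """Sum of lengths of branches with at least one descendant taxon in the cell."""
    return sum(b.length for b in branches if b.tips & cell_taxa)

def phylogenetic_endemism(branches, cell_taxa, cells):
    """Each shared branch contributes length / number of cells where its clade occurs."""
    total = 0.0
    for b in branches:
        if b.tips & cell_taxa:
            range_size = sum(1 for taxa in cells.values() if b.tips & taxa)
            total += b.length / range_size
    return total

# Toy tree ((g1:1,g2:1):2,g3:3); every branch is listed with its descendant tips.
branches = [
    Branch(1.0, frozenset({"g1"})),
    Branch(1.0, frozenset({"g2"})),
    Branch(2.0, frozenset({"g1", "g2"})),
    Branch(3.0, frozenset({"g3"})),
]
cells = {"c1": {"g1"}, "c2": {"g1", "g2"}, "c3": {"g3"}}
for name, taxa in cells.items():
    print(name, phylogenetic_diversity(branches, taxa), phylogenetic_endemism(branches, taxa, cells))
```

    Cell c2 has the highest PD (4.0) because it holds both g1 and g2, while c3 scores a full 3.0 in PE because g3's branch occurs nowhere else.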

  18. Incompletely resolved phylogenetic trees inflate estimates of phylogenetic conservatism.

    PubMed

    Davies, T Jonathan; Kraft, Nathan J B; Salamin, Nicolas; Wolkovich, Elizabeth M

    2012-02-01

    The tendency for more closely related species to share similar traits and ecological strategies can be explained by their longer shared evolutionary histories and represents phylogenetic conservatism. How strongly species traits co-vary with phylogeny can significantly impact how we analyze cross-species data and can influence our interpretation of assembly rules in the rapidly expanding field of community phylogenetics. Phylogenetic conservatism is typically quantified by analyzing the distribution of species values on the phylogenetic tree that connects them. Many phylogenetic approaches, however, assume a completely sampled phylogeny: while we have good estimates of deeper phylogenetic relationships for many species-rich groups, such as birds and flowering plants, we often lack information on more recent interspecific relationships (i.e., within a genus). A common solution has been to represent these relationships as polytomies on trees using taxonomy as a guide. Here we show that such trees can dramatically inflate estimates of phylogenetic conservatism quantified using S. P. Blomberg et al.'s K statistic. Using simulations, we show that even randomly generated traits can appear to be phylogenetically conserved on poorly resolved trees. We provide a simple rarefaction-based solution that can reliably retrieve unbiased estimates of K, and we illustrate our method using data on first flowering times from Thoreau's woods (Concord, Massachusetts, USA).

  19. Molecular taxonomy and phylogenetic position of lactic acid bacteria.

    PubMed

    Stackebrandt, E; Teuber, M

    1988-03-01

    Lactic acid bacteria, important in food technology, are Gram-positive organisms exhibiting a DNA G + C content of less than 50 mol%. Phylogenetically they are members of the Clostridium-Bacillus subdivision of Gram-positive eubacteria. Lactobacillus and streptococci together with related facultatively anaerobic taxa evolved as individual lines of descent about 1.5-2 billion years ago when the earth passed from an anaerobic to an aerobic environment. In contrast to the traditional, morphology-based classification, the genus Lactobacillus is intermixed with strains of Pediococcus and Leuconostoc. Similarly, the physiology-based clustering of lactobacilli into Thermo-, Strepto- and Betabacterium does not agree with their phylogenetic relationships. On the other hand, the phenotypically defined genus Streptococcus is not a phylogenetically coherent genus; its members fall into at least 3 moderately related genera, i.e. Streptococcus, Lactococcus and Enterococcus. The genus Bifidobacterium, frequently grouped with the lactobacilli, is the most ancient group of the second, the Actinomycetes subdivision of the Gram-positive eubacteria. In addition, propionibacteria, microbacteria and brevibacteria belong to this subdivision, but the latter organisms appear as offshoots of non-lactic acid bacteria.

  20. Human Papillomavirus Type 16 Genetic Variants: Phylogeny and Classification Based on E6 and LCR

    PubMed Central

    Gheit, Tarik; Franceschi, Silvia; Vignat, Jerome; Burk, Robert D.; Sylla, Bakary S.; Tommasino, Massimo; Clifford, Gary M.

    2012-01-01

    Naturally occurring genetic variants of human papillomavirus type 16 (HPV16) are common and have previously been classified into 4 major lineages: European-Asian (EAS), including the sublineages European (EUR) and Asian (As); African 1 (AFR1); African 2 (AFR2); and North-American/Asian-American (NA/AA). We aimed to improve the classification of HPV16 variant lineages by using a large resource of HPV16-positive cervical samples collected from geographically diverse populations in studies on HPV and/or cervical cancer undertaken by the International Agency for Research on Cancer. In total, we sequenced the entire E6 genes and long control regions (LCRs) of 953 HPV16 isolates from 27 different countries worldwide. Phylogenetic analyses confirmed previously described variant lineages and subclassifications. We characterized two new sublineages within each of the lineages AFR1 and AFR2 that are robustly classified using E6 and/or the LCR. We could differentiate previously identified AA1, AA2, and NA sublineages, although they could not be distinguished by E6 alone, requiring the LCR for correct phylogenetic classification. We thus provide a classification system for HPV16 genomes based on 13 and 32 phylogenetically distinguishing positions in E6 and the LCR, respectively, that distinguish nine HPV16 variant sublineages (EUR, As, AFR1a, AFR1b, AFR2a, AFR2b, NA, AA1, and AA2). Ninety-seven percent of all 953 samples fit this classification perfectly. Other positions were frequently polymorphic within one or more lineages but did not define phylogenetic subgroups. Such a standardized classification of HPV16 variants is important for future epidemiological and biological studies of the carcinogenic potential of HPV16 variant lineages. PMID:22491459

  1. Human papillomavirus type 16 genetic variants: phylogeny and classification based on E6 and LCR.

    PubMed

    Cornet, Iris; Gheit, Tarik; Franceschi, Silvia; Vignat, Jerome; Burk, Robert D; Sylla, Bakary S; Tommasino, Massimo; Clifford, Gary M

    2012-06-01

    Naturally occurring genetic variants of human papillomavirus type 16 (HPV16) are common and have previously been classified into 4 major lineages: European-Asian (EAS), including the sublineages European (EUR) and Asian (As); African 1 (AFR1); African 2 (AFR2); and North-American/Asian-American (NA/AA). We aimed to improve the classification of HPV16 variant lineages by using a large resource of HPV16-positive cervical samples collected from geographically diverse populations in studies on HPV and/or cervical cancer undertaken by the International Agency for Research on Cancer. In total, we sequenced the entire E6 genes and long control regions (LCRs) of 953 HPV16 isolates from 27 different countries worldwide. Phylogenetic analyses confirmed previously described variant lineages and subclassifications. We characterized two new sublineages within each of the lineages AFR1 and AFR2 that are robustly classified using E6 and/or the LCR. We could differentiate previously identified AA1, AA2, and NA sublineages, although they could not be distinguished by E6 alone, requiring the LCR for correct phylogenetic classification. We thus provide a classification system for HPV16 genomes based on 13 and 32 phylogenetically distinguishing positions in E6 and the LCR, respectively, that distinguish nine HPV16 variant sublineages (EUR, As, AFR1a, AFR1b, AFR2a, AFR2b, NA, AA1, and AA2). Ninety-seven percent of all 953 samples fit this classification perfectly. Other positions were frequently polymorphic within one or more lineages but did not define phylogenetic subgroups. Such a standardized classification of HPV16 variants is important for future epidemiological and biological studies of the carcinogenic potential of HPV16 variant lineages.

  2. Molecular identification and phylogenetic study of Demodex caprae.

    PubMed

    Zhao, Ya-E; Cheng, Juan; Hu, Li; Ma, Jun-Xian

    2014-10-01

    The DNA barcode has been widely used in species identification and phylogenetic analysis since 2003, but there have been no reports on Demodex. In this study, to obtain an appropriate DNA barcode for Demodex, molecular identification of Demodex caprae based on mitochondrial cox1 was conducted. First, individual adults and eggs of D. caprae were obtained for genomic DNA (gDNA) extraction; second, the mitochondrial cox1 fragment was amplified, cloned, and sequenced; third, the cox1 fragments of D. caprae were aligned with those of other Demodex species retrieved from GenBank; finally, intra- and interspecific divergences were computed and phylogenetic trees were reconstructed to analyze phylogenetic relationships in Demodex. Results obtained from seven 429-bp fragments of D. caprae showed that sequence identities were above 99.1% among three adults and four eggs. The intraspecific divergences in D. caprae, Demodex folliculorum, Demodex brevis, and Demodex canis were 0.0-0.9, 0.5-0.9, 0.0-0.2, and 0.0-0.5%, respectively, while the interspecific divergences between D. caprae and D. folliculorum, D. canis, and D. brevis were 20.3-20.9, 21.8-23.0, and 25.0-25.3%, respectively. The interspecific divergences were more than 10 times higher than the intraspecific ones, indicating a considerable barcoding gap. Furthermore, in the phylogenetic trees the four Demodex species clustered separately, representing independent species, and D. folliculorum grouped successively with canine Demodex, D. caprae, and D. brevis. In conclusion, the selected 429-bp mitochondrial cox1 fragment is an appropriate DNA barcode for the molecular classification, identification, and phylogenetic analysis of Demodex. D. caprae is an independent species, and D. folliculorum is closer to D. canis than to D. caprae or D. brevis.
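
    The barcoding-gap argument above compares the largest within-species divergence with the smallest between-species divergence. A minimal sketch, assuming uncorrected p-distances (percent mismatching sites) on aligned, equal-length fragments; the abstract does not state which distance model was used, and the sequences below are made up for illustration.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Uncorrected p-distance in percent between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)

def barcoding_gap(species):
    """Return (largest intraspecific, smallest interspecific) divergence in percent."""
    intra = [p_distance(a, b)
             for seqs in species.values()
             for a, b in combinations(seqs, 2)]
    inter = [p_distance(a, b)
             for s1, s2 in combinations(species, 2)
             for a, b in product(species[s1], species[s2])]
    return max(intra), min(inter)

species = {
    "sp1": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "sp2": ["CCAAAAAAAA", "CCAAAAAAAC"],
}
max_intra, min_inter = barcoding_gap(species)
print(max_intra, min_inter)  # 10.0 20.0
```

    A gap exists when the smallest interspecific value exceeds the largest intraspecific one, as it does for the cox1 ranges reported above.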

  3. Accurate classification of brain gliomas by discriminate dictionary learning based on projective dictionary pair learning of proton magnetic resonance spectra.

    PubMed

    Adebileje, Sikiru Afolabi; Ghasemi, Keyvan; Aiyelabegan, Hammed Tanimowo; Saligheh Rad, Hamidreza

    2017-04-01

    Proton magnetic resonance spectroscopy is a powerful noninvasive technique that complements the structural images of cMRI and aids biomedical and clinical research by identifying and visualizing the compositions of various metabolites within the tissues of interest. However, accurate classification of proton magnetic resonance spectra is still challenging in clinics because of the low signal-to-noise ratio, overlapping metabolite peaks, and the presence of background macromolecules. This paper evaluates the performance of discriminative dictionary learning classifiers based on the projective dictionary pair learning method for classifying brain glioma proton magnetic resonance spectra, and the results were compared with those of sub-dictionary learning methods. The proton magnetic resonance spectroscopy data contain a total of 150 spectra (74 healthy, 23 grade II, 23 grade III, and 30 grade IV) from two databases. The datasets from both databases were first merged, followed by column normalization. The Kennard-Stone algorithm was used to split the data into training and test sets. Performance was compared on overall accuracy, sensitivity, specificity, and precision. In overall accuracy, the dictionary pair learning method outperformed the sub-dictionary learning methods (97.78% vs. 68.89%). Copyright © 2016 John Wiley & Sons, Ltd.
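
    The Kennard-Stone split named in the abstract selects training samples that cover the feature space evenly: it starts from the two most distant samples and repeatedly adds the sample whose nearest already-selected neighbour is farthest away. A minimal sketch, assuming Euclidean distance on plain feature vectors (normalized spectra would take their place); `kennard_stone` and the toy points are illustrative.

```python
import math
from itertools import combinations

def kennard_stone(points, n_train):
    """Return (training indices, held-out indices) chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Seed the training set with the two most distant samples.
    first, second = max(combinations(range(n), 2), key=lambda pair: dist(*pair))
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the sample whose closest selected neighbour is farthest away.
        best = max(remaining, key=lambda k: min(dist(k, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining

spectra = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
train, held_out = kennard_stone(spectra, 3)
print(train, held_out)  # [0, 1, 2] [3, 4]
```

    Because the rule is deterministic, the same data always yield the same split, which is one reason it is used instead of a random partition on small spectral datasets.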

  4. Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.

    PubMed

    Fouquier, Jennifer; Rideout, Jai Ram; Bolyen, Evan; Chase, John; Shiffer, Arron; McDonald, Daniel; Knight, Rob; Caporaso, J Gregory; Kelley, Scott T

    2016-02-24

    Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, pose significant threats to human health, and cause structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi, because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names, so that each corresponding foundation-tree tip branches into its new "extension tree" child. We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic
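
    The grafting step described above can be illustrated on topology alone: every foundation tip whose taxonomic name has an extension tree is replaced by that subtree. A minimal sketch, assuming trees are nested tuples with string tips and ignoring branch lengths; this is not ghost-tree's own code, and `graft`, `to_newick`, and the family/species names are hypothetical.

```python
def graft(foundation, extensions):
    """Replace each foundation tip that has an extension tree with that subtree.

    Trees are nested tuples; tips are strings (taxonomic names)."""
    if isinstance(foundation, str):
        return extensions.get(foundation, foundation)
    return tuple(graft(child, extensions) for child in foundation)

def to_newick(tree):
    """Topology-only Newick string (no branch lengths, no trailing semicolon)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

foundation = (("FamilyA", "FamilyB"), "FamilyC")  # e.g. built from a slowly evolving marker
extensions = {                                     # e.g. built from a fast-evolving marker
    "FamilyA": ("sp1", "sp2"),
    "FamilyC": ("sp5", ("sp6", "sp7")),
}
print(to_newick(graft(foundation, extensions)))
# (((sp1,sp2),FamilyB),(sp5,(sp6,sp7)))
```

    Tips without a matching extension tree (FamilyB here) are left in place, so the grafted tree still spans every group in the foundation phylogeny.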

  5. A curated database of cyanobacterial strains relevant for modern taxonomy and phylogenetic studies

    PubMed Central

    Ramos, Vitor; Morais, João; Vasconcelos, Vitor M.

    2017-01-01

    The dataset described here lays the groundwork for an online database of relevant cyanobacterial strains, named CyanoType (http://lege.ciimar.up.pt/cyanotype). It is a database that includes categorized cyanobacterial strains useful for taxonomic, phylogenetic or genomic purposes, with associated information obtained by means of a literature-based curation. The dataset lists 371 strains and represents the first version of the database (CyanoType v.1). Information for each strain includes strain synonymy and/or co-identity, strain categorization, habitat, accession numbers for molecular data, taxonomy and nomenclature notes according to three different classification schemes, hierarchical automatic classification, phylogenetic placement according to a selection of relevant studies (including this one), and important bibliographic references. The database will be updated periodically, namely by adding new strains meeting the criteria for inclusion and by revising and adding up-to-date metadata for strains already listed. A global 16S rDNA-based phylogeny is provided in order to assist users when choosing the appropriate strains for their studies. PMID:28440791

  6. Conformation of phylogenetic relationship of Penaeidae shrimp based on morphometric and molecular investigations.

    PubMed

    Rajakumaran, P; Vaseeharan, B; Jayakumar, R; Chidambara, R

    2014-01-01

    An accurate understanding of the phylogenetic relationships among Penaeidae shrimp is important for both academic research and the fisheries industry. Morphometric and randomly amplified polymorphic DNA (RAPD) analyses were used to infer the phylogenetic relationships among 13 Penaeidae shrimp species. For the morphometric analysis, forty variables and the total length were measured for each species, and the effect of size variation was removed. The resulting size-normalized values were subjected to UPGMA (Unweighted Pair-Group Method with Arithmetic Mean) cluster analysis. For the RAPD analysis, four primers reliably differentiated the species, and the correlation coefficients between the DNA banding patterns of the 13 Penaeidae species were used to construct a UPGMA dendrogram. The phylogenetic relationships from the morphometric and molecular analyses were found to be congruent. Because the morphometric results concur with the molecular ones, we consider the phylogenetic relationships obtained for the studied Penaeidae to be reliable.
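
    UPGMA, used here for both the morphometric and the RAPD data, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two original distances. A minimal sketch on a made-up distance matrix; branch heights are omitted, and for RAPD data the conversion of correlation coefficients into distances (for example 1 - r) is an assumption, not a detail given in the abstract.

```python
from itertools import combinations

def upgma(labels, matrix):
    """Cluster labels with UPGMA; returns the tree as nested tuples."""
    clusters = [(label, 1) for label in labels]  # (subtree, number of leaves)
    d = [list(row) for row in matrix]
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of current clusters.
        i, j = min(combinations(range(n), 2), key=lambda p: d[p[0]][p[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the merged cluster to each remaining one.
        merged_row = [(size_i * d[i][k] + size_j * d[j][k]) / (size_i + size_j) for k in keep]
        d = [[d[a][b] for b in keep] + [merged_row[r]] for r, a in enumerate(keep)]
        d.append(merged_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]

labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(labels, matrix))  # ('D', ('C', ('A', 'B')))
```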

  7. [Phylogenetic analysis of closely related Leuconostoc citreum species based on partial housekeeping genes].

    PubMed

    Lv, Qiang; Chen, Ming; Xu, Haiyan; Song, Yuqin; Sun, Zhihong; Dan, Tong; Sun, Tiansong

    2013-07-04

    Using the 16S rRNA, dnaA, murC and pyrG gene sequences, we identified the phylogenetic relationships among closely related Leuconostoc citreum species. Seven Leu. citreum strains originally isolated from sourdough were characterized by PCR amplification of the dnaA, murC and pyrG genes, whose sequences were determined to assess their suitability as phylogenetic markers. We then estimated genetic distances and constructed phylogenetic trees for the 16S rRNA gene and the three housekeeping genes, combined with published corresponding sequences. Comparing the trees, the topologies of the three housekeeping-gene trees were consistent with that of the 16S rRNA gene tree. Sequence similarity among closely related Leu. citreum species differed between genes, ranging from 75.5% to 97.2% for dnaA, 50.2% to 99.7% for murC, 65.0% to 99.8% for pyrG, and 98.5% to 100% for 16S rRNA. The phylogenetic relationships inferred from the three housekeeping genes were highly consistent with those from the 16S rRNA gene sequence, while the genetic distances for these housekeeping genes were much higher than for the 16S rRNA gene. Consequently, the dnaA, murC and pyrG genes are suitable for the classification and identification of closely related Leu. citreum species.

  8. A phylogenetic analysis of Diurideae (Orchidaceae) based on plastid DNA sequence data.

    PubMed

    Kores, P J; Molvray, M; Weston, P H; Hopper, S D; Brown, A P; Cameron, K M; Chase, M W

    2001-10-01

    DNA sequence data from plastid matK and trnL-F regions were used in phylogenetic analyses of Diurideae, which indicate that Diurideae are not monophyletic as currently delimited. However, if Chloraeinae and Pterostylidinae are excluded from Diurideae, the remaining subtribes form a well-supported, monophyletic group that is sister to a "spiranthid" clade. Chloraea, Gavilea, and Megastylis pro parte (Chloraeinae) are all placed among the spiranthid orchids and form a grade with Pterostylis leading to a monophyletic Cranichideae. Codonorchis, previously included among Chloraeinae, is sister to Orchideae. Within the more narrowly delimited Diurideae two major lineages are apparent. One includes Diuridinae, Cryptostylidinae, Thelymitrinae, and an expanded Drakaeinae; the other includes Caladeniinae s.s., Prasophyllinae, and Acianthinae. The achlorophyllous subtribe Rhizanthellinae is a member of Diurideae, but its placement is otherwise uncertain. The sequence-based trees indicate that some morphological characters used in previous classifications, such as subterranean storage organs, anther position, growth habit, fungal symbionts, and pollination syndromes have more complex evolutionary histories than previously hypothesized. Treatments based upon these characters have produced conflicting classifications, and molecular data offer a tool for reevaluating these phylogenetic hypotheses.

  9. Fuzzy-C-Means Clustering Based Segmentation and CNN-Classification for Accurate Segmentation of Lung Nodules

    PubMed

    K, Jalal Deen; R, Ganesan; A, Merline

    2017-07-27

    Objective: Accurate segmentation of abnormal and healthy lungs is crucial for reliable computer-aided disease diagnosis. Methods: For this purpose, a stack of chest CT scans is processed. In this paper, novel methods are proposed for segmentation of multimodal grayscale lung CT scans. In conventional methods, the required regions of interest (ROI) are identified using a Markov–Gibbs Random Field (MGRF) model. Result: The results of the proposed FCM- and CNN-based process are compared with those obtained from the conventional MGRF-based method. The results illustrate that the proposed method can precisely segment various kinds of complex multimodal medical images. Conclusion: In this paper, to obtain an exact boundary of the regions, every empirical dispersion of the image is computed by Fuzzy C-Means Clustering segmentation. A classification process based on a Convolutional Neural Network (CNN) classifier is used to distinguish normal tissue from abnormal tissue. The experimental evaluation is done using the Interstitial Lung Disease (ILD) database. Creative Commons Attribution License

  10. Fuzzy-C-Means Clustering Based Segmentation and CNN-Classification for Accurate Segmentation of Lung Nodules

    PubMed Central

    K, Jalal Deen; R, Ganesan; A, Merline

    2017-01-01

    Objective: Accurate segmentation of abnormal and healthy lungs is crucial for reliable computer-aided disease diagnosis. Methods: For this purpose, a stack of chest CT scans is processed. In this paper, novel methods are proposed for segmentation of multimodal grayscale lung CT scans. In conventional methods, the required regions of interest (ROI) are identified using a Markov–Gibbs Random Field (MGRF) model. Result: The results of the proposed FCM- and CNN-based process are compared with those obtained from the conventional MGRF-based method. The results illustrate that the proposed method can precisely segment various kinds of complex multimodal medical images. Conclusion: In this paper, to obtain an exact boundary of the regions, every empirical dispersion of the image is computed by Fuzzy C-Means Clustering segmentation. A classification process based on a Convolutional Neural Network (CNN) classifier is used to distinguish normal tissue from abnormal tissue. The experimental evaluation is done using the Interstitial Lung Disease (ILD) database. PMID:28749127
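
    The Fuzzy C-Means step in this segmentation pipeline alternates two updates: each pixel receives a membership in every cluster that shrinks with its distance from that cluster's centre, and each centre moves to the membership-weighted mean of the pixels. A minimal sketch on one-dimensional grey-level intensities, assuming the usual fuzzifier m = 2 and made-up pixel values; the CNN classification stage is not shown, and `fuzzy_c_means` is an illustrative name.

```python
def fuzzy_c_means(xs, centers, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-means on scalar intensities; returns final centres and the last memberships."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        memberships = []
        for x in xs:
            d = [abs(x - c) for c in centers]
            if 0.0 in d:
                # The pixel sits exactly on a centre: give it full membership there.
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                memberships.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
        # Centre update: weighted mean of pixels with weights u_ik ** m.
        new_centers = []
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

pixels = [1, 2, 3, 101, 102, 103]
centers, u = fuzzy_c_means(pixels, [0.0, 50.0])
print([round(c, 2) for c in centers])
```

    Thresholding each pixel's membership (for example, assigning it to the cluster with the largest value) turns this soft clustering into a hard segmentation mask.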

  11. Phylogenetic relationships among species of Lutzomyia, subgenus Lutzomyia (Diptera: Psychodidae).

    PubMed

    Pinto, Israel S; Filho, José D Andrade; Santos, Claudiney B; Falqueto, Aloísio; Leite, Yuri L R

    2010-01-01

    Lutzomyia França is the largest and most diverse sand fly genus in the New World and contains all the species involved in the transmission of American visceral leishmaniasis (AVL). Morphological characters were used to test the monophyly and to infer phylogenetic relationships among members of the Lutzomyia subgenus. Fifty-two morphological characters from male and female adult specimens belonging to 18 species of Lu. (Lutzomyia) were scored and analyzed. The resulting phylogeny confirms the monophyly of this subgenus and reveals four main internal clades. These four clades, however, do not support the classification of the subgenus into two series, longipalpis and cavernicola, because neither is necessarily monophyletic. Knowledge of the phylogenetic relationships among these relevant vectors of AVL should be used as a tool for monitoring target taxa and as a first step toward establishing an early warning system for disease control.

  12. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification of 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification or produce unreliable results. The unreliability stems from limitations of existing methods, which either lack solid probabilistic criteria to evaluate the confidence of their taxonomic assignments or use nucleotide k-mer frequency as a proxy for sequence similarity. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum level based on the lowest common ancestors of multiple database hits for each query sequence, and the reliability of each classification is further evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific to different taxonomic groups. Instead, only a reference database is required for aligning to the query sequences, making our method easily applicable to different regions of the 16S rRNA gene or to other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools, and we provide probabilistic confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite
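
    The abstract describes assigning taxonomy from several database hits, each weighted by its similarity to the query. A minimal sketch of such a weighted consensus walks down the ranks and stops when no taxon holds a large enough share of the total weight. It assumes the per-hit weights are already available; the Bayesian posterior that produces them, the pairwise alignment, and the bootstrap confidence scores are not shown. `weighted_assignment`, the `min_share` threshold, and the example weights are hypothetical, not the authors' software or data.

```python
from collections import defaultdict

def weighted_assignment(hits, min_share=0.8):
    """hits: list of (lineage, weight); lineages are rank-ordered, highest rank first.
    Returns [(taxon, share), ...] down to the deepest rank that reaches min_share."""
    assignment = []
    current = list(hits)
    depth = min(len(lineage) for lineage, _ in hits)
    for rank in range(depth):
        totals = defaultdict(float)
        for lineage, weight in current:
            totals[lineage[rank]] += weight
        best, best_weight = max(totals.items(), key=lambda item: item[1])
        share = best_weight / sum(totals.values())
        if share < min_share:
            break  # the weighted hits disagree too much at this rank
        assignment.append((best, round(share, 3)))
        # Only hits consistent with the chosen taxon inform the next rank down.
        current = [(lineage, weight) for lineage, weight in current if lineage[rank] == best]
    return assignment

hits = [  # hypothetical hits: (phylum, class, genus, species), weight
    (["Firmicutes", "Bacilli", "Streptococcus", "Streptococcus mutans"], 0.60),
    (["Firmicutes", "Bacilli", "Streptococcus", "Streptococcus sobrinus"], 0.35),
    (["Firmicutes", "Bacilli", "Lactococcus", "Lactococcus lactis"], 0.05),
]
print(weighted_assignment(hits))
```

    With these weights the query is assigned down to the genus Streptococcus, while the species rank is left unassigned because the top species carries only about 63% of the remaining weight.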

  13. Phylogenetic comparative methods on phylogenetic networks with reticulations.

    PubMed

    Bastide, Paul; Solís-Lemus, Claudia; Kriebel, Ricardo; Sparks, K William; Ané, Cécile

    2018-04-25

    The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer can substantially affect a species' traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution are needed, applicable to networks. One natural extension of the BM is to use a weighted average model for the trait of a hybrid, at a reticulation point. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's λ test of phylogenetic signal. The trait of a hybrid is sometimes outside the range of its two parents, for instance because of hybrid vigor or hybrid depression. These two phenomena are rather commonly observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts, and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios, and show that recent events indeed have the strongest impact on the trait distribution of present-day taxa. We apply those methods to a dataset of Xiphophorus fishes, to confirm and complete previous analyses in this group. All the methods developed here are available in the Julia package PhyloNetworks.
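
The one-pass recursion can be sketched in Python. This is a minimal illustration, not the PhyloNetworks code, and it assumes: a BM rate of 1, a fixed root value (variance 0), nodes listed in a valid preorder (every parent before its children), and a hypothetical input format in which each node carries `(parent, branch_length, gamma)` tuples, `gamma` being the inheritance weight (1.0 on tree edges). Under the weighted-average model a hybrid's value is the gamma-weighted sum of its parents' values at the ends of the incoming edges, which gives the variance and covariance updates below.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (parent name, branch length, inheritance weight gamma).
ParentEdge = Tuple[str, float, float]


def network_bm_covariance(
    preorder: List[Tuple[str, List[ParentEdge]]],
) -> Dict[Tuple[str, str], float]:
    """Covariance matrix of a BM trait (rate 1) on a network, filled in one preorder pass."""
    V: Dict[Tuple[str, str], float] = {}
    visited: List[str] = []
    for node, parents in preorder:
        if not parents:
            # Root: its trait value is treated as fixed, so its variance is 0.
            V[(node, node)] = 0.0
        else:
            # Covariance with every node visited earlier: gamma-weighted sum over parents.
            for other in visited:
                cov = sum(g * V[(p, other)] for p, _, g in parents)
                V[(node, other)] = V[(other, node)] = cov
            # Variance of the weighted sum of (parent value + independent edge noise).
            var = 0.0
            for i, (p_i, l_i, g_i) in enumerate(parents):
                for j, (p_j, _, g_j) in enumerate(parents):
                    if i == j:
                        var += g_i * g_i * (V[(p_i, p_i)] + l_i)
                    else:
                        var += g_i * g_j * V[(p_i, p_j)]
            V[(node, node)] = var
        visited.append(node)
    return V
```

For a hybrid `h` formed with weights 0.5/0.5 from two independent lineages of length 1, the sketch gives a variance of 0.5, half that of either parent, which is the averaging effect the weighted-average model implies.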

  14. Impact of recent molecular phylogenetic studies on classification of ascomycete yeasts

    USDA-ARS's Scientific Manuscript database

    Analyses of concatenated gene sequences as well as whole genome sequences are resolving relationships among the ascomycete yeasts (Saccharomycotina), thus allowing classification of members of this subphylum to be based on phylogeny. In addition, changes implemented in the new Botanical Code [Intern...

  15. Phylogenetic turnover during subtropical forest succession across environmental and phylogenetic scales.

    PubMed

    Purschke, Oliver; Michalski, Stefan G; Bruelheide, Helge; Durka, Walter

    2017-12-01

    Although spatial and temporal patterns of phylogenetic community structure during succession are inherently interlinked and assembly processes vary with environmental and phylogenetic scales, successional studies of community assembly have yet to integrate spatial and temporal components of community structure, while accounting for scaling issues. To gain insight into the processes that generate biodiversity after disturbance, we combine analyses of spatial and temporal phylogenetic turnover across phylogenetic scales, accounting for covariation with environmental differences. We compared phylogenetic turnover, at the species- and individual-level, within and between five successional stages, representing woody plant communities in a subtropical forest chronosequence. We decomposed turnover at different phylogenetic depths and assessed its covariation with between-plot abiotic differences. Phylogenetic turnover between stages was low relative to species turnover and was not explained by abiotic differences. However, within the late-successional stages, there was high presence-/absence-based turnover (clustering) that occurred deep in the phylogeny and covaried with environmental differentiation. Our results support a deterministic model of community assembly where (i) phylogenetic composition is constrained through successional time, but (ii) toward late succession, species sorting into preferred habitats according to niche traits that are conserved deep in the phylogeny becomes increasingly important.
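
The abstract does not name the turnover metric used. As an illustration only, one widely used presence/absence measure of phylogenetic turnover is the unweighted UniFrac distance: the fraction of branch length, on the paths from the sampled species to the root, that leads to species found in only one of the two communities. The sketch below assumes a hypothetical `parent` map (child -> (parent, branch length)) for a rooted tree; it is not the authors' method.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical tree encoding: child -> (parent, length of the branch above the child).
Tree = Dict[str, Tuple[str, float]]


def covered_branches(species: Iterable[str], parent: Tree) -> Set[str]:
    """Branches (keyed by their child node) on the paths from each species to the root."""
    covered: Set[str] = set()
    for node in species:
        # Stop early once we reach a branch that is already covered (its ancestors are too).
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered


def unweighted_unifrac(a: Iterable[str], b: Iterable[str], parent: Tree) -> float:
    """Branch length unique to one community divided by branch length covered by either."""
    ca, cb = covered_branches(a, parent), covered_branches(b, parent)
    total = sum(parent[n][1] for n in ca | cb)
    unique = sum(parent[n][1] for n in ca ^ cb)
    return unique / total if total > 0 else 0.0
```

A value of 0 means the two communities span exactly the same branches; a value of 1 means they share no branch length at all.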

  16. Phylogenetics.

    PubMed

    Sleator, Roy D

    2011-04-01

    The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects.
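
As a concrete example of one classic distance-based tree-building method (chosen here for illustration; the review is not quoted on its details), UPGMA repeatedly joins the two closest clusters and places their common ancestor at half their distance. The input format, a dict keyed by `frozenset` pairs of taxon names, is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(names: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Build an ultrametric tree (Newick string with branch lengths) by UPGMA."""
    d = dict(dist)
    # cluster label -> (Newick subtree, number of leaves, height of the subtree root)
    clusters = {n: (n, 1, 0.0) for n in names}
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        label = f"({a},{b})"
        # Distance from the new cluster is the size-weighted average of the old distances.
        for c in clusters:
            d[frozenset((label, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        newick = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        clusters[label] = (newick, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"
```

UPGMA assumes a constant rate of evolution (a molecular clock); when rates vary among lineages, methods such as neighbor-joining or likelihood-based inference are generally preferred.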

  17. A molecular phylogenetic appraisal of the acanthostomines Acanthostomum and Timoniella and their position within Cryptogonimidae (Trematoda: Opisthorchioidea)

    PubMed Central

    Vidal-Martínez, Victor M.

    2017-01-01

    The phylogenetic position of three taxa from two trematode genera belonging to the subfamily Acanthostominae (Opisthorchioidea: Cryptogonimidae) was analysed using partial 28S ribosomal DNA (Domains 1–2) and internal transcribed spacers (ITS1–5.8S–ITS2). Bayesian inference and Maximum likelihood analyses of combined 28S rDNA and ITS1 + 5.8S + ITS2 sequences indicated the monophyly of the genus Acanthostomum (A. cf. americanum and A. burminis) and paraphyly of the Acanthostominae. These phylogenetic relationships were consistent in analyses of 28S alone and of concatenated 28S + ITS1 + 5.8S + ITS2 sequences. Based on molecular phylogenetic analyses, the subfamily Acanthostominae is therefore a paraphyletic taxon, in contrast with previous classifications based on morphological data. Phylogenetic patterns of host specificity inferred from adult stages of other cryptogonimid taxa are also well supported. However, analyses using additional genera and species are necessary to support the phylogenetic inferences from this study. Our molecular phylogenetic reconstruction linked two larval stages of A. cf. americanum (cercariae and metacercariae). Here, we present the evolutionary and ecological implications of parasitic infections in freshwater and brackish environments. PMID:29250471

  18. A molecular phylogenetic appraisal of the acanthostomines Acanthostomum and Timoniella and their position within Cryptogonimidae (Trematoda: Opisthorchioidea).

    PubMed

    Martínez-Aquino, Andrés; Vidal-Martínez, Victor M; Aguirre-Macedo, M Leopoldina

    2017-01-01

    The phylogenetic position of three taxa from two trematode genera belonging to the subfamily Acanthostominae (Opisthorchioidea: Cryptogonimidae) was analysed using partial 28S ribosomal DNA (Domains 1-2) and internal transcribed spacers (ITS1-5.8S-ITS2). Bayesian inference and Maximum likelihood analyses of combined 28S rDNA and ITS1 + 5.8S + ITS2 sequences indicated the monophyly of the genus Acanthostomum (A. cf. americanum and A. burminis) and paraphyly of the Acanthostominae. These phylogenetic relationships were consistent in analyses of 28S alone and of concatenated 28S + ITS1 + 5.8S + ITS2 sequences. Based on molecular phylogenetic analyses, the subfamily Acanthostominae is therefore a paraphyletic taxon, in contrast with previous classifications based on morphological data. Phylogenetic patterns of host specificity inferred from adult stages of other cryptogonimid taxa are also well supported. However, analyses using additional genera and species are necessary to support the phylogenetic inferences from this study. Our molecular phylogenetic reconstruction linked two larval stages of A. cf. americanum (cercariae and metacercariae). Here, we present the evolutionary and ecological implications of parasitic infections in freshwater and brackish environments.
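
Whether a group such as a genus or subfamily is monophyletic on a given tree can be checked mechanically: the group is monophyletic exactly when the leaves descending from its most recent common ancestor (MRCA) are the group's own members. The sketch below assumes a rooted tree encoded as a hypothetical child -> parent map; it illustrates the definition and is not part of the study's workflow.

```python
from typing import Dict, Iterable, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """The node itself followed by its ancestors, up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def is_monophyletic(taxa: Iterable[str], parent: Dict[str, str], leaves: Iterable[str]) -> bool:
    """True if the taxa are exactly the leaves descending from their MRCA."""
    taxa = set(taxa)
    paths = [path_to_root(t, parent) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any taxon, the first node shared by all paths is the MRCA.
    mrca = next(n for n in paths[0] if n in shared)
    clade = {leaf for leaf in leaves if mrca in path_to_root(leaf, parent)}
    return clade == taxa
```

A group that fails this test but whose MRCA clade contains only members plus some excluded descendant lineage is what the abstract calls paraphyletic.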

  19. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
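
The k-shrink objective itself is easy to state in code. The sketch below solves it by brute force over all k-subsets of leaves, which is exponential in k and only suitable for tiny examples; TreeShrink instead uses an exact polynomial-time algorithm plus statistical outlier tests, neither of which is reproduced here. Leaf-to-leaf path lengths are assumed to be precomputed in a hypothetical `frozenset`-keyed dict.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Dist = Dict[FrozenSet[str], float]  # hypothetical: precomputed leaf-to-leaf path lengths


def diameter(leaves: List[str], dist: Dist) -> float:
    """Largest pairwise distance among the given leaves (0 if fewer than two)."""
    return max((dist[frozenset(pair)] for pair in combinations(leaves, 2)), default=0.0)


def k_shrink_bruteforce(leaves: List[str], dist: Dist, k: int) -> Tuple[Tuple[str, ...], float]:
    """Try every set of k leaves; return the set whose removal minimises the diameter."""
    best: Optional[Tuple[Tuple[str, ...], float]] = None
    for removed in combinations(sorted(leaves), k):
        kept = [leaf for leaf in leaves if leaf not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (removed, d)
    if best is None:
        raise ValueError("k must not exceed the number of leaves")
    return best
```

A leaf whose removal shrinks the diameter far more than removing any other leaf is a candidate outlier long branch.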

  20. Selecting Species Traits for Biomonitoring Applications in light of Phylogenetic Relationships among Lotic Insects

    NASA Astrophysics Data System (ADS)

    Poff, N.; Vieira, N. K.; Simmons, M. P.; Olden, J. D.; Kondratieff, B. C.; Finn, D. S.

    2005-05-01

    The use of species traits as indicators of environmental disturbance is being considered for biomonitoring programs globally. As such, methods to select relevant and informative traits for inclusion in biometrics need to be developed. In this research, we identified 20 traits of aquatic insects within six trait groups: morphology, mobility, life-history strategy, thermal tolerance, feeding guild and ecology (e.g., habitat preference). We constructed phylogenetic trees for 1) all lotic insect species of North America and 2) all Ephemeroptera, Plecoptera and Trichoptera species based on morphology- and molecular-based analyses and classifications. We then measured variability (i.e., plasticity) of the 20 traits and six trait groups across the two phylogenetic trees. Traits with higher plasticity were less phylogenetically constrained and were considered informative for biomonitoring purposes. Thermal tolerance, rheophily, body size at maturity and feeding guild showed the highest plasticity across both phylogenetic trees. Two mobility traits, occurrence in drift and adult dispersal distance, showed moderate plasticity. By contrast, adult exiting ability, degree of attachment, adult lifespan and body shape showed low variability and were thus less informative. Plastic species traits that are less phylogenetically constrained may be most useful in detecting community change along environmental gradients.
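
The abstract does not say how trait variability across the trees was quantified. One standard way to measure how labile a discrete trait is on a tree is the minimum number of state changes under Fitch parsimony: more changes suggest a trait that is less phylogenetically constrained. The sketch below is that stand-in measure, assuming fully bifurcating trees written as nested tuples of species names; it is not the study's procedure.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a species name, an internal node a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes of a discrete trait on a binary tree (Fitch)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):  # leaf: its observed state
            return {states[node]}, 0
        (left, cost_l), (right, cost_r) = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common, cost_l + cost_r
        # No shared state: take the union and count one change.
        return left | right, cost_l + cost_r + 1

    return visit(tree)[1]
```

On the tree `(("A", "B"), ("C", "D"))`, a trait split cleanly between the two clades needs one change, while a trait that alternates within both clades needs two.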

  1. Controlled recovery of phylogenetic communities from an evolutionary model using a network approach

    NASA Astrophysics Data System (ADS)

    Sousa, Arthur M. Y. R.; Vieira, André P.; Prado, Carmen P. C.; Andrade, Roberto F. S.

    2016-04-01

    This work reports the use of a complex network approach to produce a phylogenetic classification tree of a simple evolutionary model. This approach has already been used to treat proteomic data of actual extant organisms, but an investigation of its reliability in retrieving a traceable evolutionary history is missing. The evolutionary model used includes key ingredients for the emergence of groups of related organisms by differentiation through random mutations and population growth, but purposefully omits other realistic ingredients that are not strictly necessary to originate an evolutionary history. This choice makes the model depend on only a small set of parameters, controlling the mutation probability and the population of different species. Our results indicate that, for a set of parameter values, the phylogenetic classification produced by this framework reproduces the actual evolutionary history with a very high average degree of accuracy. This includes parameter values where the species originating from the evolutionary dynamics have modular structures. In the more general context of community identification in complex networks, our model offers a simple setting for evaluating how the underlying dynamics that generate the network affect the efficiency of community formation and identification.

  2. Taxonomy-aware feature engineering for microbiome classification.

    PubMed

    Oudah, Mai; Henschel, Andreas

    2018-06-15

    What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. Such diseases are often not simply attributable to a single pathogen but rather result from complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering. We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique, while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
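
The building block of taxonomy-aware features is collapsing OTU abundances onto a higher rank of the taxonomy. The authors' algorithm goes further and decides which level of abstraction to use; the sketch below shows only the aggregation step, with a hypothetical three-rank lineage format and hypothetical OTU names.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rank order; real pipelines typically use the full set of Linnaean ranks.
RANKS = ["phylum", "family", "genus"]


def aggregate_to_rank(
    abundances: Dict[str, float], lineages: Dict[str, List[str]], rank: str
) -> Dict[str, float]:
    """Sum OTU relative abundances into taxa at the requested rank."""
    depth = RANKS.index(rank) + 1
    totals: Dict[str, float] = defaultdict(float)
    for otu, value in abundances.items():
        # Key by the lineage prefix so equal names under different parents stay separate.
        taxon = ";".join(lineages[otu][:depth])
        totals[taxon] += value
    return dict(totals)
```

Collapsing to a higher rank shrinks the feature space from one column per OTU to one column per taxon at that rank.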

  3. Bayesian phylogenetic estimation of fossil ages.

    PubMed

    Drummond, Alexei J; Stadler, Tanja

    2016-07-19

    Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth-death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the 'morphological clock', and for uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. This article is part of the themed issue 'Dating species divergences using

  4. Bayesian phylogenetic estimation of fossil ages

    PubMed Central

    Drummond, Alexei J.; Stadler, Tanja

    2016-01-01

    Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth–death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the ‘morphological clock’, and for uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. This article is part of the themed issue ‘Dating species divergences

  5. Phylogenetic relationships and species circumscription in Trentepohlia and Printzina (Trentepohliales, Chlorophyta).

    PubMed

    Rindi, Fabio; Lam, Daryl W; López-Bautista, Juan M

    2009-08-01

    Subaerial green microalgae represent a polyphyletic complex of organisms, whose genetic diversity is much higher than their simple morphologies suggest. The order Trentepohliales is the only species-rich group of subaerial algae belonging to the class Ulvophyceae and represents an ideal model taxon to investigate evolutionary patterns of these organisms. We studied phylogenetic relationships in two common genera of Trentepohliales (Trentepohlia and Printzina) by separate and combined analyses of the rbcL and 18S rRNA genes. Trentepohlia and Printzina were not resolved as monophyletic groups. Three main clades were recovered in all analyses, but none corresponded to any trentepohlialean genus as defined based on morphological grounds. The rbcL and 18S rRNA datasets provided congruent phylogenetic signals and similar topologies were recovered in single-gene analyses. Analyses performed on the combined 2-gene dataset yielded generally higher nodal support. The results clarified several taxonomic problems and showed that the evolution of these algae has been characterized by considerable morphological convergence. Trentepohlia abietina and T. flava were shown to be separate species from T. aurea; Printzina lagenifera, T. arborum and T. umbrina were resolved as polyphyletic taxa, whose vegetative morphology appears to have evolved independently in separate lineages. Incongruence between phylogenetic relationships and traditional morphological classification was demonstrated, showing that the morphological characters commonly used in the taxonomy of the Trentepohliales are phylogenetically irrelevant.

  6. New phylogenetic insights toward developing a natural generic classification of African angraecoid orchids (Vandeae, Orchidaceae).

    PubMed

    Simo-Droissart, Murielle; Plunkett, Gregory M; Droissart, Vincent; Edwards, Molly B; Farminhão, João N M; Ječmenica, Vladimir; D'haijère, Tania; Lowry, Porter P; Sonké, Bonaventure; Micheneau, Claire; Carlsward, Barbara S; Azandi, Laura; Verlynde, Simon; Hardy, Olivier J; Martos, Florent; Bytebier, Benny; Fischer, Eberhard; Stévart, Tariq

    2018-09-01

    Despite significant progress made in recent years toward developing an infrafamilial classification of Orchidaceae, our understanding of relationships among and within tribal and subtribal groups of epidendroid orchids remains incomplete. To reassess generic delimitation among one group of these epidendroids, the African angraecoids, phylogenetic relationships were inferred from DNA sequence data from three regions, ITS, matK, and the trnL-trnF intergenic spacer, obtained from a broadly representative sample of taxa. Parsimony and Bayesian analyses yielded highly resolved trees that are in clear agreement and show significant support for many key clades within subtribe Angraecinae s.l. Angraecoid orchids comprise two well-supported clades: an African/American group and an Indian Ocean group. Molecular results also support many previously proposed relationships among genera, but also reveal some unexpected relationships. The genera Aerangis, Ancistrorhynchus, Bolusiella, Campylocentrum, Cyrtorchis, Dendrophylax, Eurychone, Microcoelia, Nephrangis, Podangis and Solenangis are all shown to be monophyletic, but Angraecopsis, Diaphananthe and Margelliantha are polyphyletic. Diaphananthe forms three well-supported clades, one of which might represent a new genus, and Rhipidoglossum is paraphyletic with respect to Cribbia and Rhaesteria, and also includes taxa currently assigned to Margelliantha. Tridactyle too is paraphyletic as Eggelingia is embedded within it. The large genus Angraecum is confirmed to be polyphyletic and several groups will have to be recognized as separate genera, including sections Dolabrifolia and Hadrangis. The recently segregated genus Pectinariella (previously recognized as A. sect. Pectinaria) is polyphyletic and its Continental African species will have to be removed. Similarly, some of the species recently transferred to Angraecoides that were previously placed in Angraecum sects. Afrangraecum and Conchoglossum will have to be moved and

  7. Rapid identification and classification of Mycobacterium spp. using whole-cell protein barcodes with matrix assisted laser desorption ionization time of flight mass spectrometry in comparison with multigene phylogenetic analysis.

    PubMed

    Wang, Jun; Chen, Wen Feng; Li, Qing X

    2012-02-24

    The need for rapid diagnostics and the increasing number of isolated bacterial species necessitate the development of a rapid and effective phenotypic identification method. Mass spectrometry (MS) profiling of whole-cell proteins has the potential to meet these requirements. The genus Mycobacterium contains more than 154 species that are taxonomically very close and require the use of multiple genes, including 16S rDNA, for phylogenetic identification and classification. Six strains of five Mycobacterium species were selected as model bacteria in the present study because of their 16S rDNA similarity (98.4-99.8%) and the high similarity of their concatenated 16S rDNA, rpoB and hsp65 gene sequences (95.9-99.9%), which require high identification resolution. The classification of the six strains by MALDI TOF MS protein barcodes was consistent with, but at much higher resolution than, that of the multi-locus sequence analysis using 16S rDNA, rpoB and hsp65. The species were well differentiated using MALDI TOF MS and MALDI BioTyper™ software after quick preparation of whole-cell proteins. Several proteins were selected as diagnostic markers for species confirmation. An integration of MALDI TOF MS, MALDI BioTyper™ software and diagnostic protein fragments provides a robust phenotypic approach for bacterial identification and classification. Copyright © 2011 Elsevier B.V. All rights reserved.

  8. HIV classification using the coalescent theory

    PubMed Central

    Bulla, Ingo; Schultz, Anne-Kathrin; Schreiber, Fabian; Zhang, Ming; Leitner, Thomas; Korber, Bette; Morgenstern, Burkhard; Stanke, Mario

    2010-01-01

    Motivation: Existing coalescent models and phylogenetic tools based on them are not designed for studying the genealogy of sequences like those of HIV, since in HIV recombinants with multiple cross-over points between the parental strains frequently arise. Hence, ambiguous cases in the classification of HIV sequences into subtypes and circulating recombinant forms (CRFs) have been treated with ad hoc methods, in the absence of tools based on a comprehensive coalescent model accounting for complex recombination patterns. Results: We developed the program ARGUS that scores classifications of sequences into subtypes and recombinant forms. It reconstructs ancestral recombination graphs (ARGs) that reflect the genealogy of the input sequences given a classification hypothesis. An ARG with maximal probability is approximated using a Markov chain Monte Carlo approach. ARGUS was able to distinguish the correct classification with a low error rate from plausible alternative classifications in simulation studies with realistic parameters. We applied our algorithm to decide between two recently debated alternatives in the classification of CRF02 of HIV-1 and find that CRF02 is indeed a recombinant of Subtypes A and G. Availability: ARGUS is implemented in C++ and the source code is available at http://gobics.de/software Contact: ibulla@uni-goettingen.de Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:20400454

  9. Phylogenetic inference under varying proportions of indel-induced alignment gaps

    PubMed Central

    Dwivedi, Bhakti; Gadagkar, Sudhindra R

    2009-01-01

    Background: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results: (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML and "MLε", a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps. Conclusion: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the
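
The contrast between treating gaps as missing data and coding them as binary presence/absence characters can be made concrete. The sketch below uses the simplest possible scheme, one binary character per gapped column (1 = residue present, 0 = gap); published indel-coding schemes usually code each inferred indel event once rather than each column, and this is not the exact procedure of the study.

```python
from typing import Dict, List, Tuple


def gap_presence_absence(alignment: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Recode every gapped alignment column as a binary character: 1 = residue, 0 = gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned (equal length)")
    # Columns where at least one sequence has a gap.
    gapped = [i for i in range(length) if any(alignment[n][i] == "-" for n in names)]
    matrix = {
        n: "".join("0" if alignment[n][i] == "-" else "1" for i in gapped) for n in names
    }
    return gapped, matrix
```

The binary matrix can then be analysed alongside the nucleotide data, so that the gap pattern contributes phylogenetic signal instead of being discarded.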

  10. A multigene phylogenetic synthesis for the class Lecanoromycetes (Ascomycota): 1307 fungi representing 1139 infrageneric taxa, 317 genera and 66 families

    PubMed Central

    Miadlikowska, Jolanta; Kauff, Frank; Högnabba, Filip; Oliver, Jeffrey C.; Molnár, Katalin; Fraker, Emily; Gaya, Ester; Hafellner, Josef; Hofstetter, Valérie; Gueidan, Cécile; Otálora, Mónica A.G.; Hodkinson, Brendan; Kukwa, Martin; Lücking, Robert; Björk, Curtis; Sipman, Harrie J.M.; Burgaz, Ana Rosa; Thell, Arne; Passo, Alfredo; Myllys, Leena; Goward, Trevor; Fernández-Brime, Samantha; Hestmark, Geir; Lendemer, James; Lumbsch, H. Thorsten; Schmull, Michaela; Schoch, Conrad; Sérusiaux, Emmanuël; Maddison, David R.; Arnold, A. Elizabeth; Lutzoni, François; Stenroos, Soili

    2014-01-01

    The Lecanoromycetes is the largest class of lichenized Fungi, and one of the most species-rich classes in the kingdom. Here we provide a multigene phylogenetic synthesis (using three ribosomal RNA-coding and two protein-coding genes) of the Lecanoromycetes based on 642 newly generated and 3329 publicly available sequences representing 1139 taxa, 317 genera, 66 families, 17 orders and five subclasses (four currently recognized: Acarosporomycetidae, Lecanoromycetidae, Ostropomycetidae, Umbilicariomycetidae; and one provisionally recognized, ‘Candelariomycetidae’). Maximum likelihood phylogenetic analyses on four multigene datasets assembled using a cumulative supermatrix approach with a progressively higher number of species and missing data (5-gene, 5+4-gene, 5+4+3-gene and 5+4+3+2-gene datasets) show that the current classification includes non-monophyletic taxa at various ranks, which need to be recircumscribed and require revisionary treatments based on denser taxon sampling and more loci. Two newly circumscribed orders (Arctomiales and Hymeneliales in the Ostropomycetidae) and three families (Ramboldiaceae and Psilolechiaceae in the Lecanorales, and Strangosporaceae in the Lecanoromycetes inc. sed.) are introduced. The potential resurrection of the families Eigleraceae and Lopadiaceae is considered here to alleviate phylogenetic and classification disparities. An overview of the photobionts associated with the main fungal lineages in the Lecanoromycetes based on available published records is provided. A revised schematic classification at the family level in the phylogenetic context of widely accepted and newly revealed relationships across Lecanoromycetes is included. The cumulative addition of taxa with an increasing amount of missing data (i.e., a cumulative supermatrix approach, starting with taxa for which sequences were available for all five targeted genes and ending with the addition of taxa for which only two genes have been sequenced) revealed
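
The cumulative supermatrix design (5-gene, 5+4-gene, 5+4+3-gene and 5+4+3+2-gene datasets) amounts to lowering a minimum-genes threshold step by step and recording how much missing data each step adds. The sketch below assumes a hypothetical mapping from taxon to the set of genes sequenced for it; gene and taxon names are placeholders.

```python
from typing import Dict, List, Set, Tuple


def cumulative_supermatrices(
    genes_per_taxon: Dict[str, Set[str]], n_genes: int = 5, min_genes: int = 2
) -> List[Tuple[int, List[str], int]]:
    """For each threshold, the taxa kept and the number of missing gene partitions."""
    datasets = []
    for threshold in range(n_genes, min_genes - 1, -1):
        # Keep every taxon sequenced for at least `threshold` of the target genes.
        taxa = sorted(t for t, g in genes_per_taxon.items() if len(g) >= threshold)
        missing = len(taxa) * n_genes - sum(len(genes_per_taxon[t]) for t in taxa)
        datasets.append((threshold, taxa, missing))
    return datasets
```

Each successive dataset contains all taxa of the previous one, so taxon sampling grows while the proportion of missing partitions rises, the trade-off the abstract describes.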

  11. Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

    PubMed Central

    Porter, Teresita M.; Golding, G. Brian

    2012-01-01

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, increasingly, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
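
To make "composition-based" concrete, the sketch below is a simplified naive Bayesian k-mer classifier modelled on the word-probability scheme published for the RDP classifier: a word prior P_w = (n(w) + 0.5) / (N + 1) over all N references, and per-genus word probabilities (m(w) + P_w) / (M + 1) for a genus with M references. Treat these details as an assumption; the real classifier also reports bootstrap confidence, which is omitted here, and the reference sequences in the usage example are toy placeholders.

```python
import math
from collections import Counter
from typing import Dict, List, NamedTuple, Set, Tuple


class Model(NamedTuple):
    k: int
    n_refs: int                      # N: total number of reference sequences
    word_prior: Dict[str, float]     # P_w = (n(w) + 0.5) / (N + 1)
    genus_size: Counter              # M: reference sequences per genus
    genus_words: Dict[str, Counter]  # m(w): sequences of the genus containing word w


def words(seq: str, k: int) -> Set[str]:
    """Distinct overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(refs: List[Tuple[str, str]], k: int = 8) -> Model:
    per_seq = [(genus, words(seq, k)) for genus, seq in refs]
    n = len(refs)
    n_w = Counter(w for _, ws in per_seq for w in ws)
    prior = {w: (c + 0.5) / (n + 1) for w, c in n_w.items()}
    size = Counter(genus for genus, _ in per_seq)
    genus_words: Dict[str, Counter] = {g: Counter() for g in size}
    for genus, ws in per_seq:
        genus_words[genus].update(ws)
    return Model(k, n, prior, size, genus_words)


def classify(model: Model, query: str) -> str:
    """Genus with the highest summed log word probability for the query."""
    unseen = 0.5 / (model.n_refs + 1)  # prior for words absent from every reference

    def log_score(genus: str) -> float:
        m, big_m = model.genus_words[genus], model.genus_size[genus]
        return sum(
            math.log((m[w] + model.word_prior.get(w, unseen)) / (big_m + 1))
            for w in words(query, model.k)
        )

    return max(model.genus_size, key=log_score)
```

For example, with `train([("GenusA", "AAAAAAAAAACCCC"), ("GenusB", "GGGGGGGGGGTTTT")], k=4)`, the query `"AAAAAACC"` shares all its 4-mers with GenusA and is assigned to it.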

  12. Contextual classification of multispectral image data: Approximate algorithm

    NASA Technical Reports Server (NTRS)

    Tilton, J. C. (Principal Investigator)

    1980-01-01

    An approximation to a classification algorithm that incorporates spatial context information in a general, statistical manner is presented. The approximation is computationally less intensive than the full algorithm and produces classifications that are nearly as accurate.

  13. Genome-Wide Comparative Gene Family Classification

    PubMed Central

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
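    TRIBE-MCL builds on Markov clustering (MCL), and the parameter sensitivity mentioned above mostly concerns the inflation value. The sketch below is a minimal dense-matrix MCL that assumes a symmetric similarity matrix as input. It is not the TRIBE-MCL code, which derives edge weights from all-against-all sequence similarity scores, and the function name and default values are hypothetical.

```python
import numpy as np


def markov_cluster(adjacency, inflation=2.0, expansion=2, max_iter=100, tol=1e-6):
    """Minimal Markov clustering on a symmetric, non-negative similarity matrix.

    Returns clusters as sorted tuples of node indices.
    """
    m = np.asarray(adjacency, dtype=float)
    m = m + np.eye(m.shape[0])              # add self-loops (common MCL convention)
    m = m / m.sum(axis=0, keepdims=True)    # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread random-walk flow
        m = m ** inflation                        # inflation: boost strong flows (elementwise)
        m = m / m.sum(axis=0, keepdims=True)
        if np.allclose(m, prev, atol=tol):
            break
    # Rows of attractor nodes keep non-zero mass; each such row lists one cluster.
    clusters = set()
    for row in m:
        members = tuple(np.nonzero(row > tol)[0].tolist())
        if members:
            clusters.add(members)
    return sorted(clusters)
```

Raising `inflation` generally yields more, smaller clusters and lowering it merges them, which is why a single setting rarely resolves every gene family correctly.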

  14. Comprehensive evolutionary and phylogenetic analysis of Hepacivirus N (HNV).

    PubMed

    da Silva, M S; Junqueira, D M; Baumbach, L F; Cibulski, S P; Mósena, A C S; Weber, M N; Silveira, S; de Moraes, G M; Maia, R D; Coimbra, V C S; Canal, C W

    2018-05-24

    Hepaciviruses (HVs) have been detected in several domestic and wild animals and show high genetic diversity. The current classification divides the genus Hepacivirus into 14 species (A-N) according to their phylogenetic relationships, including the bovine hepacivirus [Hepacivirus N (HNV)]. In this study, we confirmed HNV circulation in Brazil and sequenced the whole genome of two strains. Following the classification of HCV, which is divided into genotypes and subtypes, we analysed all available bovine hepacivirus sequences in the GenBank database and proposed an HNV classification. All of the sequences were grouped into a single genotype, putatively named 'genotype 1'. This genotype can be clearly divided into four subtypes: A and D, containing sequences from Germany and Brazil, respectively, and B and C, containing Ghanaian sequences. In addition, the NS3-coding region was used to estimate the time to the most recent common ancestor (TMRCA) of each subtype, using a Bayesian approach and a relaxed molecular clock model. The analyses indicated a common origin of the viruses circulating in Germany and Brazil. Ghanaian sequences seemed to have an older TMRCA, indicating that these viruses have circulated on the African continent for a long time.

  15. Classification of Instructional Programs: 2000 Edition.

    ERIC Educational Resources Information Center

    Morgan, Robert L.; Hunt, E. Stephen

    This third revision of the Classification of Instructional Programs (CIP) updates and modifies education program classifications, providing a taxonomic scheme that supports the accurate tracking, assessment, and reporting of field of study and program completions activity. This edition has also been adopted as the standard field of study taxonomy…

  16. Genetic variability of HEV isolates: inconsistencies of current classification.

    PubMed

    Oliveira-Filho, Edmilson F; König, Matthias; Thiel, Heinz-Jürgen

    2013-07-26

    Many HEV and HEV-like sequences have been reported in recent years, including isolates that may represent a number of potential new genera, genotypes or subtypes within the family Hepeviridae. Using the most common classification system, difficulties in establishing subtypes have been reported. Moreover, the relevance of subtype classification for epidemiology can be questioned. In this study we performed phylogenetic analyses based on the whole capsid gene and complete HEV genomic sequences in order to evaluate the current classification of HEV at the genotype and subtype levels. The results of our analyses modify the current taxonomy of genotype 3 and refine the established system for typing HEV. In addition, we suggest a classification for hepeviruses recently isolated from bats, ferrets, rats and wild boar. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Trends and concepts in fern classification.

    PubMed

    Christenhusz, Maarten J M; Chase, Mark W

    2014-03-01

    Throughout the history of fern classification, familial and generic concepts have been highly labile. Many classifications and evolutionary schemes have been proposed during the last two centuries, reflecting different interpretations of the available evidence. Knowledge of fern structure and life histories has increased through time, providing more evidence on which to base ideas of possible relationships, and classification has changed accordingly. This paper reviews previous classifications of ferns and presents ideas on how to achieve a more stable consensus. An historical overview is provided from the first to the most recent fern classifications, from which conclusions are drawn on past changes and future trends. The problematic concept of family in ferns is discussed, with a particular focus on how this has changed over time. The history of molecular studies and the most recent findings are also presented. Fern classification generally shows a trend from highly artificial, based on an interpretation of a few extrinsic characters, via natural classifications derived from a multitude of intrinsic characters, towards more evolutionary circumscriptions of groups that do not in general align well with the distribution of these previously used characters. It also shows a progression from a few broad family concepts to systems that recognized many more narrowly and highly controversially circumscribed families; currently, the number of families recognized is stabilizing somewhere between these extremes. Placement of many genera was uncertain until the arrival of molecular phylogenetics, which has rapidly been improving our understanding of fern relationships. As a collective category, the so-called 'fern allies' (e.g. Lycopodiales, Psilotaceae, Equisetaceae) were unsurprisingly found to be polyphyletic, and the term should be abandoned. Lycopodiaceae, Selaginellaceae and Isoëtaceae form a clade (the lycopods) that is sister to all other vascular plants, whereas

  18. Trends and concepts in fern classification

    PubMed Central

    Christenhusz, Maarten J. M.; Chase, Mark W.

    2014-01-01

    Background and Aims Throughout the history of fern classification, familial and generic concepts have been highly labile. Many classifications and evolutionary schemes have been proposed during the last two centuries, reflecting different interpretations of the available evidence. Knowledge of fern structure and life histories has increased through time, providing more evidence on which to base ideas of possible relationships, and classification has changed accordingly. This paper reviews previous classifications of ferns and presents ideas on how to achieve a more stable consensus. Scope An historical overview is provided from the first to the most recent fern classifications, from which conclusions are drawn on past changes and future trends. The problematic concept of family in ferns is discussed, with a particular focus on how this has changed over time. The history of molecular studies and the most recent findings are also presented. Key Results Fern classification generally shows a trend from highly artificial, based on an interpretation of a few extrinsic characters, via natural classifications derived from a multitude of intrinsic characters, towards more evolutionary circumscriptions of groups that do not in general align well with the distribution of these previously used characters. It also shows a progression from a few broad family concepts to systems that recognized many more narrowly and highly controversially circumscribed families; currently, the number of families recognized is stabilizing somewhere between these extremes. Placement of many genera was uncertain until the arrival of molecular phylogenetics, which has rapidly been improving our understanding of fern relationships. As a collective category, the so-called ‘fern allies’ (e.g. Lycopodiales, Psilotaceae, Equisetaceae) were unsurprisingly found to be polyphyletic, and the term should be abandoned. Lycopodiaceae, Selaginellaceae and Isoëtaceae form a clade (the lycopods) that is

  19. BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

    PubMed

    Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

    2013-09-15

    Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.

  20. High-resolution phylogenetic microbial community profiling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singer, Esther; Coleman-Derr, Devin; Bowman, Brett

    2014-03-17

    The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full-length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large diversity of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of the GC content of microbial genomes and were higher than those of Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences; for example, full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.

  1. Multi-gene phylogenetic analysis reveals that shochu-fermenting Saccharomyces cerevisiae strains form a distinct sub-clade of the Japanese sake cluster.

    PubMed

    Futagami, Taiki; Kadooka, Chihiro; Ando, Yoshinori; Okutsu, Kayu; Yoshizaki, Yumiko; Setoguchi, Shinji; Takamine, Kazunori; Kawai, Mikihiko; Tamaki, Hisanori

    2017-10-01

    Shochu is a traditional Japanese distilled spirit. The distinguishing flavour of the shochu produced in individual distilleries is attributed to putative indigenous yeast strains. In this study, we performed the first (to our knowledge) phylogenetic classification of shochu strains based on nucleotide gene sequences, covering 21 putative indigenous shochu yeast strains isolated from 11 distilleries. All of these strains were shown or confirmed to be Saccharomyces cerevisiae, sharing species identification with 34 known S. cerevisiae strains (including commonly used shochu, sake, ale, whisky, bakery, bioethanol and laboratory yeast strains and a clinical isolate) that were tested in parallel. Our analysis used five genes that reflect genome-level phylogeny for the strain-level classification. In a first step, we demonstrated that partial regions of the ZAP1, THI7, PXL1, YRR1 and GLG1 genes were sufficient to reproduce previous sub-species classifications. In a second step, these five analysed regions from each of 25 strains (four commonly used shochu strains and the 21 putative indigenous shochu strains) were concatenated and used to generate a phylogenetic tree. Further analysis revealed that the putative indigenous shochu yeast strains form a monophyletic group that includes both the shochu yeasts and a subset of the sake group strains; this cluster is a sister group to other sake yeast strains, together comprising a sake-shochu group. Differences among shochu strains were small, suggesting that it may be possible to correlate subtle phenotypic differences among shochu flavours with specific differences in genome sequences. Copyright © 2017 John Wiley & Sons, Ltd.

  2. Phylogenetics of the phlebotomine sand fly group Verrucarum (Diptera: Psychodidae: Lutzomyia).

    PubMed

    Cohnstaedt, Lee W; Beati, Lorenza; Caceres, Abraham G; Ferro, Cristina; Munstermann, Leonard E

    2011-06-01

    Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity--the Lutzomyia longiflocosa-Lutzomyia sauroida pair and the Lutzomyia quasitownsendi-Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution.

  3. Phylogenetics of the Phlebotomine Sand Fly Group Verrucarum (Diptera: Psychodidae: Lutzomyia)

    PubMed Central

    Cohnstaedt, Lee W.; Beati, Lorenza; Caceres, Abraham G.; Ferro, Cristina; Munstermann, Leonard E.

    2011-01-01

    Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity—the Lutzomyia longiflocosa–Lutzomyia sauroida pair and the Lutzomyia quasitownsendi–Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution. PMID:21633028

  4. Use of phylogenetical analysis to predict susceptibility of pathogenic Candida spp. to antifungal drugs.

    PubMed

    Maheux, Andrée F; Sellam, Adnane; Piché, Yves; Boissinot, Maurice; Pelletier, René; Boudreau, Dominique K; Picard, François J; Trépanier, Hélène; Boily, Marie-Josée; Ouellette, Marc; Roy, Paul H; Bergeron, Michel G

    2016-12-01

    Successful treatment of a Candida infection relies on 1) accurate identification of the pathogenic fungus and 2) knowledge of its susceptibility to antifungal drugs. In the present study we investigated the level of correlation between phylogenetic relationships and the susceptibility of pathogenic Candida spp. to antifungal drugs. To this end, we compared a phylogenetic tree, assembled with the concatenated sequences (2475 bp) of the ATP2, TEF1, and TUF1 genes from 20 representative Candida species, with published minimal inhibitory concentrations (MIC) of the four principal antifungal drug classes commonly used in the treatment of candidiasis: polyenes, triazoles, nucleoside analogues, and echinocandins. The phylogenetic tree revealed three distinct phylogenetic clusters among Candida species. Species within a given phylogenetic cluster generally have similar susceptibility profiles to antifungal drugs, and species within Clusters II and III were less sensitive to antifungal drugs than Cluster I species. These results showed that the phylogenetic relationships between clusters and susceptibility to several antifungal drugs could be used to guide therapy when only species identification is available, before the resistance profile is known. An extended study comprising a large panel of clinical samples should be conducted to confirm the efficiency of this approach in the treatment of candidiasis. Copyright © 2016. Published by Elsevier B.V.

  5. A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing.

    PubMed

    Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H

    2018-04-12

    Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants
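    The gain of sequence-based over length-based alleles comes from isoalleles: repeat regions of identical length but different internal structure, such as the repeat pattern variants mentioned above. The sketch below contrasts a length-based call with a bracketed sequence description. It assumes the repeat region has already been trimmed from its flanks; the motifs and helper names are hypothetical, and real Y-STR nomenclature has additional rules not modelled here.

```python
from typing import List


def ce_allele(repeat_region: str, unit: int) -> int:
    """Length-based call, roughly what CE resolves: whole repeat units only."""
    return len(repeat_region) // unit


def sequence_allele(repeat_region: str, unit: int) -> str:
    """Bracketed repeat structure, e.g. '[TCTA]3 [TCTG]1 [TCTA]2'."""
    units = [repeat_region[i:i + unit] for i in range(0, len(repeat_region), unit)]
    blocks: List[list] = []  # [motif, consecutive count] pairs
    for u in units:
        if blocks and blocks[-1][0] == u:
            blocks[-1][1] += 1
        else:
            blocks.append([u, 1])
    return " ".join(f"[{u}]{n}" for u, n in blocks)
```

Two regions with the same `ce_allele` value but different `sequence_allele` strings are indistinguishable by length alone, which is the kind of extra diversity that MPS exposes.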

  6. Revisiting the taxonomical classification of Porcine Circovirus type 2 (PCV2): still a real challenge.

    PubMed

    Franzo, Giovanni; Cortey, Martí; Olvera, Alex; Novosel, Dinko; Castro, Alessandra Marnie Martins Gomes De; Biagini, Philippe; Segalés, Joaquim; Drigo, Michele

    2015-08-28

    PCV2 has emerged as one of the most devastating viral infections in swine farming, causing a considerable economic impact through direct losses and the costs of control strategies. Epidemiological and experimental studies have suggested that genetic diversity may affect the virulence of PCV2. The growing number of PCV2 complete genomes and partial sequences available in GenBank has called the accepted PCV2 classification into question. Nine hundred seventy-five PCV2 complete genomes and 1,270 ORF2 sequences available from GenBank were subjected to recombination, PASC and phylogenetic analyses, and the results were compared with the previous classification scheme. The outcome of these analyses favors the recognition of four genotypes on the basis of ORF2 sequences, namely PCV2a, PCV2b, PCV2c and PCV2d-mPCV2b. Given the difficulty of establishing an unambiguous classification and the impossibility of defining a p-distance cut-off, a set of reference sequences that could be used in further phylogenetic studies for PCV2 genotyping was established. Because extensive phylogenetic analyses are time-consuming and often impracticable during routine diagnostic activity, ORF2 nucleotide positions adequately conserved in the reference sequences were identified and reported to allow quick genotype differentiation. Overall, the present work provides an updated picture of PCV2 genotype distribution and, given the limits of the previous classification criteria, proposes new rapid and effective schemes for differentiating the four defined PCV2 genotypes.
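    For reference, the p-distance mentioned above is the proportion of aligned sites at which two sequences differ. The sketch below computes it with pairwise gap deletion and assigns a query to its nearest reference. The nearest-reference rule is a simplification for illustration only (the study relied on phylogenetic analysis and conserved ORF2 positions), and the reference names in the test are hypothetical.

```python
from typing import Dict


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def nearest_reference(query: str, references: Dict[str, str]) -> str:
    """Name of the reference sequence with the smallest p-distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))
```

A fixed p-distance threshold would turn this into a cut-off rule; the abstract reports that no such cut-off could be defined for PCV2, which is why a reference set was used instead.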

  7. Highly Accurate Classification of Watson-Crick Basepairs on Termini of Single DNA Molecules

    PubMed Central

    Winters-Hilt, Stephen; Vercoutere, Wenonah; DeGuzman, Veronica S.; Deamer, David; Akeson, Mark; Haussler, David

    2003-01-01

    We introduce a computational method for classification of individual DNA molecules measured by an α-hemolysin channel detector. We show classification with better than 99% accuracy for DNA hairpin molecules that differ only in their terminal Watson-Crick basepairs. Signal classification was done in silico to establish performance metrics (i.e., where train and test data were of known type, via single-species data files). It was then performed in solution to assay real mixtures of DNA hairpins. Hidden Markov Models (HMMs) were used with Expectation/Maximization for denoising and for associating a feature vector with the ionic current blockade of the DNA molecule. Support Vector Machines (SVMs) were used as discriminators, and were the focus of off-line training. A multiclass SVM architecture was designed to place less discriminatory load on weaker discriminators, and novel SVM kernels were used to boost discrimination strength. Tuning the HMMs and SVMs enabled biophysical analysis of the captured molecule states and state transitions; structure revealed in the biophysical analysis was used for better feature selection. PMID:12547778

  8. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

    PubMed

    Rajan, Vaibhav

    2013-03-01

    Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking in which each cluster contains similar columns, similarity being defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.
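    To make the idea of masking concrete, the sketch below drops alignment columns that are mostly gaps or highly variable. This is a deliberately simpler and different technique from the subsplit-based clustering described above: the thresholds are arbitrary assumptions, the function names are hypothetical, and a filter like this can discard informative sites as well as noisy ones.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in a column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def mask_alignment(rows: List[str], max_gap_frac: float = 0.5,
                   max_entropy: float = 1.5, gap: str = "-") -> List[str]:
    """Keep only columns that are not gap-dominated and not highly variable."""
    keep = []
    for j in range(len(rows[0])):
        column = "".join(r[j] for r in rows)
        gap_frac = column.count(gap) / len(column)
        if gap_frac <= max_gap_frac and column_entropy(column, gap) <= max_entropy:
            keep.append(j)
    return ["".join(r[j] for j in keep) for r in rows]
```

Because each column is judged in isolation, this filter cannot tell whether a variable column agrees with the others about tree structure, which is the kind of information the subsplit approach uses.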

  9. Analysis of genetic diversity in banana cultivars (Musa cvs.) from the South of Oman using AFLP markers and classification by phylogenetic, hierarchical clustering and principal component analyses

    PubMed Central

    Opara, Umezuruike Linus; Jacobson, Dan; Al-Saady, Nadiya Abubakar

    2010-01-01

    Banana is an important crop grown in Oman and there is a dearth of information on its genetic diversity to assist in crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results obtained show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented. PMID:20443211
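    As a worked example of the band counts, 1012 polymorphic bands out of 1094 scored is about 92.5%. The sketch below shows one way band presence/absence profiles can be turned into a dendrogram with Jaccard distances and UPGMA (average linkage). The choice of the Jaccard coefficient over other similarity coefficients is an assumption, and the function names and cultivar labels in the test are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard_distance(a: List[int], b: List[int]) -> float:
    """1 - (bands present in both / bands present in either profile)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def upgma(profiles: Dict[str, List[int]]) -> List[Tuple[tuple, tuple, float]]:
    """Average-linkage (UPGMA) merge order as (cluster_a, cluster_b, distance)."""
    names = sorted(profiles)
    d = {(x, y): jaccard_distance(profiles[x], profiles[y])
         for x, y in combinations(names, 2)}

    def dist(c1: tuple, c2: tuple) -> float:
        # UPGMA: mean of all pairwise distances between members of the two clusters.
        pairs = [d[tuple(sorted((p, q)))] for p in c1 for q in c2]
        return sum(pairs) / len(pairs)

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges
```

The merge list encodes the dendrogram: each tuple records which two groups joined and at what average distance, so cultivars that share most bands join first.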

  10. Update on diabetes classification.

    PubMed

    Thomas, Celeste C; Philipson, Louis H

    2015-01-01

    This article highlights the difficulties in creating a definitive classification of diabetes mellitus in the absence of a complete understanding of the pathogenesis of the major forms. This brief review shows the evolving nature of the classification of diabetes mellitus. No classification scheme is ideal, and all have some overlaps and inconsistencies. The only form of diabetes that can be accurately diagnosed by DNA sequencing, monogenic diabetes, remains undiagnosed in more than 90% of the individuals whose diabetes is caused by one of the known gene mutations. The point of the classification, or taxonomy, of disease should be to give insight into both pathogenesis and treatment. It remains a source of frustration that all classification schemes for diabetes mellitus continue to fall short of this goal. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. An updated evolutionary classification of CRISPR–Cas systems

    PubMed Central

    Makarova, Kira S.; Wolf, Yuri I.; Alkhnbashi, Omer S.; Costa, Fabrizio; Shah, Shiraz A.; Saunders, Sita J.; Barrangou, Rodolphe; Brouns, Stan J. J.; Charpentier, Emmanuelle; Haft, Daniel H.; Horvath, Philippe; Moineau, Sylvain; Mojica, Francisco J. M.; Terns, Rebecca M.; Terns, Michael P.; White, Malcolm F.; Yakunin, Alexander F.; Garrett, Roger A.; van der Oost, John; Backofen, Rolf; Koonin, Eugene V.

    2017-01-01

    The evolution of CRISPR–cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR–cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR–Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized. PMID:26411297

  12. Phylogenetic Inferences Reveal a Large Extent of Novel Biodiversity in Chemically Rich Tropical Marine Cyanobacteria

    PubMed Central

    Gunasekera, Sarath P.; Gerwick, William H.

    2013-01-01

    Benthic marine cyanobacteria are known for their prolific biosynthetic capacities to produce structurally diverse secondary metabolites with biomedical application and their ability to form cyanobacterial harmful algal blooms. In an effort to provide taxonomic clarity to better guide future natural product drug discovery investigations and harmful algal bloom monitoring, this study investigated the taxonomy of tropical and subtropical natural product-producing marine cyanobacteria on the basis of their evolutionary relatedness. Our phylogenetic inferences of marine cyanobacterial strains responsible for over 100 bioactive secondary metabolites revealed an uneven taxonomic distribution, with a few groups being responsible for the vast majority of these molecules. Our data also suggest a high degree of novel biodiversity among natural product-producing strains that was previously overlooked by traditional morphology-based taxonomic approaches. This unrecognized biodiversity is primarily due to a lack of proper classification systems since the taxonomy of tropical and subtropical, benthic marine cyanobacteria has only recently been analyzed by phylogenetic methods. This evolutionary study provides a framework for a more robust classification system to better understand the taxonomy of tropical and subtropical marine cyanobacteria and the distribution of natural products in marine cyanobacteria. PMID:23315747

  13. Optimal rates for phylogenetic inference and experimental design in the era of genome-scale datasets.

    PubMed

    Dornburg, Alex; Su, Zhuo; Townsend, Jeffrey P

    2018-06-25

    With the rise of genome-scale datasets, there has been a call for increased data scrutiny and careful selection of loci appropriate for attempting the resolution of a phylogenetic problem. Such loci are desired to maximize phylogenetic information content while minimizing the risk of homoplasy. Theory posits the existence of characters that evolve under such an optimum rate, and efforts to determine optimal rates of inference have been a cornerstone of phylogenetic experimental design for over two decades. However, both theoretical and empirical investigations of optimal rates have varied dramatically in their conclusions, ranging from no relationship to a tight relationship between the rate of change and phylogenetic utility. Here we synthesize these apparently contradictory views, demonstrating both empirical and theoretical conditions under which each is correct. We find that optimal rates of characters, not genes, are generally robust to most experimental design decisions. Moreover, consideration of site rate heterogeneity within a given locus is critical to accurate predictions of utility. Factors such as taxon sampling or the targeted number of characters providing support for a topology are additionally critical to the predictions of phylogenetic utility based on the rate of character change. Further, optimality of rates and predictions of phylogenetic utility are not equivalent, demonstrating the need for further development of a comprehensive theory of phylogenetic experimental design.

  14. Prostate segmentation by sparse representation based classification

    PubMed Central

    Gao, Yaozong; Liao, Shu; Shen, Dinggang

    2012-01-01

    Purpose: The segmentation of the prostate in CT images is of essential importance to external beam radiotherapy, which is one of the major treatments for prostate cancer nowadays. During the radiotherapy, the prostate is radiated by high-energy x rays from different directions. In order to maximize the dose to the cancer and minimize the dose to the surrounding healthy tissues (e.g., bladder and rectum), the prostate in the new treatment image needs to be accurately localized. Therefore, the effectiveness and efficiency of external beam radiotherapy highly depend on the accurate localization of the prostate. However, due to the low contrast of the prostate with its surrounding tissues (e.g., bladder), the unpredicted prostate motion, and the large appearance variations across different treatment days, it is challenging to segment the prostate in CT images. In this paper, the authors present a novel classification based segmentation method to address these problems. Methods: To segment the prostate, the proposed method first uses sparse representation based classification (SRC) to enhance the prostate in CT images by pixel-wise classification, in order to overcome the limitation of poor contrast of the prostate images. Then, based on the classification results, previously segmented prostates of the same patient are used as patient-specific atlases to align onto the current treatment image, and the majority voting strategy is finally adopted to segment the prostate. In order to address the limitations of the traditional SRC in pixel-wise classification, especially for the purpose of segmentation, the authors extend SRC from the following four aspects: (1) A discriminant subdictionary learning method is proposed to learn a discriminant and compact representation of training samples for each class so that the discriminant power of SRC can be increased and also SRC can be applied to the large-scale pixel-wise classification. (2) The L1 regularized sparse coding is replaced by

  15. Prostate segmentation by sparse representation based classification.

    PubMed

    Gao, Yaozong; Liao, Shu; Shen, Dinggang

    2012-10-01

    The segmentation of the prostate in CT images is of essential importance to external beam radiotherapy, which is one of the major treatments for prostate cancer nowadays. During the radiotherapy, the prostate is radiated by high-energy x rays from different directions. In order to maximize the dose to the cancer and minimize the dose to the surrounding healthy tissues (e.g., bladder and rectum), the prostate in the new treatment image needs to be accurately localized. Therefore, the effectiveness and efficiency of external beam radiotherapy highly depend on the accurate localization of the prostate. However, due to the low contrast of the prostate with its surrounding tissues (e.g., bladder), the unpredicted prostate motion, and the large appearance variations across different treatment days, it is challenging to segment the prostate in CT images. In this paper, the authors present a novel classification based segmentation method to address these problems. To segment the prostate, the proposed method first uses sparse representation based classification (SRC) to enhance the prostate in CT images by pixel-wise classification, in order to overcome the limitation of poor contrast of the prostate images. Then, based on the classification results, previously segmented prostates of the same patient are used as patient-specific atlases to align onto the current treatment image, and the majority voting strategy is finally adopted to segment the prostate. In order to address the limitations of the traditional SRC in pixel-wise classification, especially for the purpose of segmentation, the authors extend SRC from the following four aspects: (1) A discriminant subdictionary learning method is proposed to learn a discriminant and compact representation of training samples for each class so that the discriminant power of SRC can be increased and also SRC can be applied to the large-scale pixel-wise classification. (2) The L1 regularized sparse coding is replaced by the elastic net in
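    The final step in the two prostate records is a majority vote over aligned patient-specific atlases. The sketch below shows only that fusion step. It assumes the atlases are already registered to the current image and are given as binary label maps (nested lists of 0/1). Resolving ties to background is an arbitrary choice made here, not something the abstract specifies, and the function name is hypothetical.

```python
def majority_vote(label_maps):
    """Fuse aligned binary label maps (rows of 0/1) by per-pixel majority.

    A pixel is labelled 1 (prostate) only if more than half of the maps
    say so; ties, possible with an even number of atlases, go to 0.
    """
    if not label_maps:
        raise ValueError("need at least one label map")
    n = len(label_maps)
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    fused = []
    for r in range(rows):
        row = []
        for c in range(cols):
            votes = sum(m[r][c] for m in label_maps)
            row.append(1 if 2 * votes > n else 0)
        fused.append(row)
    return fused


a = [[1, 0], [1, 1]]
b = [[1, 0], [0, 1]]
c = [[0, 1], [1, 1]]
print(majority_vote([a, b, c]))
```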

  16. Towards a phylogenetic generic classification of Thelypteridaceae: Additional sampling suggests alterations of neotropical taxa and further study of paleotropical genera.

    PubMed

    Almeida, Thaís Elias; Hennequin, Sabine; Schneider, Harald; Smith, Alan R; Batista, João Aguiar Nogueira; Ramalho, Aline Joseph; Proite, Karina; Salino, Alexandre

    2016-01-01

    Thelypteridaceae is one of the largest fern families, having about 950 species and a cosmopolitan distribution but with most species occurring in tropical and subtropical regions. Its generic classification remains controversial, with different authors recognizing from one up to 32 genera. Phylogenetic relationships within the family have not been exhaustively studied, but previous studies have confirmed the monophyly of the lineage. Thus far, sampling has been inadequate for establishing a robust hypothesis of infrafamilial relationships within the family. In order to understand phylogenetic relationships within Thelypteridaceae and thus to improve generic reclassification, we expand the molecular sampling, including new samples of Old World taxa and, especially, many additional neotropical representatives. We also explore the monophyly of exclusively or mostly neotropical genera Amauropelta, Goniopteris, Meniscium, and Steiropteris. Our sampling includes 68 taxa and 134 newly generated sequences from two plastid genomic regions (rps4-trnS and trnL-trnF), plus 73 rps4 and 72 trnL-trnF sequences from GenBank. These data resulted in a concatenated matrix of 1980 molecular characters for 149 taxa. The combined data set was analyzed using maximum parsimony and Bayesian inference of phylogeny. Our results are consistent with the general topological structure found in previous studies, including two main lineages within the family: phegopteroid and thelypteroid. The thelypteroid lineage comprises two clades; one of these included the segregates Metathelypteris, Coryphopteris, and Amauropelta (including part of Parathelypteris), whereas the other comprises all segregates of Cyclosorus s.l., such as Goniopteris, Meniscium, and Steiropteris (including Thelypteris polypodioides, previously incertae sedis). The three mainly neotropical segregates were found to be monophyletic but nested in a broadly defined Cyclosorus. The fourth mainly neotropical segregate, Amauropelta

  17. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.

    PubMed

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach, but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%), with overall accuracies of 86.6% and 92.3%, respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10 Hz, and the KNN model at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
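    The KNN approach in this record assigns each window of accelerometry data the most common behavior among its k closest labelled windows. The sketch below shows that idea. It assumes features have already been extracted per window, for example hypothetical summaries such as mean and variance of acceleration. The study's actual feature set, distance metric and value of k are not given here and are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    train: list of (feature_vector, label) pairs, where each feature vector
    is a tuple of floats summarising one window of accelerometry data.
    Distance is Euclidean (math.dist, Python 3.8+).
    """
    dists = sorted((math.dist(features, query), label) for features, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


train = [
    ((0.1, 0.2), "sitting"), ((0.0, 0.1), "sitting"), ((0.2, 0.0), "sitting"),
    ((2.0, 2.1), "flapping"), ((2.2, 1.9), "flapping"), ((1.9, 2.0), "flapping"),
]
print(knn_predict(train, (0.1, 0.1)))
```

    KNN has no training phase beyond storing labelled examples. This is consistent with the abstract's remark that it was easier to implement than RF. The trade-off is that every prediction scans the whole training set.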

  18. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    PubMed Central

    Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    PMID:28403159

  19. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    USGS Publications Warehouse

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

  20. A multi-gene phylogeny of Chlorophyllum (Agaricaceae, Basidiomycota): new species, new combination and infrageneric classification

    PubMed Central

    Ge, Zai-Wei; Jacobs, Adriaana; Vellinga, Else C.; Sysouphanthong, Phongeun; van der Walt, Retha; Lavorato, Carmine; An, Yi-Feng; Yang, Zhu L.

    2018-01-01

    Taxonomic and phylogenetic studies of Chlorophyllum were carried out on the basis of morphological differences and molecular phylogenetic analyses. Based on the phylogeny inferred from the internal transcribed spacer (ITS), the partial large subunit nuclear ribosomal DNA (nrLSU), the second largest subunit of RNA polymerase II (rpb2) and translation elongation factor 1-α (tef1) sequences, six well-supported clades and 17 phylogenetic species are recognised. Within this phylogenetic framework and considering the diagnostic morphological characters, two new species, C. africanum and C. palaeotropicum, are described. In addition, a new infrageneric classification of Chlorophyllum is proposed, in which the genus is divided into six sections. One new combination is also made. This study provides a robust basis for a more detailed investigation of diversity and biogeography of Chlorophyllum. PMID:29681738

  1. Phylogenetic placement of two species known only from resting spores: Zoophthora independentia sp. nov. and Z. porteri comb. nov. (Entomophthorales: Entomophthoraceae)

    USDA-ARS's Scientific Manuscript database

    Molecular methods were used to determine the generic placement of two species of Entomophthorales known only from resting spores. Historically, these species would belong in the form-genus Tarichium, but this classification provides no information about phylogenetic relationships. Using DNA from res...

  2. Cross-validation to select Bayesian hierarchical models in phylogenetics.

    PubMed

    Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C

    2016-05-26

    Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
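    Cross-validation, as described here, ranks models by how well parameters fitted on part of the data predict the held-out remainder. The sketch below is a deliberately toy analogue, not the authors' BEAST-based procedure. It compares a "strict" model with one shared Poisson rate for all lineages against a "relaxed" model with one rate per lineage group, and scores each by its summed held-out log-likelihood across k folds. The data layout, model names and fold scheme are all assumptions made for illustration.

```python
import math
import random


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_strict(train):
    """One shared rate for all lineages (a toy 'strict clock' analogue)."""
    rate = max(sum(c for _, c in train) / len(train), 1e-9)
    return lambda group: rate


def fit_relaxed(train):
    """A separate rate per lineage group (a toy 'relaxed clock' analogue)."""
    totals, counts = {}, {}
    for group, c in train:
        totals[group] = totals.get(group, 0) + c
        counts[group] = counts.get(group, 0) + 1
    pooled = max(sum(totals.values()) / len(train), 1e-9)
    return lambda group: (max(totals[group] / counts[group], 1e-9)
                          if group in counts else pooled)


def cv_score(data, fit, k=4, seed=0):
    """Sum of held-out Poisson log-likelihoods over k folds (higher is better).

    data: list of (group, substitution_count) pairs.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    score = 0.0
    for i, test in enumerate(folds):
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        rate_of = fit(train)
        score += sum(poisson_logpmf(c, rate_of(g)) for g, c in test)
    return score


data = ([("A", c) for c in [1, 2, 3, 2, 2, 1, 3, 2]]
        + [("B", c) for c in [20, 18, 22, 21, 19, 20, 23, 17]])
print(cv_score(data, fit_strict), cv_score(data, fit_relaxed))
```

    With strongly different group rates, as in the example data, the relaxed model predicts held-out counts far better and wins. No prior is involved in this comparison, which reflects the property the abstract highlights: cross-validation does not depend on proper priors.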

  3. Automatic classification of blank substrate defects

    NASA Astrophysics Data System (ADS)

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

    Mask preparation stages are crucial in mask manufacturing, since the mask later acts as a template for a considerable number of dies on the wafer. Defects on the initial blank substrate, and on subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical defects or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and the separation of defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. Using this automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment compared to the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of Automatic Defect Classification (ADC) product at MP Mask

  4. Assessment of available anatomical characters for linking living mammals to fossil taxa in phylogenetic analyses.

    PubMed

    Guillerme, Thomas; Cooper, Natalie

    2016-05-01

    Analyses of living and fossil taxa are crucial for understanding biodiversity through time. The total evidence method allows living and fossil taxa to be combined in phylogenies, using molecular data for living taxa and morphological data for living and fossil taxa. With this method, substantial overlap of coded anatomical characters among living and fossil taxa is vital for accurately inferring topology. However, although molecular data for living species are widely available, scientists generating morphological data mainly focus on fossils. Therefore, there are fewer coded anatomical characters in living taxa, even in well-studied groups such as mammals. We investigated the number of coded anatomical characters available in phylogenetic matrices for living mammals and how these were phylogenetically distributed across orders. Eleven of 28 mammalian orders have fewer than 25% of species with available characters; this has implications for the accurate placement of fossils, although the issue is less pronounced at higher taxonomic levels. In most orders, species with available characters are randomly distributed across the phylogeny, which may reduce the impact of the problem. We suggest that increased morphological data collection efforts for living taxa are needed to produce accurate total evidence phylogenies. © 2016 The Authors.

  5. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE PAGES

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...

    2016-12-20

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.
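    The record contrasts published G+C values with values "directly calculated from genome sequences". A minimal way to compute that percentage from a sequence string is sketched below. How ambiguity codes such as N are handled is a convention choice. Here they are simply excluded from the denominator, which is an assumption and may differ from the authors' tooling.

```python
def gc_content(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T) in `seq`.

    Ambiguity codes (N, R, Y, ...) are excluded from the denominator in
    this sketch; other tools may treat them differently.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(gc_content("GGCCAATT"))
```

    For a draft genome split into contigs, concatenating them first (gc_content("".join(contigs))) weights each contig by its length. Averaging per-contig percentages would not.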

  6. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina

  7. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
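    The document-classification analogy in this record treats a protein as a "text" whose "words" are overlapping peptide n-grams. The sketch below implements a multinomial Naive Bayes classifier over such counts with Laplace smoothing. Chi-square feature selection, which the study applies, is omitted for brevity. The class name, the default n and the smoothing constant are assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Overlapping substrings of length n (peptide 'words')."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over peptide n-gram counts (Laplace smoothing)."""

    def __init__(self, n=2, alpha=1.0):
        self.n, self.alpha = n, alpha

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.gram_counts = defaultdict(Counter)      # n-gram counts per class
        for seq, label in zip(seqs, labels):
            self.gram_counts[label].update(ngrams(seq, self.n))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def predict(self, seq):
        total = sum(self.class_counts.values())
        grams = ngrams(seq, self.n)
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total)         # log prior
            for g in grams:                          # log likelihood of each n-gram
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


model = NgramNaiveBayes(n=2).fit(
    ["AAAAGG", "AAAGGA", "WWYYWW", "YWWYYW"], ["A", "A", "B", "B"])
print(model.predict("AAGGAA"))
```

    Scores are summed in log space so that long sequences with many n-grams do not underflow.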

  8. Simultaneous fecal microbial and metabolite profiling enables accurate classification of pediatric irritable bowel syndrome.

    PubMed

    Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg

    2015-12-09

    We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84% accuracy of correct sample group assignment and 86% prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.

  9. Teaching Molecular Phylogenetics through Investigating a Real-World Phylogenetic Problem

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2012-01-01

    A phylogenetics exercise is incorporated into the "Introduction to biocomputing" course, a junior-level course at Savannah State University. This exercise is designed to help students learn important concepts and practical skills in molecular phylogenetics through solving a real-world problem. In this application, students are required to identify…

  10. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    PubMed Central

    2010-01-01

    Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504
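    Two uncertainty measures in the pplacer record can be illustrated with small calculations. The first step turns per-edge placement log-likelihoods into weights that sum to one. This is only the maximum-likelihood "weight ratio" analogue: pplacer's posterior probability additionally integrates over branch lengths under a prior, which this sketch does not do. The second step computes an expected distance between placement locations. This sketch defines it as the probability-weighted mean tree distance between pairs of candidate edges, which is an assumed reading of the abstract. Edge-to-edge distances are taken as precomputed inputs, and the function names are hypothetical.

```python
import math


def like_weight_ratios(log_likes):
    """Normalise per-edge log-likelihoods into weights summing to 1.

    Subtracting the maximum before exponentiating (log-sum-exp trick)
    keeps very negative log-likelihoods from underflowing to zero.
    """
    m = max(log_likes)
    weights = [math.exp(x - m) for x in log_likes]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distance(probs, dist):
    """Probability-weighted mean distance between two placement locations.

    probs: placement probability for each candidate edge.
    dist:  dist[i][j] = path length on the reference tree between edges
           i and j (assumed precomputed).
    """
    return sum(
        probs[i] * probs[j] * dist[i][j]
        for i in range(len(probs))
        for j in range(len(probs))
    )


p = like_weight_ratios([0.0, math.log(3.0)])
print(p, expected_distance(p, [[0.0, 2.0], [2.0, 0.0]]))
```

    A read whose probability mass is spread over distant edges gets a large expected distance, even if its single best edge looks confident. This is the positional uncertainty the abstract describes.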

  11. Analysis and application of classification methods of complex carbonate reservoirs

    NASA Astrophysics Data System (ADS)

    Li, Xiongyan; Qin, Ruibao; Ping, Haitao; Wei, Dan; Liu, Xiaomei

    2018-06-01

    The Middle East hosts abundant carbonate reservoirs of Mesozoic to Cenozoic age. Because of variations in sedimentary environment and diagenetic processes, several pore types coexist in these reservoirs. Together with complex lithologies and the influence of microfractures, this makes the pore structure very complicated and the reservoir parameters difficult to calculate accurately. To evaluate carbonate reservoirs accurately, this study builds on pore structure evaluation and analyzes two classification methods: one based on capillary pressure curves and one based on flow units. Carbonate reservoirs can be classified from capillary pressure curves, but the resulting porosity-permeability relationships within each class are poor. Classification by flow units, in contrast, yields high-precision functional relationships between porosity and permeability, so carbonate reservoirs can be evaluated quantitatively on that basis. In the dolomite reservoirs, the average absolute error of calculated permeability decreases from 15.13 to 7.44 mD; in the limestone reservoirs, it decreases from 20.33 to 7.37 mD. Reservoir parameters can be calculated accurately only when pore structures are accurately characterized and reservoir types are accurately classified. Both steps are therefore essential to the accurate evaluation of the complex carbonate reservoirs of the Middle East.
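
    The comparison of average absolute permeability error with and without classification can be sketched as follows. This is a minimal illustration, not the authors' procedure. The abstract does not give the functional form of the porosity-permeability relation, so a log-linear fit, log10(k) = a + b * phi, is assumed within each class. Class labels (for example, flow units) are assumed to be already assigned, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_log_linear(phi, k):
    """Least-squares fit of log10(k) = a + b * phi; returns (a, b).

    Needs at least two distinct porosity values, otherwise the slope
    is undefined (division by zero)."""
    y = [math.log10(v) for v in k]
    n = len(phi)
    mx, my = sum(phi) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in phi)
    sxy = sum((x - mx) * (t - my) for x, t in zip(phi, y))
    b = sxy / sxx
    return my - b * mx, b


def mean_abs_error(phi, k, classes=None):
    """Average absolute permeability error, in the same units as k (e.g. mD).

    With classes=None a single relation is fitted to all samples;
    otherwise a separate relation is fitted within each class."""
    if classes is None:
        classes = [0] * len(phi)
    groups = defaultdict(list)
    for i, c in enumerate(classes):
        groups[c].append(i)
    errors = []
    for idx in groups.values():
        a, b = fit_log_linear([phi[i] for i in idx], [k[i] for i in idx])
        errors += [abs(10 ** (a + b * phi[i]) - k[i]) for i in idx]
    return sum(errors) / len(errors)
```

    Calling `mean_abs_error(phi, k)` and `mean_abs_error(phi, k, unit_labels)` on the same samples reproduces the kind of before/after comparison reported above for the dolomite and limestone reservoirs.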

  12. Nonbinary Tree-Based Phylogenetic Networks.

    PubMed

    Jetten, Laura; van Iersel, Leo

    2018-01-01

    Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.
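
    To make the polynomial-time decision concrete, here is a hypothetical sketch. It assumes one common definition: a rooted network is tree-based if it has a rooted spanning tree whose leaves are exactly the network's leaves. The two nonbinary classes in the paper may differ in detail, and the paper's own characterizations are not reproduced here. In a rooted acyclic network, choosing one parent for every non-root vertex always gives a spanning tree. The question therefore reduces to a bipartite matching: can every non-leaf vertex be matched to a distinct child, so that it keeps at least one child? The sketch answers this with Kuhn's augmenting-path algorithm.

```python
from collections import defaultdict


def is_tree_based(arcs):
    """Return True if the rooted network given by (parent, child) arcs has
    a rooted spanning tree whose leaves are exactly the network's leaves.

    Every non-leaf vertex must be matched to a distinct child that picks
    it as its parent; unmatched children may pick any parent."""
    children = defaultdict(list)
    vertices = set()
    for p, c in arcs:
        children[p].append(c)
        vertices.update((p, c))
    internal = [v for v in vertices if children[v]]
    match = {}  # child -> parent it is currently matched to

    def augment(parent, seen):
        # Kuhn's augmenting-path search starting from `parent`.
        for c in children[parent]:
            if c in seen:
                continue
            seen.add(c)
            if c not in match or augment(match[c], seen):
                match[c] = parent
                return True
        return False

    # Tree-based iff every internal vertex can be saturated by the matching.
    return all(augment(v, set()) for v in internal)
```

    In a network where three tree vertices share only two reticulation children, Hall's condition fails and the function returns False.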

  13. Treelink: data integration, clustering and visualization of phylogenetic trees.

    PubMed

    Allende, Christian; Sohn, Erik; Little, Cedric

    2015-12-29

    Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past; however, none of them is intended for integrative and comparative analysis. Treelink is platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows automated integration of datasets with trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes across branches and leaves. Genomic and proteomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm was also developed to simplify trees and display the most divergent clades; its results can be validated using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources and perform operations to differentiate and visualize those differences within a tree. Supported file formats include the most popular ones, such as Newick and CSV. Visualizations can be exported as images, and cluster outputs and genomic sequences can also be exported. Treelink is available as a web and desktop application at http://www.treelinkapp.com .
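
    Parsimony-based ancestral character reconstruction, mentioned above for phylogeographic plotting, can be illustrated with the classic Fitch algorithm. This is a generic sketch and not Treelink's implementation. To keep it self-contained, trees are written as nested pairs rather than parsed from Newick. States could be geographic regions linked to leaves from a CSV file. Only the bottom-up pass is shown. It returns the candidate states at the root and the minimum number of state changes. A top-down pass would then assign concrete states to internal nodes.

```python
def fitch(tree, states):
    """Fitch small-parsimony bottom-up pass on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its character state. Returns the
    candidate state set at the root and the minimum number of changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Disjoint sets: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1
```

    For instance, `fitch((("A", "B"), "C"), {"A": "Asia", "B": "Europe", "C": "Asia"})` returns `({"Asia"}, 1)`: one change suffices, and Asia is the most parsimonious root state.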

  14. A Phylogenetic Re-Analysis of Groupers with Applications for Ciguatera Fish Poisoning

    PubMed Central

    Schoelinck, Charlotte; Hinsinger, Damien D.; Dettaï, Agnès; Cruaud, Corinne; Justine, Jean-Lou

    2014-01-01

    Background Ciguatera fish poisoning (CFP) is a significant public health problem caused by dinoflagellates. It is responsible for one of the highest reported incidences of seafood-borne illness, and groupers are commonly reported as a source of CFP because of their position in the food chain. Given the role of recent climate change in harmful algal blooms, CFP cases might become more frequent and more geographically widespread. Since there is no appropriate treatment for CFP, the most efficient solution is to regulate fish consumption. Such a strategy can only work if the fish sold are correctly identified, and it has been repeatedly shown that misidentifications and species substitutions occur in fish markets. Methods We provide here both a DNA-barcoding reference for groupers and a new phylogenetic reconstruction based on five genes and comprehensive taxonomic sampling. We analyse the correlation between the geographic range of species and their susceptibility to ciguatera accumulation, and the co-occurrence of ciguatoxins in closely related species, using both character mapping and statistical methods. Results Misidentifications were encountered in public databases, precluding accurate species identifications. Epinephelinae now includes only twelve genera (vs. 15 previously). Comparisons with ciguatera incidences show that in some genera most species are ciguateric, but statistical tests display only a moderate correlation with the phylogeny. Atlantic species were rarely contaminated, with ciguatera occurrences being restricted to the South Pacific. Conclusions The recent changes in classification based on reanalyses of the relationships within Epinephelidae have an impact on the interpretation of the ciguatera distribution in the genera. In this context, and to improve the monitoring of fish trade and safety, we need to obtain extensive data on contamination at the species level. 
Accurate species identifications through DNA barcoding are thus an essential tool in
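
    How a barcoding reference supports identification can be sketched with a simple nearest-reference rule. The abstract does not say which identification method the authors used, so this is an assumed, commonly used baseline. The query is assigned to the reference with the smallest uncorrected p-distance, computed on pre-aligned sequences. If even that distance exceeds a threshold, the identification is rejected. The 2% default threshold and the species labels are hypothetical placeholders.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping
    positions where either sequence has a gap or an ambiguous base."""
    valid = "ACGT"
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def identify(query, references, threshold=0.02):
    """Return (species, distance) for the closest reference barcode, or
    (None, distance) if the closest one is farther than `threshold`.

    `references` maps a species label to one aligned barcode sequence."""
    best_species, best_d = None, float("inf")
    for species, seq in references.items():
        d = p_distance(query, seq)
        if d < best_d:
            best_species, best_d = species, d
    return (best_species if best_d <= threshold else None), best_d
```

    A mislabelled entry in the reference set would propagate directly into wrong identifications. This is why the database misidentifications reported above matter.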

  15. A phylogenetic re-analysis of groupers with applications for ciguatera fish poisoning.

    PubMed

    Schoelinck, Charlotte; Hinsinger, Damien D; Dettaï, Agnès; Cruaud, Corinne; Justine, Jean-Lou

    2014-01-01

    Ciguatera fish poisoning (CFP) is a significant public health problem caused by dinoflagellates. It is responsible for one of the highest reported incidences of seafood-borne illness, and groupers are commonly reported as a source of CFP because of their position in the food chain. Given the role of recent climate change in harmful algal blooms, CFP cases might become more frequent and more geographically widespread. Since there is no appropriate treatment for CFP, the most efficient solution is to regulate fish consumption. Such a strategy can only work if the fish sold are correctly identified, and it has been repeatedly shown that misidentifications and species substitutions occur in fish markets. We provide here both a DNA-barcoding reference for groupers and a new phylogenetic reconstruction based on five genes and comprehensive taxonomic sampling. We analyse the correlation between the geographic range of species and their susceptibility to ciguatera accumulation, and the co-occurrence of ciguatoxins in closely related species, using both character mapping and statistical methods. Misidentifications were encountered in public databases, precluding accurate species identifications. Epinephelinae now includes only twelve genera (vs. 15 previously). Comparisons with ciguatera incidences show that in some genera most species are ciguateric, but statistical tests display only a moderate correlation with the phylogeny. Atlantic species were rarely contaminated, with ciguatera occurrences being restricted to the South Pacific. The recent changes in classification based on reanalyses of the relationships within Epinephelidae have an impact on the interpretation of the ciguatera distribution in the genera. In this context, and to improve the monitoring of fish trade and safety, we need to obtain extensive data on contamination at the species level. 
Accurate species identifications through DNA barcoding are thus an essential tool in controlling CFP since meal remnants in

  16. Progress, pitfalls and parallel universes: a history of insect phylogenetics

    PubMed Central

    Simon, Chris; Yavorskaya, Margarita; Beutel, Rolf G.

    2016-01-01

    The phylogeny of insects has been both extensively studied and vigorously debated for over a century. A relatively accurate deep phylogeny had been produced by 1904. It was not substantially improved in topology until recently when phylogenomics settled many long-standing controversies. Intervening advances came instead through methodological improvement. Early molecular phylogenetic studies (1985–2005), dominated by a few genes, provided datasets that were too small to resolve controversial phylogenetic problems. Adding to the lack of consensus, this period was characterized by a polarization of philosophies, with individuals belonging to either parsimony or maximum-likelihood camps; each largely ignoring the insights of the other. The result was an unfortunate detour in which the few perceived phylogenetic revolutions published by both sides of the philosophical divide were probably erroneous. The size of datasets has been growing exponentially since the mid-1980s accompanied by a wave of confidence that all relationships will soon be known. However, large datasets create new challenges, and a large number of genes does not guarantee reliable results. If history is a guide, then the quality of conclusions will be determined by an improved understanding of both molecular and morphological evolution, and not simply the number of genes analysed. PMID:27558853

  17. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented that shifts computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data, showing accurate classification in the presence of novel organisms in samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150-gigabase Tyrolean Iceman dataset took <20 h on a single 40-core, large-memory node and provided new insights into the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782
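
    The idea of shifting work into an off-line index can be illustrated with a toy k-mer index. This is an assumption: the abstract only says "taxonomy/genome index". The sketch maps every k-mer in the reference genomes to the lowest common ancestor (LCA) of all taxa that contain it. Classifying a read is then a series of cheap lookups. The majority vote used here is a simplified stand-in and not LMAT's scoring scheme. Taxon names are hypothetical.

```python
from collections import Counter


def lca(a, b, parent):
    """Lowest common ancestor of taxa a and b in a single rooted taxonomy
    given as a child -> parent dict (the root maps to None)."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b


def build_index(genomes, parent, k):
    """Off-line step: map every k-mer to the LCA of all taxa containing it."""
    index = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer] = taxon if kmer not in index else lca(index[kmer], taxon, parent)
    return index


def classify(read, index, k):
    """On-line step: majority vote over the taxa of the read's indexed k-mers."""
    votes = Counter(index[read[i:i + k]] for i in range(len(read) - k + 1)
                    if read[i:i + k] in index)
    return votes.most_common(1)[0][0] if votes else None
```

    A k-mer shared by two species resolves to their common genus. Reads made only of such shared k-mers are therefore assigned conservatively at a higher rank, which is one way an index like this can handle novel organisms.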

  18. Free classification of American English dialects by native and non-native listeners

    PubMed Central

    Clopper, Cynthia G.; Bradlow, Ann R.

    2009-01-01

    Most second language acquisition research focuses on linguistic structures, and less research has examined the acquisition of sociolinguistic patterns. The current study explored the perceptual classification of regional dialects of American English by native and non-native listeners using a free classification task. Results revealed similar classification strategies for the native and non-native listeners. However, the native listeners were more accurate overall than the non-native listeners. In addition, the non-native listeners were less able to make use of constellations of cues to accurately classify the talkers by dialect. However, the non-native listeners were able to attend to cues that were either phonologically or sociolinguistically relevant in their native language. These results suggest that non-native listeners can use information in the speech signal to classify talkers by regional dialect, but that their lack of signal-independent cultural knowledge about variation in the second language leads to less accurate classification performance. PMID:20161400

  19. Multiple Sparse Representations Classification

    PubMed Central

    Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images, and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained, and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class-specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. Thus, instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provide an enhanced statistic that is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. 
In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and
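
    The conventional SRC decision rule described above (sparse code a patch with each class dictionary, then pick the class with minimum residual energy) can be sketched as follows. Orthogonal matching pursuit (OMP) is assumed as the sparse coder, and the dictionaries are assumed to be already trained with unit-norm atoms as columns. The mSRC extension is not reproduced, because the abstract does not specify how the multiple representations are generated.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: greedily select up to `n_nonzero`
    atoms (columns of D), least-squares fit x on them, return the residual."""
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom improves the fit
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return residual


def src_classify(dictionaries, patch, n_nonzero=2):
    """Assign the patch to the class whose dictionary gives the smallest
    residual energy ||x - D a||^2 after sparse coding."""
    energies = [float(np.sum(omp(D, patch, n_nonzero) ** 2)) for D in dictionaries]
    return int(np.argmin(energies)), energies
```

    With one dictionary per class and a patch vector flattened from the pixel's neighbourhood, `src_classify` returns the predicted class index together with the per-class residual energies that SRC compares.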

  20. A statistical approach to root system classification

    PubMed Central

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems play a key role in ecology and agronomy. Despite the rapid increase in root studies, there is still no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for “plant functional type” identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decisions on the classifiers. The study demonstrates that principal-component-based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes together with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. The adequacy of commonly available morphological traits for classification is supported by field data. The rooting types that emerged from the measured data were mainly distinguished into diameter/weight-dominated and density-dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We conclude that the data-defined classification is appropriate for integrating knowledge obtained with different root measurement methods and at various scales. Currently, root morphology is the most promising basis for classification because of widely used common measurement protocols. To capture details of root diversity, efforts in architectural measurement techniques are essential. PMID:23914200
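
    The principal-component step behind the data-defined rooting types can be sketched as follows. The abstract does not state the preprocessing or how the scores are grouped into types. This sketch therefore assumes z-scored traits, so that traits in different units contribute comparably, and computes PCA through a singular value decomposition. The clustering of scores into rooting types is left out.

```python
import numpy as np


def principal_components(traits):
    """PCA on standardized traits.

    `traits` is an (n_samples, n_traits) array-like. Each column is
    z-scored (a constant column would divide by zero). Returns the sample
    scores, the loadings (one column per component) and the fraction of
    variance explained by each component."""
    X = np.asarray(traits, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s
    explained = s**2 / np.sum(s**2)
    return scores, Vt.T, explained
```

    The loadings show which traits (for example, diameter, weight or density) drive each component. Root systems that fall close together in the space of the first few scores would form candidate rooting types.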

  1. A statistical approach to root system classification.

    PubMed

    Bodner, Gernot; Leitner, Daniel; Nakhforoosh, Alireza; Sobotik, Monika; Moder, Karl; Kaul, Hans-Peter

    2013-01-01

    Plant root systems play a key role in ecology and agronomy. Despite the rapid increase in root studies, there is still no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for "plant functional type" identification in ecology can be applied to the classification of root systems. The classification method presented is based on a data-defined statistical procedure without a priori decisions on the classifiers. The study demonstrates that principal-component-based rooting types provide efficient and meaningful multi-trait classifiers. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes together with spatial distribution parameters capture most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. The adequacy of commonly available morphological traits for classification is supported by field data. The rooting types that emerged from the measured data were mainly distinguished into diameter/weight-dominated and density-dominated types. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We conclude that the data-defined classification is appropriate for integrating knowledge obtained with different root measurement methods and at various scales. Currently, root morphology is the most promising basis for classification because of widely used common measurement protocols. To capture details of root diversity, efforts in architectural measurement techniques are essential.

  2. Effects of stress typicality during speeded grammatical classification.

    PubMed

    Arciuli, Joanne; Cupples, Linda

    2003-01-01

    The experiments reported here were designed to investigate the influence of stress typicality during speeded grammatical classification of disyllabic English words by native and non-native speakers. Trochaic nouns and iambic verbs were considered to be typically stressed, whereas iambic nouns and trochaic verbs were considered to be atypically stressed. Experiments 1a and 2a showed that while native speakers classified typically stressed words more quickly and more accurately than atypically stressed words during reading, there were no overall effects during classification of spoken stimuli. However, a subgroup of native speakers with high error rates did show a significant effect during classification of spoken stimuli. Experiments 1b and 2b showed that non-native speakers classified typically stressed words more quickly and more accurately than atypically stressed words during reading. Typically stressed words were classified more accurately than atypically stressed words when the stimuli were spoken. Importantly, there was a significant relationship between error rates, vocabulary size and the size of the stress typicality effect in each experiment. We conclude that participants use information about lexical stress to help them distinguish between disyllabic nouns and verbs during speeded grammatical classification. This is especially so for individuals with a limited vocabulary who lack other knowledge (e.g., semantic knowledge) about the differences between these grammatical categories.

  3. The Independent Evolution Method Is Not a Viable Phylogenetic Comparative Method

    PubMed Central

    2015-01-01

    Phylogenetic comparative methods (PCMs) use data on species traits and phylogenetic relationships to shed light on evolutionary questions. Recently, Smaers and Vinicius suggested a new PCM, Independent Evolution (IE), which purportedly employs a novel model of evolution based on Felsenstein’s Adaptive Peak Model. The authors found that IE improves upon previous PCMs by producing more accurate estimates of ancestral states, as well as separate estimates of evolutionary rates for each branch of a phylogenetic tree. Here, we document substantial theoretical and computational issues with IE. When data are simulated under a simple Brownian motion model of evolution, IE produces severely biased estimates of ancestral states and changes along individual branches. We show that these branch-specific changes are essentially ancestor-descendant or “directional” contrasts, and draw parallels between IE and previous PCMs such as “minimum evolution”. Additionally, while comparisons of branch-specific changes between variables have been interpreted as reflecting the relative strength of selection on those traits, we demonstrate through simulations that regressing IE estimated branch-specific changes against one another gives a biased estimate of the scaling relationship between these variables, and provides no advantages or insights beyond established PCMs such as phylogenetically independent contrasts. In light of our findings, we discuss the results of previous papers that employed IE. We conclude that Independent Evolution is not a viable PCM, and should not be used in comparative analyses. PMID:26683838
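
    Phylogenetically independent contrasts, cited above as the established alternative to IE, can be computed with Felsenstein's recursive algorithm. The sketch below is a generic implementation and is not taken from the paper. It uses a hypothetical tree encoding: a leaf is a name, and an internal node is a pair of (subtree, branch length) tuples. Branch lengths must be positive.

```python
import math


def pic(node, values):
    """Felsenstein's independent contrasts on a rooted binary tree.

    Returns (node_value, extra_length, contrasts): the weighted-average
    value estimated for the node, the branch-length increment that accounts
    for the uncertainty of that estimate, and the list of standardized
    contrasts in post-order."""
    if isinstance(node, str):
        return values[node], 0.0, []
    (left, bl_left), (right, bl_right) = node
    x1, extra1, c1 = pic(left, values)
    x2, extra2, c2 = pic(right, values)
    v1, v2 = bl_left + extra1, bl_right + extra2
    contrast = (x1 - x2) / math.sqrt(v1 + v2)  # standardized contrast
    ancestor = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    extra = v1 * v2 / (v1 + v2)  # lengthens the branch above this node
    return ancestor, extra, c1 + c2 + [contrast]
```

    Under Brownian motion, the standardized contrasts are independent and identically distributed. This is what makes regressing the contrasts of one trait on another a valid estimate of their scaling relationship.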

  4. Cyber-infrastructure for Fusarium (CiF): Three integrated platforms supporting strain identification, phylogenetics, comparative genomics, and knowledge sharing

    USDA-ARS's Scientific Manuscript database

    The fungal genus Fusarium includes many plant and/or animal pathogenic species and produces diverse toxins. Although accurate identification is critical for managing such threats, it is difficult to identify Fusarium morphologically. Fortunately, extensive molecular phylogenetic studies, founded on ...

  5. Phylogeny and classification of Prunus sensu lato (Rosaceae).

    PubMed

    Shi, Shuo; Li, Jinlu; Sun, Jiahui; Yu, Jing; Zhou, Shiliang

    2013-11-01

    The classification of the economically important genus Prunus L. sensu lato (s.l.) is controversial because of high levels of convergent or parallel evolution of morphological characters. In the present study, phylogenetic analyses of fifteen main segregates of Prunus s.l., represented by eighty-four species, were conducted with maximum parsimony and Bayesian approaches using twelve chloroplast regions (atpB-rbcL, matK, ndhF, psbA-trnH, rbcL, rpL16, rpoC1, rps16, trnS-G, trnL, trnL-F and ycf1) and three nuclear genes (ITS, s6pdh and SbeI) to explore their infrageneric relationships. The results of these analyses were used to develop a new, phylogeny-based classification of Prunus s.l. Our phylogenetic reconstructions resolved three main clades of Prunus s.l. with strong support. We adopted a broadly circumscribed genus, Prunus, and recognised three subgenera corresponding to the three main clades: subgenus Padus, subgenus Cerasus and subgenus Prunus. Seven sections of subgenus Prunus were recognised. The dwarf cherries, which were previously assigned to subgenus Cerasus, were included in subgenus Prunus. One new section name, Prunus L. subgenus Prunus section Persicae (T. T. Yü & L. T. Lu) S. L. Zhou, and one new species name, Prunus tianshanica (Pojarkov) S. Shi, were proposed. © 2013 Institute of Botany, Chinese Academy of Sciences.

  6. Progressive Classification Using Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri; Kocurek, Michael

    2009-01-01

    An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) a slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. 
The user
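
    The two-pass scheme described above can be sketched as a generator that yields a refined labelling after each step. This is a hypothetical illustration. The `fast` and `slow` callables stand in for the two SVMs and may be any functions that return a signed decision value. The confidence index is assumed to be the magnitude of the fast decision value, which the abstract does not specify.

```python
def progressive_classify(points, fast, slow):
    """Yield successively refined label lists (+1 / -1).

    All points are first labelled by `fast`; points are then re-labelled
    by `slow`, least confident first, with a snapshot after each step."""
    fast_scores = [fast(x) for x in points]
    labels = [1 if s >= 0 else -1 for s in fast_scores]
    yield list(labels)
    # Small |decision value| = close to the fast boundary = likely wrong.
    order = sorted(range(len(points)), key=lambda i: abs(fast_scores[i]))
    for i in order:
        labels[i] = 1 if slow(points[i]) >= 0 else -1
        yield list(labels)
```

    A caller with a time budget simply stops iterating when the budget runs out and keeps the latest labels. This gives the anytime behaviour that suits onboard analysis when resource limits are not known in advance.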

  7. Addition of Histology to the Paris Classification of Pediatric Crohn Disease Alters Classification of Disease Location.

    PubMed

    Fernandes, Melissa A; Verstraete, Sofia G; Garnett, Elizabeth A; Heyman, Melvin B

    2016-02-01

    The aim of the study was to investigate the value of microscopic findings in the classification of pediatric Crohn disease (CD) by determining whether classification of disease changes significantly with inclusion of histologic findings. Sixty patients were randomly selected from a cohort of patients studied at the Pediatric Inflammatory Bowel Disease Clinic at the University of California, San Francisco Benioff Children's Hospital. Two physicians independently reviewed the electronic health records of the included patients to determine the Paris classification for each patient, first by adhering to present guidelines and then by including microscopic findings. Macroscopic and combined disease location classifications were discordant in 34 patients (56.6%), with no statistically significant differences between groups. Interobserver agreement was higher in the combined classification (κ = 0.73, 95% confidence interval 0.65-0.82) than when classification was limited to macroscopic findings (κ = 0.53, 95% confidence interval 0.40-0.58). When evaluating the proximal upper gastrointestinal tract (Paris L4a), interobserver agreement was better in the macroscopic than in the combined classification. Disease extent classifications differed significantly when comparing isolated macroscopic findings (Paris classification) with the combined scheme that included microscopy. Further studies are needed to determine which scheme provides a more accurate representation of disease extent.
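
    The interobserver agreement values above are Cohen's kappa statistics, which correct the raw agreement rate for agreement expected by chance. The sketch below computes the standard unweighted kappa for two raters. The abstract does not state whether a weighted variant was used, and the Paris labels in the usage note are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected from each rater's label frequencies.
    Undefined (division by zero) when p_e == 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)
```

    For example, `cohens_kappa(["L1", "L2", "L3"], ["L1", "L2", "L3"])` gives 1.0 (perfect agreement). Two raters who agree on three of four binary labels, with one of them using the positive label only once, get 0.5.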

  8. Land use/cover classification in the Brazilian Amazon using satellite images.

    PubMed

    Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant'anna, Sidnei João Siqueira

    2012-09-01

    Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews a decade of experiments related to land use/cover classification in the Brazilian Amazon. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based methods are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, have the potential to provide better results. However, they often require more time for parameter optimization. Proper use of hierarchical methods is fundamental for developing accurate land use/cover classification, especially from historical remotely sensed data.
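
    The maximum likelihood classifier mentioned above is commonly implemented as a per-class multivariate Gaussian model of pixel spectra. The sketch below follows that textbook form and is not the reviewed studies' software. It assumes equal class priors, training pixels given as rows of band values, and non-singular class covariance matrices. The class names in the usage note are hypothetical.

```python
import numpy as np


def fit_ml(samples_by_class):
    """Estimate a mean vector and covariance matrix per class from
    training pixels (rows = pixels, columns = spectral bands)."""
    params = {}
    for label, X in samples_by_class.items():
        X = np.asarray(X, dtype=float)
        params[label] = (X.mean(axis=0), np.cov(X, rowvar=False))
    return params


def classify_ml(params, pixel):
    """Assign the pixel to the class with the highest Gaussian
    log-likelihood (equal priors assumed, constant terms dropped)."""
    x = np.asarray(pixel, dtype=float)
    best, best_score = None, -np.inf
    for label, (mu, cov) in params.items():
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        score = -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
        if score > best_score:
            best, best_score = label, score
    return best
```

    Textural bands, as discussed above, can be appended as extra columns without changing the classifier. The cost is more training pixels, because each class covariance matrix grows with the square of the number of bands.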

  9. Land use/cover classification in the Brazilian Amazon using satellite images

    PubMed Central

    Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant’Anna, Sidnei João Siqueira

    2013-01-01

    Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews a decade of experiments related to land use/cover classification in the Brazilian Amazon. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based methods are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, have the potential to provide better results. However, they often require more time for parameter optimization. Proper use of hierarchical methods is fundamental for developing accurate land use/cover classification, especially from historical remotely sensed data. PMID:24353353

  10. Phylogenetic Relationships of American Willows (Salix L., Salicaceae)

    PubMed Central

    Lauron-Moreau, Aurélien; Pitre, Frédéric E.; Argus, George W.; Labrecque, Michel; Brouillet, Luc

    2015-01-01

    Salix L. is the largest genus in the family Salicaceae (450 species). Several classifications have been published, but taxonomic subdivision has been under continuous revision. Our goal is to establish the phylogenetic structure of the genus from molecular data for all American willows, based on three DNA markers. This complete phylogeny of American willows allows us to propose a biogeographic framework for the evolution of the genus. Material was obtained for the 122 native and introduced willow species of America. Sequences were obtained from the ITS (ribosomal nuclear DNA) and two plastid regions, matK and rbcL. Phylogenetic analyses (parsimony, maximum likelihood, Bayesian inference) were performed on the data. Geographic distribution was mapped onto the tree. The species tree provides strong support for a division of the genus into two subgenera, Salix and Vetrix. Subgenus Salix comprises temperate species from the Americas and Asia, and their disjunction may result from Tertiary events. Subgenus Vetrix is composed of boreo-arctic species of the Northern Hemisphere and their radiation may coincide with the Quaternary glaciations. Sixteen species have ambiguous positions; genetic diversity is lower in subg. Vetrix. A molecular phylogeny of all species of American willows has been inferred. It needs to be tested and further resolved using other molecular data. Nonetheless, the genus clearly has two clades that have distinct biogeographic patterns. PMID:25880993

  11. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  12. Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

    PubMed

    Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

    2015-10-01

    Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb were significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. The performance of this first documented version of DictDb (v. 3.0) using the revised taxonomic guide tree in the classification of short-read libraries obtained from termites and cockroaches was far superior to that of the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature, also for clades of uncultured bacteria, and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches. Copyright © 2015 Elsevier GmbH. All rights reserved.

  13. Transforming phylogenetic networks: Moving beyond tree space.

    PubMed

    Huber, Katharina T; Moulton, Vincent; Wu, Taoyang

    2016-09-07

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks. Copyright © 2016 Elsevier Ltd. All rights reserved.
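    For readers new to the tree case that this paper generalizes, the following is a minimal Python sketch of a single nearest-neighbor interchange (NNI) on an unrooted binary tree. The adjacency-map representation, the function name `nni_neighbors` and the four-leaf example are assumptions made for illustration; the sketch does not implement the paper's operations on networks.

```python
def nni_neighbors(adj, u, v):
    """Return the two trees one NNI move away across the internal edge (u, v).

    `adj` maps each node label to the set of its neighbours in an unrooted
    binary tree; `u` and `v` must both be internal (degree-3) nodes.
    """
    if len(adj[u]) != 3 or len(adj[v]) != 3 or v not in adj[u]:
        raise ValueError("(u, v) must be an internal edge of a binary tree")
    a, b = sorted(adj[u] - {v})  # the two subtrees hanging off u
    c, d = sorted(adj[v] - {u})  # the two subtrees hanging off v
    neighbours = []
    # The original tree has split ab|cd; swapping b with c gives ac|bd and
    # swapping b with d gives ad|bc, the only two NNI alternatives.
    for x, y in ((b, c), (b, d)):
        new = {node: set(nbrs) for node, nbrs in adj.items()}  # copy
        new[u].remove(x)  # detach subtree x from u
        new[x].remove(u)
        new[v].remove(y)  # detach subtree y from v
        new[y].remove(v)
        new[u].add(y)     # reattach y at u
        new[y].add(u)
        new[v].add(x)     # reattach x at v
        new[x].add(v)
        neighbours.append(new)
    return neighbours


# Hypothetical four-leaf tree with split AB|CD across the internal edge u-v.
tree = {
    "u": {"A", "B", "v"}, "v": {"C", "D", "u"},
    "A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"},
}
for t in nni_neighbors(tree, "u", "v"):
    print(sorted(t["u"] - {"v"}), "|", sorted(t["v"] - {"u"}))
```

    Because each move only swaps two subtrees across one edge, repeated NNI moves walk through tree space one neighbouring topology at a time, which is the property the paper extends to networks.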

  14. Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice

    PubMed Central

    Shavit Grievink, Liat; Penny, David; Holland, Barbara R.

    2013-01-01

    Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. PMID:23471508
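    One common way to treat such sites is to drop alignment columns whose fraction of gaps or missing states exceeds a threshold, so that analyses under different thresholds can be compared. The sketch below assumes nucleotide data in which `-`, `?` and `N` count as missing (for protein data `N` is asparagine and would not belong in that set); the function name, the thresholds and the toy alignment are illustrative, and the paper's graphical method is not reproduced.

```python
MISSING = frozenset("-?N")  # assumed gap/missing symbols for nucleotide data


def filter_sites(alignment, max_missing=0.5):
    """Keep columns whose fraction of missing/gap characters is <= max_missing.

    `alignment` maps taxon names to equal-length aligned sequences.
    Returns the filtered alignment and the indices of the retained columns.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")
    kept = []
    for i in range(length):
        column = [alignment[name][i].upper() for name in names]
        missing_fraction = sum(ch in MISSING for ch in column) / len(column)
        if missing_fraction <= max_missing:
            kept.append(i)
    filtered = {name: "".join(alignment[name][i] for i in kept) for name in names}
    return filtered, kept


# Toy alignment (hypothetical data).
aln = {"t1": "ACGT-A", "t2": "AC-T-A", "t3": "ACGTNA", "t4": "A?GT-C"}
for threshold in (0.0, 0.5):
    filtered, kept = filter_sites(aln, threshold)
    print(threshold, kept, filtered["t1"])
```

    Running the same tree search on each filtered alignment is a simple way to see whether topology or support changes with the treatment of missing data.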

  15. The New Higher Level Classification of Eukaryotes with Emphasis on the Taxonomy of Protists

    Treesearch

    Sina M. Adl; Alastair G. B. Simpson; Mark A. Farmer; Robert A. Andersen; O. Roger Anderson; John R. Barta; Samuel S. Bowser; Guy Brugerolle; Robert A. Fensome; Suzanne Fredericq; Timothy Y. James; Sergei Karpov; Paul Kugrens; John Krug; Christopher E. Lane; Louise A. Lewis; Jean Lodge; Denis H. Lynn; David G. Mann; Richard M. McCourt; Leonel Mendoza; Øjvind Moestrup; Sharon E. Mozley-Standridge; Thomas A. Nerad; Carol A. Shearer; Alexey V. Smirnov; Frederick W. Spiegel; Max F.J.R. Taylor

    2005-01-01

    This revision of the classification of unicellular eukaryotes updates that of Levine et al. (1980) for the protozoa and expands it to include other protists. Whereas the previous revision was primarily to incorporate the results of ultrastructural studies, this revision incorporates results from both ultrastructural research since 1980 and molecular phylogenetic...

  16. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists

    Treesearch

    Sina M. Adl; Alastair G.B. Simpson; Mark A. Farmer; Robert A. Andersen; O. Roger Anderson; John R. Barta; Samuel S. Bowser; Guy Brugerolle; Robert A. Fensome; Suzanne Fredericq; Timothy Y. James; Sergei Karpov; Paul Kugrens; John Krug; Christopher E. Lane; Louise A. Lewis; Jean Lodge; Denis H. Lynn; David G. Mann; Richard M. McCourt; Leonel Mendoza; Øjvind Moestrup; Sharon E. Mozley-Standridge; Thomas A. Nerad; Carol A. Shearer; Alexey V. Smirnov; Frederick W. Spiegel; Max F.J.R. Taylor

    2005-01-01

    This revision of the classification of unicellular eukaryotes updates that of Levine et al. (1980) for the protozoa and expands it to include other protists. Whereas the previous revision was primarily to incorporate the results of ultrastructural studies, this revision incorporates results from both ultrastructural research since 1980 and molecular phylogenetic...

  17. Phylogenetic effective sample size.

    PubMed

    Bartoszek, Krzysztof

    2016-10-21

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes, the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an existing concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or the effective number of species. Lastly, I discuss how the concept of the phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades and deciding on the importance of phylogenetic correlations. Copyright © 2016 Elsevier Ltd. All rights reserved.
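    For the mean effective sample size that the paper compares against, one commonly used definition is n_e = 1^T R^{-1} 1, where R is the among-species correlation matrix: it counts how many independent observations would estimate the mean as precisely as the correlated sample does. The dependency-free Python sketch below assumes that definition and assumes Brownian motion on an ultrametric tree, so that R_ij is the shared root-to-ancestor path length divided by tree height. It does not compute the paper's regression effective sample size, and the helper names and example tree are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mean_effective_sample_size(corr):
    """Return 1^T R^{-1} 1 for a positive-definite correlation matrix R."""
    return sum(solve(corr, [1.0] * len(corr)))


def bm_correlation(shared, height):
    """Brownian-motion correlations on an ultrametric tree: shared path / height."""
    return [[s / height for s in row] for row in shared]


# Hypothetical ultrametric tree ((A,B),(C,D)) of height 1 whose cherries
# split at depth 0.5, so A-B and C-D share 0.5 units of root-to-tip path.
shared = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
n_eff = mean_effective_sample_size(bm_correlation(shared, height=1.0))
print(round(n_eff, 3), "effective species out of", len(shared))
```

    A star phylogeny gives R equal to the identity and hence n_e equal to the number of species; the more evolutionary history species share, the smaller n_e becomes.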

  18. A Format for Phylogenetic Placements

    PubMed Central

    Matsen, Frederick A.; Hoffman, Noah G.; Gallagher, Aaron; Stamatakis, Alexandros

    2012-01-01

    We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to a more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement. PMID:22383988

  19. A format for phylogenetic placements.

    PubMed

    Matsen, Frederick A; Hoffman, Noah G; Gallagher, Aaron; Stamatakis, Alexandros

    2012-01-01

    We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to a more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement.
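    To make the format concrete, here is a small Python sketch that reads a placement file and reports the most likely edge for each read. It assumes the version-3 layout (top-level `tree`, `placements`, `fields`, `metadata` and `version` keys, with each row of `p` described by `fields` and reads listed under `n` or, with multiplicities, `nm`); the embedded document and all of its numbers are invented for illustration, so field names should be checked against the published specification before relying on them.

```python
import json

# Invented example in the assumed version-3 layout; edge numbers appear in
# braces after branch lengths in the Newick string.
EXAMPLE_JPLACE = """
{
  "tree": "((A:0.2{0},B:0.09{1}):0.7{2},C:0.5{3}){4};",
  "placements": [
    {"p": [[1, -2578.16, 0.777385, 0.004132, 0.0006],
           [0, -2580.15, 0.222615, 0.0, 0.0]],
     "n": ["read1"]},
    {"p": [[3, -2576.7, 1.0, 0.2, 0.01]],
     "nm": [["read2", 1], ["read3", 2]]}
  ],
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "metadata": {"note": "hypothetical data"},
  "version": 3
}
"""


def best_placements(doc):
    """Map each read name to (edge_num, like_weight_ratio) of its best placement."""
    column = {name: i for i, name in enumerate(doc["fields"])}
    edge_col = column["edge_num"]
    lwr_col = column["like_weight_ratio"]
    best = {}
    for placement in doc["placements"]:
        # Each row of "p" is one candidate edge; keep the highest weight ratio.
        top = max(placement["p"], key=lambda row: row[lwr_col])
        # Reads may be listed plainly ("n") or with multiplicities ("nm").
        names = list(placement.get("n", []))
        names += [name for name, _multiplicity in placement.get("nm", [])]
        for name in names:
            best[name] = (top[edge_col], top[lwr_col])
    return best


result = best_placements(json.loads(EXAMPLE_JPLACE))
for read, (edge, lwr) in sorted(result.items()):
    print(read, edge, lwr)
```

    Looking columns up by name through `fields`, rather than by fixed position, keeps the reader working if a tool emits the columns in a different order.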

  20. Phylogenetic inertia and Darwin's higher law.

    PubMed

    Shanahan, Timothy

    2011-03-01

    The concept of 'phylogenetic inertia' is routinely deployed in evolutionary biology as an alternative to natural selection for explaining the persistence of characteristics that appear sub-optimal from an adaptationist perspective. However, in many of these contexts the precise meaning of 'phylogenetic inertia' and its relationship to selection are far from clear. After tracing the history of the concept of 'inertia' in evolutionary biology, I argue that treating phylogenetic inertia and natural selection as alternative explanations is mistaken because phylogenetic inertia is, from a Darwinian point of view, simply an expected effect of selection. Although Darwin did not discuss 'phylogenetic inertia,' he did assert the explanatory priority of selection over descent. An analysis of 'phylogenetic inertia' provides a perspective from which to assess Darwin's view. Copyright © 2010 Elsevier Ltd. All rights reserved.

  1. A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus.

    PubMed

    Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh

    2013-12-13

    Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes: DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.

  2. Classification of male lower torso for underwear design

    NASA Astrophysics Data System (ADS)

    Cheng, Z.; Kuzmichev, V. E.

    2017-10-01

    By means of scanning technology, we obtained new information about the morphology of male bodies and revised the classification of men’s underwear, adapting it to consumer demands. To build the new classification in accordance with the characteristic factors of the male lower torso, we developed an underwear design method that yields accurate products that are convenient for consumers.

  3. Sampling strategies for improving tree accuracy and phylogenetic analyses: a case study in ciliate protists, with notes on the genus Paramecium.

    PubMed

    Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo

    2014-02-01

    In order to assess how dataset-selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported. Copyright © 2013 Elsevier Inc. All rights reserved.

  4. Archaeal-eubacterial mergers in the origin of Eukarya: phylogenetic classification of life

    NASA Technical Reports Server (NTRS)

    Margulis, L.

    1996-01-01

    A symbiosis-based phylogeny leads to a consistent, useful classification system for all life. "Kingdoms" and "Domains" are replaced by biological names for the most inclusive taxa: Prokarya (bacteria) and Eukarya (symbiosis-derived nucleated organisms). The earliest Eukarya, anaerobic mastigotes, hypothetically originated from permanent whole-cell fusion between members of Archaea (e.g., Thermoplasma-like organisms) and of Eubacteria (e.g., Spirochaeta-like organisms). Molecular biology, life-history, and fossil record evidence support the reunification of bacteria as Prokarya while subdividing Eukarya into uniquely defined subtaxa: Protoctista, Animalia, Fungi, and Plantae.

  5. Molecular Phylogenetics: Concepts for a Newcomer.

    PubMed

    Ajawatanawong, Pravech

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for analyzing and interpreting results without expert guidance. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
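    Steps (2) and (3) can be illustrated with the simplest distance-based ingredient: the uncorrected p-distance between two aligned sequences and its Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), which accounts for multiple substitutions at the same site. This Python sketch is not taken from the review; it skips any site where either sequence lacks an unambiguous A, C, G or T, which is only one of several possible conventions, and the example sequences are made up.

```python
import math

BASES = frozenset("ACGT")


def p_distance(seq1, seq2):
    """Proportion of differing sites among sites unambiguous in both sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69-corrected distance; undefined (returned as infinity) once p >= 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, round(jukes_cantor(p), 4))
```

    A matrix of such pairwise distances is the input to distance methods such as neighbor joining; likelihood and Bayesian methods instead use the substitution model directly.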

  6. Phylogenetic and environmental diversity of DsrAB-type dissimilatory (bi)sulfite reductases

    PubMed Central

    Müller, Albert Leopold; Kjeldsen, Kasper Urup; Rattei, Thomas; Pester, Michael; Loy, Alexander

    2015-01-01

    The energy metabolism of essential microbial guilds in the biogeochemical sulfur cycle is based on a DsrAB-type dissimilatory (bi)sulfite reductase that either catalyzes the reduction of sulfite to sulfide during anaerobic respiration of sulfate, sulfite and organosulfonates, or acts in reverse during sulfur oxidation. Common use of dsrAB as a functional marker showed that dsrAB richness in many environments is dominated by novel sequence variants and collectively represents an extensive, largely uncharted sequence assemblage. Here, we established a comprehensive, manually curated dsrAB/DsrAB database and used it to categorize the known dsrAB diversity, reanalyze the evolutionary history of dsrAB and evaluate the coverage of published dsrAB-targeted primers. Based on a DsrAB consensus phylogeny, we introduce an operational classification system for environmental dsrAB sequences that integrates established taxonomic groups with operational taxonomic units (OTUs) at multiple phylogenetic levels, ranging from DsrAB enzyme families that reflect reductive or oxidative DsrAB types of bacterial or archaeal origin, superclusters, uncultured family-level lineages to species-level OTUs. Environmental dsrAB sequences constituted at least 13 stable family-level lineages without any cultivated representatives, suggesting that major taxa of sulfite/sulfate-reducing microorganisms have not yet been identified. Three of these uncultured lineages occur mainly in marine environments, while specific habitat preferences are not evident for members of the other 10 uncultured lineages. In summary, our publicly available dsrAB/DsrAB database, the phylogenetic framework, the multilevel classification system and a set of recommended primers provide a necessary foundation for large-scale dsrAB ecology studies with next-generation sequencing methods. PMID:25343514

  7. Morphological reassessment and molecular phylogenetic analyses of Amauroderma s.lat. raised new perspectives in the generic classification of the Ganodermataceae family.

    PubMed

    Costa-Rezende, D H; Robledo, G L; Góes-Neto, A; Reck, M A; Crespo, E; Drechsler-Santos, E R

    2017-12-01

    Ganodermataceae is a remarkable group of polypore fungi, mainly characterized by particular double-walled basidiospores with a coloured endosporium ornamented with columns or crests, and a hyaline smooth exosporium. To establish an integrative morphological and molecular phylogenetic approach that clarifies the relationships of Neotropical Amauroderma s.lat. within the Ganodermataceae family, morphological analyses, including scanning electron microscopy, were carried out together with molecular phylogenetic analyses based on one locus (ITS) and four loci (ITS-5.8S, LSU, TEF-1α and RPB1). Ultrastructural analyses revealed a new character for Ganodermataceae systematics, i.e., the presence of perforations in the exosporium, with holes connected to hollow columns of the endosporium. This character is considered a synapomorphy of Foraminispora, a new genus proposed here to accommodate Porothelium rugosum (≡ Amauroderma sprucei). Furtadoa is proposed to accommodate species with a monomitic context: F. biseptata, F. brasiliensis and F. corneri. Molecular phylogenetic analyses confirm that both genera form strongly supported distinct lineages outside the Amauroderma s.str. clade.

  8. Classification of spatially unresolved objects

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F.; Horwitz, H. M.; Hyde, P. D.; Morgenstern, J. P.

    1972-01-01

    A proportion estimation technique for the classification of multispectral scanner images is reported; it averages data points and computes estimated class proportions for the single resulting average data point in order to classify spatially unresolved areas. Example extraction calculations of spectral signatures for bare soil, weeds, alfalfa, and barley proved quite accurate.
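    The report's exact computation is not given here, so the sketch below swaps in a standard linear mixture model: pixels are averaged band by band, and the proportion of one class in the average is estimated by least squares against two class signatures. The band values, class names and two-class restriction are hypothetical; with more classes a constrained least-squares solver would be needed.

```python
def average_pixel(pixels):
    """Band-wise mean of a list of multispectral pixel vectors."""
    n = len(pixels)
    return [sum(band) / n for band in zip(*pixels)]


def two_class_proportion(mean, sig1, sig2):
    """Least-squares p in mean ~ p*sig1 + (1 - p)*sig2, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(sig1, sig2)]
    resid = [m - b for m, b in zip(mean, sig2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("class signatures must differ")
    p = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, p))


# Hypothetical 4-band signatures (reflectance-like values, not from the report).
soil = [0.30, 0.25, 0.35, 0.40]
crop = [0.05, 0.08, 0.04, 0.45]
# Three hypothetical pixels that are linear mixtures with soil fractions
# 0.2, 0.3 and 0.25.
pixels = [[f * s + (1 - f) * c for s, c in zip(soil, crop)]
          for f in (0.2, 0.3, 0.25)]
estimate = two_class_proportion(average_pixel(pixels), soil, crop)
print(round(estimate, 3))
```

    Under a linear mixing assumption, averaging first and unmixing once gives the mean proportion over the averaged pixels, which is why a single average data point can stand in for an area too small to resolve pixel by pixel.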

  9. Phylogenetic analyses of the genus Aeromonas based on housekeeping gene sequencing and its influence on systematics.

    PubMed

    Navarro, Aaron; Martínez-Murcia, Antonio

    2018-04-19

    The phylogenies derived from housekeeping gene sequence alignments, although mere evolutionary hypotheses, have increased our knowledge of Aeromonas genetic diversity, providing a robust species delineation framework invaluable for reliable, easy and fast species identification. Previous classifications of Aeromonas have been fully surpassed by recently developed phylogenetic (natural) classifications obtained from the analysis of so-called "molecular chronometers". Although ribosomal RNAs cannot split all known Aeromonas species, the conserved nature of 16S rRNA offers reliable alignments containing mosaics of sequence signatures, which may serve as targets of genus-specific oligonucleotides for subsequent identification/detection tests in samples without culturing. In contrast, some protein-coding housekeeping genes show a much better chronometric capacity to discriminate highly related strains. Although species and loci do not all evolve at exactly the same rate, published Aeromonas phylogenies were congruent with each other, indicating that phylogenetic markers are synchronized and that a concatenated multi-gene phylogeny may be "the mirror" of the entire genomic relationships. Thanks to MLPA approaches, the discovery of new Aeromonas species and of strains of rarely isolated species is more frequent today; these approaches should therefore be extensively promoted for isolate screening and species identification. However, the accumulated data should still be carefully catalogued to build a reliable database. This article is protected by copyright. All rights reserved.

  10. Encoding phylogenetic trees in terms of weighted quartets.

    PubMed

    Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

    2008-04-01

    One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree-2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdős provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
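    As background to the quartet encoding discussed above, this hedged Python sketch shows one standard way an edge-weighted tree induces weighted quartets: the four-point condition applied to its path-length metric. For each four taxa, the split pairs the taxa whose within-pair distances have the smallest sum, and the internal edge length is half the gap between the largest and smallest of the three sums. The helper names and the five-taxon distances (read off the hypothetical tree `((A:1,B:1):1,C:1,(D:1,E:1):2);`) are illustrative assumptions; the code does not test the paper's characterisation of when a quartet collection determines a tree.

```python
from itertools import combinations


def weighted_quartets(taxa, dist):
    """Map each 4-taxon subset to its induced split and internal-edge weight.

    Four-point condition: for a tree metric the two largest of the three
    pairwise sums are equal, the smallest sum identifies the split, and the
    internal edge length is half the gap. A weight of 0 means the quartet
    is unresolved (a star) in the tree.
    """
    quartets = {}
    for w, x, y, z in combinations(taxa, 4):
        sums = {
            ((w, x), (y, z)): dist(w, x) + dist(y, z),
            ((w, y), (x, z)): dist(w, y) + dist(x, z),
            ((w, z), (x, y)): dist(w, z) + dist(x, y),
        }
        split = min(sums, key=sums.get)
        quartets[split] = (max(sums.values()) - sums[split]) / 2
    return quartets


# Path-length distances on the hypothetical tree ((A:1,B:1):1,C:1,(D:1,E:1):2);
PAIR_DIST = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 5, ("A", "E"): 5,
    ("B", "C"): 3, ("B", "D"): 5, ("B", "E"): 5,
    ("C", "D"): 4, ("C", "E"): 4, ("D", "E"): 2,
}


def dist(u, v):
    """Symmetric lookup into PAIR_DIST."""
    return PAIR_DIST[(u, v)] if (u, v) in PAIR_DIST else PAIR_DIST[(v, u)]


quartets = weighted_quartets("ABCDE", dist)
for (left, right), weight in quartets.items():
    print("".join(left), "|", "".join(right), weight)
```

    For example, the quartet AB|DE receives weight 3 because its two cherries are joined by the path of length 1 + 2 through the central vertex.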

  11. Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

    PubMed

    Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

    2015-05-01

    Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing work on data stream classification assumes that streaming data are precise and definite. This assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we propose an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, it uses uncertain naive Bayes (UNB) classifiers at tree leaves to improve classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
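    The Hoeffding-bound split test that VFDT-style learners rely on can be sketched as follows. With probability 1 - delta, the mean of n observations of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of its true mean. This minimal Python version assumes precise (not uncertain) data and the classic VFDT rule of splitting when the gain difference exceeds epsilon or epsilon falls below a tie threshold tau; it omits uVFDTc's handling of uncertain values and its UNB leaf classifiers, and the parameter values are illustrative.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the observed mean of n
    samples of a variable with the given range is within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """Classic VFDT test: split if the best attribute beats the runner-up by more
    than epsilon, or if epsilon is so small that the two are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


# Information gain for a 2-class problem lies in [0, log2(2)] = [0, 1].
R = math.log2(2)
print(round(hoeffding_bound(R, 1e-7, 1000), 4))   # epsilon after 1,000 examples
print(should_split(0.30, 0.25, R, 1e-7, 1000))    # gap 0.05 is below epsilon: wait
print(should_split(0.30, 0.29, R, 1e-7, 10000))   # epsilon below tau: tie broken
```

    Because epsilon shrinks as 1/sqrt(n), a leaf only needs to buffer statistics until the bound separates the two best attributes, which is what lets VFDT learn in a single pass over the stream.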

  13. Comprehensive phylogeny, biogeography and new classification of the diverse bee tribe Megachilini: Can we use DNA barcodes in phylogenies of large genera?

    PubMed

    Trunz, V; Packer, L; Vieu, J; Arrigo, N; Praz, C J

    2016-10-01

    Classification and evolutionary studies of particularly speciose clades pose important challenges, as phylogenetic analyses typically sample a small proportion of the existing diversity. We examine here one of the largest bee genera, the genus Megachile, the dauber and leafcutting bees. Besides presenting a phylogeny based on five nuclear genes (5480 aligned nucleotide positions), we attempt to use the phylogenetic signal of mitochondrial DNA barcodes, which are rapidly accumulating and already include a substantial proportion of the known species diversity in the genus. We used barcodes in two ways: first, to identify particularly divergent lineages and thus to guide taxon sampling in our nuclear phylogeny; second, to augment taxon sampling by combining nuclear markers (as backbone for ancient divergences) with DNA barcodes. Our results indicate that DNA barcodes bear phylogenetic signal limited to very recent divergences (3-4 million years before present). Sampling within clades of very closely related species may be augmented using this technique, but our results also suggest statistically supported but incongruent placements of some taxa. However, the addition of one single nuclear gene (LW-rhodopsin) to the DNA barcode data was enough to recover meaningful placement with high clade support values for nodes up to 15 million years old. We discuss different proposals for the generic classification of the tribe Megachilini. Finding a classification that is both in agreement with our phylogenetic hypotheses and practical in terms of diagnosability is particularly challenging as our analyses recover several well-supported clades that include morphologically heterogeneous lineages. We favour a classification that recognizes seven morphologically well-delimited genera in Megachilini: Coelioxys, Gronoceras, Heriadopsis, Matangapis, Megachile, Noteriades and Radoszkowskiana. Our results also lead to the following classification changes: the groups known as Dinavis, Neglectella

  14. Phylogenetic analysis of two Plectus mitochondrial genomes (Nematoda: Plectida) supports a sister group relationship between Plectida and Rhabditida within Chromadorea.

    PubMed

    Kim, Jiyeon; Kern, Elizabeth; Kim, Taeho; Sim, Mikang; Kim, Jaebum; Kim, Yuseob; Park, Chungoo; Nadler, Steven A; Park, Joong-Ki

    2017-02-01

    Plectida is an important nematode order with species that occupy many different biological niches. The order includes free-living aquatic and soil-dwelling species, but its phylogenetic position has remained uncertain. We sequenced the complete mitochondrial genomes of two members of this order, Plectus acuminatus and Plectus aquatilis, and compared them with those of other major nematode clades. The genome sizes and base compositions of these species are similar to those of other nematodes: 14,831 bp and 14,372 bp, respectively, with AT contents of 71.0% and 70.1%. Gene content was also similar to that of other nematodes, but the gene order and coding direction of the Plectus mtDNAs differ from those of other chromadorean species. P. acuminatus and P. aquatilis are the first chromadorean species found to contain a gene inversion. We reconstructed mitochondrial genome phylogenetic trees using nucleotide and amino acid datasets from 87 nematodes that represent major nematode clades, including the Plectus sequences. Trees from phylogenetic analyses using maximum likelihood and Bayesian methods depicted Plectida as the sister group to other sequenced chromadorean nematodes. This finding is consistent with several phylogenetic results based on SSU rDNA, but disagrees with a classification based on morphology. Mitogenomes representing other basal chromadorean groups (Araeolaimida, Monhysterida, Desmodorida, Chromadorida) are needed to confirm their phylogenetic relationships. Copyright © 2016 Elsevier Inc. All rights reserved.
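    The AT contents quoted above are the proportion of A and T among the bases of each genome. A minimal Python sketch of that calculation, assuming ambiguity codes such as N are left out of the denominator (the abstract does not say how they were handled):

```python
def at_content(seq: str) -> float:
    """Percentage of A + T among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


print(f"{at_content('ATATGCATNN'):.1f}")  # 75.0 (the two Ns are ignored)
```

    Excluding ambiguous bases keeps partially resolved regions from diluting the percentage; counting them instead would change the denominator only.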

  15. Classification of Radiological Changes in Burst Fractures

    PubMed Central

    Şentürk, Salim; Öğrenci, Ahmet; Gürçay, Ahmet Gürhan; Abdioğlu, Ahmet Atilla; Yaman, Onur; Özer, Ali Fahir

    2018-01-01

    AIM: Burst fractures can produce a variety of radiological appearances after high-energy trauma. We aimed to simplify the radiological staging of burst fractures. METHODS: Eighty patients who had sustained spinal trauma resulting in a burst fracture were evaluated with respect to age, sex, fractured segment, neurological deficit, secondary organ injury and the radiological changes present. RESULTS: We developed a new radiological classification of burst fractures. CONCLUSIONS: According to this classification system, secondary organ injury and neurological deficit can indicate the energy of the trauma: the higher the energy, the worse the clinical status. The classification therefore gives an indication of the likelihood of neurological deficit and secondary organ injury. It simplifies the radiological staging of burst fractures and gives a very accurate idea of the patient's neurological condition. PMID:29531604

  16. Tree-Based Unrooted Phylogenetic Networks.

    PubMed

    Francis, A; Huber, K T; Moulton, V

    2018-02-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.
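    The definition of an unrooted phylogenetic network given above can be checked mechanically. A minimal Python sketch, assuming the graph arrives as a list of edges between hashable vertex labels (a representation chosen here for illustration, not one taken from the paper). It checks only the definitions of a network and a tree, not whether a network is tree-based:

```python
from collections import defaultdict


def classify_unrooted(edges, taxa):
    """Return 'tree', 'network', or None if `edges` is not an unrooted
    phylogenetic network on `taxa` (connected, simple, every vertex of
    degree 1 or 3, and leaf set exactly equal to `taxa`)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v or v in adj[u]:
            return None  # loop or parallel edge: not a simple graph
        adj[u].add(v)
        adj[v].add(u)
    if not adj or any(len(nbrs) not in (1, 3) for nbrs in adj.values()):
        return None
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    if leaves != set(taxa):
        return None
    # connectivity check by depth-first search
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(adj):
        return None
    # a connected graph is a tree exactly when |E| = |V| - 1
    return "tree" if len(edges) == len(adj) - 1 else "network"


quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
triangle = [("u", "v"), ("v", "w"), ("w", "u"), ("u", "a"), ("v", "b"), ("w", "c")]
print(classify_unrooted(quartet, "abcd"))  # tree
print(classify_unrooted(triangle, "abc"))  # network
```

    The triangle example is the smallest kind of cycle that keeps every internal vertex at degree 3, which is why it qualifies as a network but not a tree.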

  17. Algorithms for Hyperspectral Endmember Extraction and Signature Classification with Morphological Dendritic Networks

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Ritter, G.

    Accurate multispectral or hyperspectral signature classification is key to the nonimaging detection and recognition of space objects. Signature classification accuracy, in turn, depends on accurate spectral endmember determination [1]. Previous approaches to endmember computation and signature classification were based on linear operators or neural networks (NNs) expressed in terms of the algebra (R, +, ×) [1,2]. Unfortunately, class separation in these methods tends to be suboptimal, and the number of signatures that can be accurately classified often depends linearly on the number of NN inputs. This can lead to poor endmember distinction, as well as potentially significant classification errors in the presence of noise or densely interleaved signatures. In contrast to traditional CNNs, autoassociative morphological memories (AMMs) are a construct similar to Hopfield autoassociative memories, defined on the (R, +, ∨, ∧) lattice algebra [3]. Unlimited storage and perfect recall of noiseless real-valued patterns have been proven for AMMs [4]. However, AMMs are sensitive to specific noise models, which can be characterized as erosive and dilative noise. On the other hand, the prior definition of a set of endmembers corresponds to material spectra lying on the vertices of the minimum convex region covering the image data. These vertices can be characterized as morphologically independent patterns. It has further been shown that AMMs can be based on dendritic computation [3,6]. These techniques yield improved accuracy and class segmentation/separation ability in the presence of highly interleaved signature data. In this paper, we present a procedure for endmember determination based on AMM noise sensitivity, which employs morphological dendritic computation. We show that detected endmembers can be exploited by AMM-based classification techniques to achieve accurate signature classification in the presence of noise, closely spaced or interleaved signatures, and
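    The standard autoassociative morphological memory W_XX replaces the sums and products of a linear associative memory with minima, maxima and additions. The Python sketch below covers storage and max-plus recall, and it illustrates the perfect recall of noiseless patterns mentioned above. It does not implement the dendritic endmember procedure of this paper, and the example patterns are arbitrary:

```python
def amm_memory(patterns):
    """Store patterns in W_XX: w[i][j] = min over stored x of (x[i] - x[j])."""
    n = len(patterns[0])
    return [[min(x[i] - x[j] for x in patterns) for j in range(n)]
            for i in range(n)]


def amm_recall(w, x):
    """Max-plus recall: y[i] = max over j of (w[i][j] + x[j])."""
    return [max(wij + xj for wij, xj in zip(row, x)) for row in w]


patterns = [[1.0, 4.0, 2.0], [3.0, 0.0, 5.0], [2.0, 2.0, 2.0]]
W = amm_memory(patterns)
print(all(amm_recall(W, p) == p for p in patterns))  # True
```

    Recall is exact because, for a stored pattern x, every term w[i][j] + x[j] is at most x[i], and the diagonal term w[i][i] + x[i] equals x[i].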

  18. Biomarker selection and classification of "-omics" data using a two-step bayes classification framework.

    PubMed

    Assawamakin, Anunchai; Prueksaaroon, Supakit; Kulawonganunchai, Supasak; Shaw, Philip James; Varavithya, Vara; Ruangrajitpakorn, Taneth; Tongsima, Sissades

    2013-01-01

    Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank features; the top-ranked features are those most likely to be informative for predicting the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model from these filtered attributes. In order to obtain the minimum set of the most informative biomarkers, the bottom-ranked features are successively removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets, including gene expression microarray, single nucleotide polymorphism microarray (SNParray), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed two-step Bayes classification framework was equal to and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
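    The second step removes the bottom-ranked features one at a time and re-checks accuracy after each removal, which makes it a backward-elimination loop. The minimal Python sketch below assumes the Naïve Bayes ranking is already available as a list. It also assumes that `evaluate`, a hypothetical callable, trains and scores the Hidden Naïve Bayes classifier; that classifier is not implemented here. Breaking ties in favour of the smaller set is likewise an assumption:

```python
from typing import Callable, Sequence


def prune_ranked_features(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Drop bottom-ranked features one at a time and return the smallest
    feature set whose accuracy equals the best accuracy seen."""
    features = list(ranked)
    best_set, best_acc = list(features), evaluate(features)
    while len(features) > 1:
        features = features[:-1]  # remove the lowest-ranked feature
        acc = evaluate(features)
        if acc >= best_acc:  # ties favour the smaller set
            best_set, best_acc = list(features), acc
    return best_set, best_acc


informative = {"g1", "g3"}  # hypothetical ground truth for the demo


def toy_evaluate(features):
    """Stand-in scorer: accuracy grows with the number of informative features."""
    return 0.5 + 0.25 * len(informative & set(features))


print(prune_ranked_features(["g3", "g1", "g7", "g2"], toy_evaluate))
# (['g3', 'g1'], 1.0)
```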

  19. Algorithmic Classification of Five Characteristic Types of Paraphasias.

    PubMed

    Fergadiotis, Gerasimos; Gorman, Kyle; Bedrick, Steven

    2016-12-01

    This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicality judgments. The phonological-similarity criterion was approximately 91% accurate, and the overall classification accuracy of the semantic classifier ranged from 86% to 90%. Overall, the results highlight the potential of tools from the field of natural language processing for the development of highly reliable, cost-effective diagnostic tools suitable for collecting high-quality measurement data for research and clinical purposes.
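    The three tools form a decision cascade: a lexicality check, then form similarity, then semantic similarity. The Python sketch below is a toy version of that cascade, and it relies on several stand-ins. A small set replaces the SUBTLEXus frequency norms. The character-level `difflib.SequenceMatcher` ratio replaces the study's phonological-similarity algorithm; it is an orthographic proxy, not phonology. Tiny hand-made vectors replace word2vec embeddings. Both thresholds are arbitrary assumptions, and labelling every nonword as neologistic is a simplification:

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_paraphasia(target, response, lexicon, vectors,
                        form_threshold=0.5, sem_threshold=0.5):
    """Toy cascade: lexicality, then form similarity, then semantic similarity."""
    if response not in lexicon:  # step 1: nonword
        return "neologistic"
    formal = SequenceMatcher(None, target, response).ratio() >= form_threshold
    semantic = (target in vectors and response in vectors and
                cosine(vectors[target], vectors[response]) >= sem_threshold)
    if formal and semantic:
        return "mixed"
    if formal:
        return "formal"
    if semantic:
        return "semantic"
    return "unrelated"


lexicon = {"cat", "hat", "rat", "dog", "tree"}  # stand-in for frequency norms
vectors = {                                     # stand-in for word2vec
    "cat": [1.0, 0.0, 0.2], "rat": [0.95, 0.05, 0.25],
    "dog": [0.9, 0.1, 0.3], "hat": [0.0, 1.0, 0.0], "tree": [0.0, 0.1, 1.0],
}
for response in ["hat", "dog", "rat", "tree", "dat"]:
    print(response, classify_paraphasia("cat", response, lexicon, vectors))
# hat formal / dog semantic / rat mixed / tree unrelated / dat neologistic
```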

  20. Molecular Phylogeny of the Widely Distributed Marine Protists, Phaeodaria (Rhizaria, Cercozoa).

    PubMed

    Nakamura, Yasuhide; Imai, Ichiro; Yamaguchi, Atsushi; Tuji, Akihiro; Not, Fabrice; Suzuki, Noritoshi

    2015-07-01

    Phaeodarians are a group of widely distributed marine cercozoans. These plankton organisms can exhibit a large biomass in the environment and are thought to play an important role in marine ecosystems and in material cycles in the ocean. Accurate knowledge of phaeodarian classification is thus necessary to better understand marine biology; however, phylogenetic information on Phaeodaria is limited. The present study analyzed 18S rDNA sequences encompassing all existing phaeodarian orders to clarify their phylogenetic relationships and improve their taxonomic classification. The monophyly of Phaeodaria was confirmed and strongly supported by phylogenetic analysis of a larger data set than in previous studies. The phaeodarian clade contained 11 subclades, which generally did not correspond to the families and orders of the current classification system. Two families (Challengeriidae and Aulosphaeridae) and two orders (Phaeogromida and Phaeocalpida) are possibly polyphyletic or paraphyletic, and consequently the classification needs to be revised at both the family and order levels by integrative taxonomy approaches. Two morphological criteria, 1) the scleracoma type and 2) its surface structure, could be useful markers at the family level. Copyright © 2015 Elsevier GmbH. All rights reserved.

  1. A molecular phylogenetic investigation of Bakuella, Anteholosticha, and Caudiholosticha (Protista, Ciliophora, Hypotrichia) based on three-gene sequences.

    PubMed

    Lv, Zhao; Shao, Chen; Yi, Zhenzhen; Warren, Alan

    2015-01-01

    Traditionally, classifications of the Urostyloida have been based mainly on morphology and morphogenesis. Recent molecular phylogenetic analyses have been largely based on single-gene data for a limited number of taxa. Consequently, incongruence has arisen between the morphological/morphogenetic and the molecular data. In this study, three phylogenetic markers (SSU rDNA, the ITS1-5.8S-ITS2 region, and LSU rDNA) of three urostyloid genera, represented by four species (Bakuella granulifera, Anteholosticha monilata, Caudiholosticha sylvatica, and C. tetracirra), were sequenced to investigate their phylogeny. The results show that: (1) all three genera should be regarded as members of the order Urostyloida within the subclass Hypotrichia, as indicated by morphological characters; (2) phylogenetic analyses and sequence similarities both indicate that neither Anteholosticha nor Caudiholosticha is monophyletic, and the systematic assignment of both genera awaits further evaluation; and (3) Bakuella has a closer relationship with Urostyla than with bakuellids (e.g. Apobakuella and Metaurostylopsis), suggesting that Bakuella may belong to the family Urostylidae rather than the family Bakuellidae. © 2014 The Author(s) Journal of Eukaryotic Microbiology © 2014 International Society of Protistologists.

  2. Nodal distances for rooted phylogenetic trees.

    PubMed

    Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente, Gabriel

    2010-08-01

    Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting each path length between two taxa, in a suitable way, into two lengths. We prove that the resulting split path length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in the spaces $M_n(\mathbb{R})$ of real-valued $n \times n$ matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on $M_n(\mathbb{R})$, with $p \in \mathbb{R}_{>0}$.
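    The splitting idea fits in a short Python sketch. It stores a rooted tree as a child-to-(parent, branch length) dictionary, a representation chosen here for illustration. It takes entry (i, j) of the split matrix to be the distance from the lowest common ancestor of taxa i and j down to i, so the ordinary path length is S[i][j] + S[j][i]. The two example trees differ only in where the root sits. Their path lengths are identical, but their split matrices tell them apart:

```python
def ancestors(parent, v):
    """Path from v up to the root, inclusive; parent maps node -> (parent, length)."""
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]][0])
    return path


def depth(parent, v):
    """Weighted distance from the root down to v."""
    return sum(parent[u][1] for u in ancestors(parent, v)[:-1])


def split_path_matrix(parent, taxa):
    """S[i][j] = distance from LCA(taxa[i], taxa[j]) down to taxa[i]."""
    n = len(taxa)
    S = [[0.0] * n for _ in range(n)]
    for a in range(n):
        anc_a = ancestors(parent, taxa[a])
        for b in range(n):
            anc_b = set(ancestors(parent, taxa[b]))
            lca = next(u for u in anc_a if u in anc_b)  # first shared ancestor
            S[a][b] = depth(parent, taxa[a]) - depth(parent, lca)
    return S


def path_lengths(S):
    """Recover ordinary path lengths: d(i, j) = S[i][j] + S[j][i]."""
    n = len(S)
    return [[S[i][j] + S[j][i] for j in range(n)] for i in range(n)]


def lp_distance(S1, S2, p=2.0):
    """Entrywise L^p distance between two matrices of the same shape."""
    return sum(abs(x - y) ** p
               for r1, r2 in zip(S1, S2) for x, y in zip(r1, r2)) ** (1 / p)


T1 = {"a": ("x", 1.0), "b": ("x", 1.0), "x": ("r", 1.0), "c": ("r", 2.0)}
T2 = {"a": ("x", 1.0), "b": ("x", 1.0), "x": ("r", 2.0), "c": ("r", 1.0)}
taxa = ["a", "b", "c"]
S1, S2 = split_path_matrix(T1, taxa), split_path_matrix(T2, taxa)
print(lp_distance(path_lengths(S1), path_lengths(S2)))  # 0.0: same path lengths
print(lp_distance(S1, S2))                              # 2.0: split matrices differ
```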

  3. Phylogenetically resolving epidemiologic linkage

    PubMed Central

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-01-01

    Although the use of phylogenetic trees in epidemiological investigations has become commonplace, their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and on whether or not the reconstruction is consistent with the true transmission history. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations are both monophyletic, a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. We confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results. PMID:26903617

  4. Phylogenetic diversity and biogeography of the Mamiellophyceae lineage of eukaryotic phytoplankton across the oceans.

    PubMed

    Monier, Adam; Worden, Alexandra Z; Richards, Thomas A

    2016-08-01

    High-throughput diversity amplicon sequencing of marine microbial samples has revealed that members of the Mamiellophyceae lineage are successful phytoplankton in many oceanic habitats. Indeed, these eukaryotic green algae can dominate the picoplanktonic biomass; however, given the broad expanses of the oceans, their geographical distributions and the phylogenetic diversity of some groups remain poorly characterized. As these algae play a foundational role in marine food webs, it is crucial to assess their global distribution in order to better predict potential changes in abundance and community structure. To this end, we analyzed the V9-18S small subunit rDNA sequences deposited from the Tara Oceans expedition to evaluate the diversity and biogeography of these phytoplankton. Our results show that the phylogenetic composition of Mamiellophyceae communities is in part determined by geographical provenance and, in the samples recovered, does not appear to be influenced by water depth, at least at the resolution possible with the V9-18S. Phylogenetic classification of Mamiellophyceae sequences revealed that the Dolichomastigales order encompasses more sequence diversity than other orders in this lineage. These results indicate that a large fraction of the Mamiellophyceae diversity has been hitherto overlooked, likely because of a combination of size fraction, sequencing and geographical limitations. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  5. Species trees for the tree swallows (Genus Tachycineta): an alternative phylogenetic hypothesis to the mitochondrial gene tree.

    PubMed

    Dor, Roi; Carling, Matthew D; Lovette, Irby J; Sheldon, Frederick H; Winkler, David W

    2012-10-01

    The New World swallow genus Tachycineta comprises nine species that collectively have a wide geographic distribution and remarkable variation, both within and among species, in ecologically important traits. Existing phylogenetic hypotheses for Tachycineta are based on mitochondrial DNA sequences and thus provide estimates of only a single gene tree. In this study we sequenced multiple individuals from each species at 16 nuclear intron loci. We used concatenated-gene approaches (Bayesian and maximum likelihood) as well as coalescent-based species tree inference to reconstruct phylogenetic relationships of the genus. We examined the concordance and conflict between the nuclear and mitochondrial trees and between concatenated and coalescent-based inferences. Our results provide an alternative phylogenetic hypothesis to the existing mitochondrial DNA estimate of phylogeny. This new hypothesis provides a more accurate framework in which to explore trait evolution and examine the evolution of the mitochondrial genome in this group. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. A machine learning approach for viral genome classification.

    PubMed

    Remita, Mohamed Amine; Halioui, Ahmed; Malick Diouara, Abou Abdallah; Daigle, Bruno; Kiani, Golrokh; Diallo, Abdoulaye Baniré

    2017-04-11

    Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes are important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific, well-studied families of viruses. Viral comparative genomic studies could therefore benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families. Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments, and uses two metrics to construct feature vectors for the machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows competitive performance compared with well-known HIV-1-specific classifiers (REGA and COMET) on whole genomes and pol fragments. The performance, genericity and robustness of CASTOR could enable novel, accurate, large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca .
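    The in silico digestion step can be pictured with a short Python sketch. The abstract does not name CASTOR's enzymes or its two metrics. The enzyme table and the features below (fragment count and mean fragment length) are therefore assumptions chosen for illustration. Only the EcoRI (G^AATTC) and BamHI (G^GATCC) recognition sites and cut positions are standard. Sequences are treated as linear, and only the forward strand is scanned:

```python
import re

# Recognition site and cut offset within the site (forward strand).
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1)}


def digest(seq, site, offset):
    """Fragment lengths after cutting a linear sequence at every site occurrence."""
    seq = seq.upper()
    # the zero-width lookahead also finds overlapping occurrences
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def features(seq):
    """Per-enzyme fragment count and mean fragment length (illustrative metrics)."""
    vec = []
    for site, offset in ENZYMES.values():
        frags = digest(seq, site, offset)
        vec += [len(frags), sum(frags) / len(frags)]
    return vec


print(digest("AAGAATTCTTGGATCCAA", "GAATTC", 1))  # [3, 15]
print(features("AAGAATTCTTGGATCCAA"))             # [2, 9.0, 2, 9.0]
```

    Vectors of this kind, one per genome, are what a standard classifier would then be trained on.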

  7. SUNPLIN: simulation with uncertainty for phylogenetic investigations.

    PubMed

    Martins, Wellington S; Carmo, Welton C; Longo, Humberto J; Rosa, Thierson C; Rangel, Thiago F

    2013-11-15

    Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
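    SUNPLIN itself is C++ code usable from R. The following is only a topology-level Python illustration of the expanded-tree idea, and it rests on several assumptions made here. Trees are nested tuples without branch lengths. A missing species is attached as sister to a uniformly chosen node inside its clade, the clade's own root included. Distances are counted in edges. The actual placement rule and distance definition used by SUNPLIN may differ:

```python
import random
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf label to its path of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out


def node_paths(tree, path=()):
    """Yield the path of every node (internal or leaf) in `tree`."""
    yield path
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            yield from node_paths(child, path + (i,))


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_node(tree[i], path[1:], new),) + tree[i + 1:]


def expand(tree, clade_path, species, rng):
    """Attach `species` as sister to a uniformly chosen node of the clade."""
    clade = get_node(tree, clade_path)
    target = clade_path + rng.choice(list(node_paths(clade)))
    return replace_node(tree, target, (get_node(tree, target), species))


def distance_matrix(tree):
    """Edge-count distance between every pair of leaves."""
    paths = leaf_paths(tree)
    dist = {}
    for a, b in combinations(sorted(paths), 2):
        pa, pb = paths[a], paths[b]
        shared = 0  # depth of the lowest common ancestor
        while shared < min(len(pa), len(pb)) and pa[shared] == pb[shared]:
            shared += 1
        dist[(a, b)] = len(pa) + len(pb) - 2 * shared
    return dist


# Species "E" is missing from the tree but assumed to belong to clade ((A,B),C).
rng = random.Random(1)
tree = ((("A", "B"), "C"), "D")
replicates = [distance_matrix(expand(tree, (0,), "E", rng)) for _ in range(1000)]
mean_ae = sum(d[("A", "E")] for d in replicates) / len(replicates)
print(f"mean A-E distance over replicates: {mean_ae:.2f}")
```

    The spread of a distance such as A-E across replicates is what carries the placement uncertainty into downstream comparative analyses.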

  8. Improved Hierarchical Optimization-Based Classification of Hyperspectral Images Using Shape Analysis

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Tilton, James C.

    2012-01-01

    A new spectral-spatial method for classification of hyperspectral images is proposed. The HSegClas method is based on the integration of probabilistic classification and shape analysis within the hierarchical step-wise optimization algorithm. First, probabilistic support vector machine classification is applied. Then, at each iteration, the two neighboring regions with the smallest Dissimilarity Criterion (DC) are merged, and classification probabilities are recomputed. The main contribution of this work is the estimation of a DC between regions as a function of statistical, classification and geometrical (area and rectangularity) features. Experimental results are presented on a 102-band ROSIS image of the Center of Pavia, Italy. The developed approach yields more accurate classification results than previously proposed methods.
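    The merge loop can be sketched in Python. This is a one-dimensional toy: it assumes adjacency means consecutive positions and uses the absolute difference of region means as the DC. The paper's DC also draws on classification probabilities and geometrical features (area, rectangularity), and the SVM step and probability recomputation are omitted here:

```python
def merge_regions(values, n_regions):
    """Greedily merge adjacent 1-D regions until `n_regions` remain.

    Each region stores [sum, pixel count, member indices]; the dissimilarity
    criterion (DC) here is only the absolute difference of region means.
    """
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    regions = [[v, 1, [i]] for i, v in enumerate(values)]

    def dc(r, s):
        return abs(r[0] / r[1] - s[0] / s[1])

    while len(regions) > n_regions:
        # find the neighboring pair with the smallest DC
        k = min(range(len(regions) - 1),
                key=lambda i: dc(regions[i], regions[i + 1]))
        a, b = regions[k], regions[k + 1]
        # merge it; the merged region's mean follows from the summed statistics
        regions[k:k + 2] = [[a[0] + b[0], a[1] + b[1], a[2] + b[2]]]
    return [r[2] for r in regions]


print(merge_regions([1.0, 1.1, 5.0, 5.2, 9.0], 3))  # [[0, 1], [2, 3], [4]]
```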

  9. Highly efficient classification and identification of human pathogenic bacteria by MALDI-TOF MS.

    PubMed

    Hsieh, Sen-Yung; Tseng, Chiao-Li; Lee, Yun-Shien; Kuo, An-Jing; Sun, Chien-Feng; Lin, Yen-Hsiu; Chen, Jen-Kun

    2008-02-01

    Accurate and rapid identification of pathogenic microorganisms is of critical importance in disease treatment and public health. Conventional workflows are time-consuming, and procedures are multifaceted. MS can be an alternative but is limited by low efficiency for amino acid sequencing as well as low reproducibility for spectrum fingerprinting. We systematically analyzed the feasibility of applying MS for rapid and accurate bacterial identification. Directly applying bacterial colonies to MALDI-TOF MS analysis, without further protein extraction, revealed rich peak contents and high reproducibility. The MS spectra derived from 57 isolates comprising six human pathogenic bacterial species were analyzed using both unsupervised hierarchical clustering and supervised model construction via the Genetic Algorithm. Hierarchical clustering analysis categorized the spectra into six groups precisely corresponding to the six bacterial species. Precise classification was also maintained in an independently prepared set of bacteria, even when the number of m/z values was reduced to six. In parallel, classification models were constructed via Genetic Algorithm analysis. A model containing 18 m/z values accurately classified independently prepared bacteria and identified those species originally not used for model construction. Moreover, samples of fewer than $10^4$ bacterial cells, as well as different species in bacterial mixtures, were identified using the classification model approach. In conclusion, the application of MALDI-TOF MS in combination with suitable model construction provides a highly accurate method for bacterial classification and identification. The approach can identify bacteria with low abundance even in mixed flora, suggesting that rapid and accurate bacterial identification using MS techniques, even before culture, can be attained in the near future.

  10. On Tree-Based Phylogenetic Networks.

    PubMed

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that, for any number of taxa, there exists a universal tree-based network that contains every phylogenetic tree on the same set of taxa as its base. This answers two problems recently posed by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  11. Phylogenetic Status of an Unrecorded Species of Curvularia, C. spicifera, Based on Current Classification System of Curvularia and Bipolaris Group Using Multi Loci.

    PubMed

    Jeon, Sun Jeong; Nguyen, Thi Thuong Thuong; Lee, Hyang Burm

    2015-09-01

    A seed-borne fungus, Curvularia sp. EML-KWD01, was isolated from an indigenous wheat seed by the standard blotter method. This fungus was characterized on the basis of its morphological characteristics and molecular phylogenetic analysis. The phylogenetic status of the fungus was determined using sequences of three loci: the rDNA internal transcribed spacer, the large ribosomal subunit, and the glyceraldehyde 3-phosphate dehydrogenase gene. Multi-locus sequence analysis revealed that this fungus was Curvularia spicifera within Curvularia group 2 of the family Pleosporaceae.

  12. A drone detection with aircraft classification based on a camera array

    NASA Astrophysics Data System (ADS)

    Liu, Hao; Qu, Fangchao; Liu, Yingjian; Zhao, Wei; Chen, Yitong

    2018-03-01

    In recent years, the rapid spread of drones has led many people to operate them, bringing a range of security issues to sensitive areas such as airports and military sites. Fine-grained classification, which provides fast and accurate detection of different drone models, is one important way to address these problems. The main challenges of fine-grained classification are that (1) there are many types of drones, and their models are complex and diverse, and (2) recognition must be both fast and accurate, whereas existing methods are not efficient. In this paper, we propose a fine-grained drone detection system based on a high-resolution camera array. The system can quickly and accurately perform fine-grained recognition of drones using HD cameras.

  13. Functional & phylogenetic diversity of copepod communities

    NASA Astrophysics Data System (ADS)

    Benedetti, F.; Ayata, S. D.; Blanco-Bercial, L.; Cornils, A.; Guilhaumon, F.

    2016-02-01

    The diversity of natural communities is classically estimated through species identification (taxonomic diversity) but can also be estimated from the ecological functions performed by the species (functional diversity), or from the phylogenetic relationships among them (phylogenetic diversity). Estimating functional diversity requires the definition of specific functional traits, i.e., phenotypic characteristics that impact fitness and are relevant to ecosystem functioning. Estimating phylogenetic diversity requires the description of phylogenetic relationships, for instance by using molecular tools. In the present study, we focused on the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. First, we implemented a specific trait database for the most commonly sampled and abundant copepod species of the Mediterranean Sea. Our database includes 191 species, described by seven traits encompassing diverse ecological functions: minimal and maximal body length, trophic group, feeding type, spawning strategy, diel vertical migration and vertical habitat. Clustering analysis in the functional trait space revealed that Mediterranean copepods can be gathered into groups that have different ecological roles. Second, we reconstructed a phylogenetic tree using the available sequences of 18S rRNA. Our tree included 154 of the analyzed Mediterranean copepod species. We used these two datasets to describe the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. The replacement component (turnover) and the species richness difference component (nestedness) of the beta diversity indices were identified. Finally, by comparing various and complementary aspects of plankton diversity (taxonomic, functional, and phylogenetic diversity), we were able to gain a better understanding of the relationships among the zooplankton community, biodiversity, ecosystem function, and environmental forcing.
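    The split of beta diversity into a replacement part and a richness-difference part can be shown for a single pair of sites. The abstract does not state which dissimilarity index or partition the study used. The Python sketch below assumes one widely used additive partition of Jaccard dissimilarity. With a shared species, b and c unique species, and total = a + b + c, replacement is 2·min(b, c)/total and richness difference is |b − c|/total. The two terms sum to the Jaccard dissimilarity (b + c)/total. The species lists are made up:

```python
def beta_components(site1, site2):
    """Jaccard dissimilarity and its replacement / richness-difference parts."""
    s1, s2 = set(site1), set(site2)
    a = len(s1 & s2)  # species shared by both sites
    b = len(s1 - s2)  # species only in site 1
    c = len(s2 - s1)  # species only in site 2
    total = a + b + c
    if total == 0:
        raise ValueError("both sites are empty")
    return {
        "jaccard": (b + c) / total,
        "replacement": 2 * min(b, c) / total,
        "richness_difference": abs(b - c) / total,
    }


result = beta_components({"A", "B", "C", "D"}, {"C", "D", "E"})
print({k: round(v, 2) for k, v in result.items()})
# {'jaccard': 0.6, 'replacement': 0.4, 'richness_difference': 0.2}
```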

  14. treespace: Statistical exploration of landscapes of phylogenetic trees.

    PubMed

    Jombart, Thibaut; Kendall, Michelle; Almagro-Garcia, Jacob; Colijn, Caroline

    2017-11-01

    The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
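    treespace is an R package, so the following is only a language-neutral illustration in Python of the kind of topology-only tree vector such tools compare. For every pair of tips, it records the number of edges from the root to their most recent common ancestor. Trees are then compared by the Euclidean distance between their vectors. This corresponds to the pair entries of the Kendall-Colijn topological vector. treespace's own functions and options, and the subsequent multivariate analysis, are not reproduced. Trees are nested tuples with string tips, an assumption made for brevity:

```python
import math
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each tip label to its path of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out


def kc_vector(tree):
    """Edges from the root to the MRCA of each tip pair (pairs sorted by label)."""
    paths = leaf_paths(tree)
    vec = []
    for a, b in combinations(sorted(paths), 2):
        pa, pb = paths[a], paths[b]
        depth = 0  # length of the shared path prefix = MRCA depth
        while depth < min(len(pa), len(pb)) and pa[depth] == pb[depth]:
            depth += 1
        vec.append(depth)
    return vec


def tree_distances(trees):
    """Pairwise Euclidean distances between the trees' vectors."""
    vecs = [kc_vector(t) for t in trees]
    return [[math.dist(u, v) for v in vecs] for u in vecs]


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = (("B", "A"), ("D", "C"))  # same topology as t1, children listed differently
for row in tree_distances([t1, t2, t3]):
    print(row)
# [0.0, 2.0, 0.0] / [2.0, 0.0, 2.0] / [0.0, 2.0, 0.0]
```

    Because only MRCA depths enter the vector, reordering children leaves it unchanged. The resulting distance matrix is the kind of input a multidimensional scaling step would project to low dimensions.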

  15. SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

    PubMed Central

    2013-01-01

    Background Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because this procedure is computationally demanding, the analyses are of limited applicability unless it is implemented efficiently. Results In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. The source code, written in C++, is publicly available. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. Conclusion We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets. PMID:24229408
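    The pairwise phylogenetic distance matrix mentioned above can be sketched as follows. This is not SUNPLIN's C++ code; it is a minimal Python illustration that assumes a hypothetical tree encoding in which each node maps to its parent and the length of the branch above it. The distance between two tips is the sum of branch lengths on the path through their most recent common ancestor. The random expansion step itself is not shown.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical encoding: child node -> (parent node, length of the branch to the parent).
# The root is the only node without an entry.
Tree = Dict[str, Tuple[str, float]]


def patristic(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the path from tip `a` to tip `b`."""
    # Distance from `a` up to each of its ancestors (and to itself).
    up = {a: 0.0}
    node, dist = a, 0.0
    while node in tree:
        parent, length = tree[node]
        dist += length
        up[parent] = dist
        node = parent
    # Climb from `b`; the first node on a's root path is their common ancestor.
    node, dist = b, 0.0
    while node not in up:
        parent, length = tree[node]
        dist += length
        node = parent
    return dist + up[node]


def distance_matrix(tree: Tree, tips: List[str]) -> List[List[float]]:
    """Symmetric matrix of tip-to-tip distances, in the order given by `tips`."""
    n = len(tips)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = patristic(tree, tips[i], tips[j])
    return m
```

Repeating this over many randomly expanded trees yields a set of matrices whose spread reflects the phylogenetic uncertainty introduced by the missing species.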

  16. Phylogenetically resolving epidemiologic linkage

    DOE PAGES

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-02-22

    The use of phylogenetic trees in epidemiological investigations has become commonplace, but their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations are both monophyletic, a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. Moreover, we confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results.
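    The tree types above start from whether each host's sampled sequences form a clade. A minimal sketch of that first check, assuming a rooted tree written as nested Python tuples and a hypothetical `host_of` mapping from tip name to host label; it tests monophyly only and does not distinguish paraphyly from polyphyly or model the within-host coalescent.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # hypothetical: tip name or tuple of subtrees


def _clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the tip set of every node (tips included) and return this subtree's tips."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(_clades(child, out) for child in tree))
    out.add(tips)
    return tips


def monophyly_by_host(tree: Tree, host_of: Dict[str, str]) -> Dict[str, bool]:
    """For each host, report whether its sampled sequences form a single clade."""
    clades: Set[FrozenSet[str]] = set()
    _clades(tree, clades)
    result = {}
    for host in set(host_of.values()):
        tips = frozenset(t for t, h in host_of.items() if h == host)
        result[host] = tips in clades
    return result
```

When both hosts come back monophyletic, point (iii) of the abstract suggests a common source as the likely origin.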

  17. Spatial patterns of phylogenetic diversity.

    PubMed

    Morlon, Hélène; Schwilk, Dylan W; Bryant, Jessica A; Marquet, Pablo A; Rebelo, Anthony G; Tauss, Catherine; Bohannan, Brendan J M; Green, Jessica L

    2011-02-01

    Ecologists and conservation biologists have historically used species-area and distance-decay relationships as tools to predict the spatial distribution of biodiversity and the impact of habitat loss on biodiversity. These tools treat each species as evolutionarily equivalent, yet the importance of species' evolutionary history in their ecology and conservation is becoming increasingly evident. Here, we provide theoretical predictions for phylogenetic analogues of the species-area and distance-decay relationships. We use a random model of community assembly and a spatially explicit flora dataset collected in four Mediterranean-type regions to provide theoretical predictions for the increase in phylogenetic diversity (the total phylogenetic branch length separating a set of species) with increasing area and the decay in phylogenetic similarity with geographic separation. These developments may ultimately provide insights into the evolution and assembly of biological communities, and guide the selection of protected areas. © 2010 Blackwell Publishing Ltd/CNRS.
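    Phylogenetic diversity as defined here, the total branch length separating a set of species, can be computed from a rooted tree with branch lengths. A minimal sketch, assuming a hypothetical encoding in which each node maps to its parent and the length of the branch above it: it collects the branches on each species' path to the root and drops those shared by every path, which lie above the species' most recent common ancestor. Some definitions keep that root path; this sketch does not.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: child node -> (parent node, length of the branch to the parent).
Tree = Dict[str, Tuple[str, float]]


def branches_to_root(tree: Tree, node: str) -> Set[str]:
    """Branches (named by their child node) on the path from `node` to the root."""
    branches = set()
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tree: Tree, species: Iterable[str]) -> float:
    """Total length of the minimal subtree connecting `species` (root path excluded)."""
    paths = [branches_to_root(tree, s) for s in species]
    if not paths:
        return 0.0
    used = set().union(*paths)
    # Branches shared by every path lie above the species' common ancestor.
    above_mrca = set.intersection(*paths)
    return sum(tree[b][1] for b in used - above_mrca)
```

Adding species to the set can only keep or increase this total, which is the quantity whose growth with area the phylogenetic species-area relationship describes.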

  18. Refining Landsat classification results using digital terrain data

    USGS Publications Warehouse

    Miller, Wayne A.; Shasby, Mark

    1982-01-01

    Scientists at the U.S. Geological Survey's Earth Resources Observation Systems (EROS) Data Center have recently completed two land-cover mapping projects in which digital terrain data were used to refine Landsat classification results. Digital terrain data were incorporated into the Landsat classification process using two different procedures that required developing decision criteria either subjectively or quantitatively. The subjective procedure was used in a vegetation mapping project in Arizona, and the quantitative procedure was used in a forest-fuels mapping project in Montana. By incorporating digital terrain data into the Landsat classification process, more spatially accurate land-cover maps were produced for both projects.

  19. Phylogenetic overdispersion of plant species in southern Brazilian savannas.

    PubMed

    Silva, I A; Batalha, M A

    2009-08-01

    Ecological communities are the result of not only present ecological processes, such as competition among species and environmental filtering, but also past and continuing evolutionary processes. Based on these assumptions, we may infer mechanisms of contemporary coexistence from the phylogenetic relationships of the species in a community. We studied the phylogenetic structure of plant communities in four cerrado sites, in southeastern Brazil. We calculated two raw phylogenetic distances among the species sampled. We estimated the phylogenetic structure by comparing the observed phylogenetic distances to the distribution of phylogenetic distances in null communities. We obtained null communities by randomizing the phylogenetic relationships of the regional pool of species. We found a phylogenetic overdispersion of the cerrado species. Phylogenetic overdispersion has several explanations, depending on the phylogenetic history of traits and contemporary ecological interactions. However, based on coexistence models between grasses and trees, density-dependent ecological forces, and the evolutionary history of the cerrado flora, we argue that the phylogenetic overdispersion of cerrado species is predominantly due to competitive interactions, herbivore and pathogen attacks, and ecological speciation. Future studies will need to include information on the phylogenetic history of plant traits.
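    Comparing an observed phylogenetic distance with its distribution in null communities is often summarised as a standardized effect size. A minimal sketch under stated assumptions: it uses mean pairwise distance (MPD), which may or may not be one of the two distances in the study, and it randomizes relationships by shuffling species labels across the tips of the regional pool, one common way to do this. Function names are hypothetical.

```python
import random
import statistics
from itertools import combinations
from typing import Dict, List

Dist = Dict[str, Dict[str, float]]  # symmetric pairwise phylogenetic distances


def mean_pairwise_distance(dist: Dist, community: List[str]) -> float:
    pairs = list(combinations(community, 2))
    if not pairs:
        raise ValueError("need at least two species")
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def ses_mpd(dist: Dist, pool: List[str], community: List[str],
            runs: int = 999, seed: int = 0) -> float:
    """Standardized effect size of MPD against a tip-label shuffling null model."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(dist, community)
    null = []
    for _ in range(runs):
        shuffled = pool[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(pool, shuffled))  # each species takes another species' tip
        null.append(mean_pairwise_distance(dist, [relabel[s] for s in community]))
    # Positive: more distantly related than expected (overdispersion).
    # Negative: more closely related than expected (clustering).
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

A community drawn from two different clades of the pool scores positive, while one drawn from a single clade scores negative.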

  20. Automatic grade classification of Barrett's Esophagus through feature enhancement

    NASA Astrophysics Data System (ADS)

    Ghatwary, Noha; Ahmed, Amr; Ye, Xujiong; Jalab, Hamid

    2017-03-01

    Barrett's Esophagus (BE) is a precancerous condition that affects the esophagus and carries a risk of developing into esophageal adenocarcinoma. In BE, metaplastic intestinal epithelium develops and replaces the normal cells in the esophagus. BE is considered difficult to detect because of its appearance and properties, and diagnosis is usually made through both endoscopy and biopsy. Recently, Computer Aided Diagnosis systems have been developed to support physicians' opinions when detection or classification of different types of diseases is difficult. In this paper, an automatic classification of Barrett's Esophagus is introduced. The presented method enhances the internal features of a Confocal Laser Endomicroscopy (CLE) image using a proposed enhancement filter. This filter relies on fractional differentiation and integration to improve the features in the discrete wavelet transform of an image. Various features are then extracted from each enhanced image at different levels for the multi-class classification process. Our approach is validated on a dataset of 262 images from 32 patients with different histology grades. The experimental results demonstrated the efficiency of the proposed technique. Our method can help clinicians classify BE more accurately. This could reduce the number of biopsies needed for diagnosis, facilitate regular monitoring of each patient's treatment and disease progression, and help train doctors in the new endoscopy technology. Accurate automatic classification is particularly important for the Intestinal Metaplasia (IM) type, which can progress to deadly cancer. Hence, this work contributes to automatic classification that facilitates early intervention and treatment and reduces the number of biopsy samples needed.

  1. Influence of pansharpening techniques in obtaining accurate vegetation thematic maps

    NASA Astrophysics Data System (ADS)

    Ibarrola-Ulzurrun, Edurne; Gonzalo-Martin, Consuelo; Marcello-Ruiz, Javier

    2016-10-01

    In recent decades, natural resources have declined, making it important to develop reliable methodologies for their management. The appearance of very high resolution sensors has offered a practical and cost-effective means for good environmental management. In this context, improvements are needed to obtain higher-quality information and, in turn, reliable classified images. Pansharpening enhances the spatial resolution of the multispectral bands by incorporating information from the panchromatic image. The main goal of the study is to apply pixel- and object-based classification techniques to imagery fused with different pansharpening algorithms, and to evaluate the resulting thematic maps as a source of accurate information for the conservation of natural resources. Teide National Park, a vulnerable and heterogeneous ecosystem in the Canary Islands (Spain), was chosen, and WorldView-2 high-resolution imagery was employed. The classes of interest were set by the National Park conservation managers. Seven pansharpening techniques (GS, FIHS, HCS, MTF based, Wavelet 'à trous' and Weighted Wavelet 'à trous' through Fractal Dimension Maps) were chosen to improve data quality for the analysis of vegetation classes. Next, different classification algorithms were applied using pixel-based and object-based approaches, and the accuracy of the resulting thematic maps was assessed. The highest classification accuracy was obtained by applying a Support Vector Machine classifier with the object-based approach to the image fused with Weighted Wavelet 'à trous' through Fractal Dimension Maps. Finally, the study highlights the difficulty of classification in the Teide ecosystem due to its heterogeneity and the small size of the species. Thus, it is important to obtain accurate thematic maps for further studies in the management and conservation of natural resources.

  2. Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

    PubMed

    Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

    2014-01-01

    Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare their results. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence across the genome, comparing evolutionary distances from genomic regions with those from the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3' terminus of the papain-like cysteine protease domain, showed relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
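    Scoring a genomic region by how well its pairwise distances track full-length distances can be sketched as a correlation between two distance vectors. This is an assumption-laden illustration, not the authors' statistical method: it uses uncorrected p-distances on an already aligned set of sequences (the study may use model-corrected distances) and Pearson correlation, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pearson(xs: List[float], ys: List[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either vector is constant


def region_correlation(alignment: Dict[str, str], start: int, end: int) -> float:
    """Correlate full-length and region-only distances over all isolate pairs."""
    names = sorted(alignment)
    full, part = [], []
    for a, b in combinations(names, 2):
        full.append(p_distance(alignment[a], alignment[b]))
        part.append(p_distance(alignment[a][start:end], alignment[b][start:end]))
    return pearson(full, part)
```

Sliding the `start`/`end` window along the alignment and keeping the highest-scoring short windows mirrors, in spirit, the search for regions that reproduce full-genome relationships.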

  3. Phylogenetic Framework and Molecular Signatures for the Main Clades of the Phylum Actinobacteria

    PubMed Central

    Gao, Beile

    2012-01-01

    Summary: The phylum Actinobacteria harbors many important human pathogens and also provides one of the richest sources of natural products, including numerous antibiotics and other compounds of biotechnological interest. Thus, a reliable phylogeny of this large phylum and the means to accurately identify its different constituent groups are of much interest. Detailed phylogenetic and comparative analyses of >150 actinobacterial genomes reported here form the basis for achieving these objectives. In phylogenetic trees based upon 35 conserved proteins, most of the main groups of Actinobacteria as well as a number of their suprageneric clades are resolved. We also describe large numbers of molecular markers consisting of conserved signature indels in protein sequences and whole proteins that are specific for either all Actinobacteria or their different clades (viz., orders, families, genera, and subgenera) at various taxonomic levels. These signatures independently support the existence of different phylogenetic clades, and based upon them, it is now possible to delimit the phylum Actinobacteria (excluding Coriobacteriia) and most of its major groups in clear molecular terms. The species distribution patterns of these markers also provide important information regarding the interrelationships among different main orders of Actinobacteria. The identified molecular markers, in addition to enabling the development of a stable and reliable phylogenetic framework for this phylum, also provide novel and powerful means for the identification of different groups of Actinobacteria in diverse environments. Genetic and biochemical studies on these Actinobacteria-specific markers should lead to the discovery of novel biochemical and/or other properties that are unique to different groups of Actinobacteria. PMID:22390973

  4. Different relationships between temporal phylogenetic turnover and phylogenetic similarity in two forests were detected by a new null model.

    PubMed

    Huang, Jian-Xiong; Zhang, Jian; Shen, Yong; Lian, Ju-yu; Cao, Hong-lin; Ye, Wan-hui; Wu, Lin-fang; Bin, Yue

    2014-01-01

    Ecologists have been monitoring community dynamics with the purpose of understanding the rates and causes of community change. However, there is a lack of monitoring of community dynamics from a phylogenetic perspective. We attempted to understand temporal phylogenetic turnover in a 50 ha tropical forest (Barro Colorado Island, BCI) and a 20 ha subtropical forest (Dinghushan in southern China, DHS). To obtain temporal phylogenetic turnover under random conditions, two null models were used. The first, which is widely used in community phylogenetic analyses, shuffled species names. The second simulated demographic processes, carefully considering the variation in dispersal ability among species and the variation in mortality both among species and among size classes. With the two models, we tested the relationships between temporal phylogenetic turnover and phylogenetic similarity at different spatial scales in the two forests. Results obtained with the second null model were more consistent with previous findings, suggesting that it is more appropriate for our purposes. With the second null model, a significantly positive relationship was detected between phylogenetic turnover and phylogenetic similarity in BCI at a 10 m×10 m scale, potentially indicating phylogenetic density dependence. In DHS, this relationship was significantly negative at three of five spatial scales, which could indicate abiotic filtering processes in community assembly. Using variation partitioning, we found that phylogenetic similarity contributed to variation in temporal phylogenetic turnover in the DHS plot but not in the BCI plot. From a phylogenetic perspective, the mechanisms of community assembly differ between BCI and DHS. Only the second null model detected this difference, indicating the importance of choosing a proper null model.

  5. Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Trees

    NASA Astrophysics Data System (ADS)

    Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.

    2017-02-01

    Representation is a very important tool for communicating scientific concepts. Biologists produce phylogenetic representations to express their understanding of evolutionary relationships. A phylogenetic tree is a visual representation that depicts a hypothesis about evolutionary relationships and is widely used in the biological sciences. Phylogenetic trees are increasingly used across many disciplines in biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area for biology education research. However, research has shown that many students struggle to interpret the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing phylogenetic trees. The study used a descriptive method, with questionnaires, interviews, multiple-choice and open-ended questions, reflective journals and observations. The findings showed that students experienced difficulties, especially in constructing phylogenetic trees. The students’ responses indicated that the main difficulty in constructing a phylogenetic tree was placing taxa correctly based on the data provided, so that the constructed tree did not describe the actual evolutionary relationships (incorrect relatedness). Students also had difficulty determining sister groups, synapomorphies and autapomorphies from the data provided (a character table), and comparing phylogenetic trees. According to the students, building a phylogenetic tree is more difficult than reading one. These findings can help undergraduate instructors and students overcome difficulties in reading and constructing phylogenetic trees.

  6. Constructing phylogenetic trees using interacting pathways.

    PubMed

    Wan, Peng; Che, Dongsheng

    2013-01-01

    Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences in the organisms' physical or genetic features. Traditional approaches to constructing phylogenetic trees mainly focus on physical features. Recent advances in high-throughput technologies have led to the accumulation of huge amounts of biological data, which in turn has changed biological research in many respects. In this paper, we report our approach to building phylogenetic trees using information on interacting pathways. We applied hierarchical clustering to two domains of organisms: eukaryotes and prokaryotes. Our preliminary results show the effectiveness of using interacting pathways to reveal evolutionary relationships.

  7. Classification of earth terrain using polarimetric synthetic aperture radar images

    NASA Technical Reports Server (NTRS)

    Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.

    1989-01-01

    Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminants such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polarization states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.

  8. The Evolutionary Ecology of Plant Disease: A Phylogenetic Perspective.

    PubMed

    Gilbert, Gregory S; Parker, Ingrid M

    2016-08-04

    An explicit phylogenetic perspective provides useful tools for phytopathology and plant disease ecology because the traits of both plants and microbes are shaped by their evolutionary histories. We present brief primers on phylogenetic signal and the analytical tools of phylogenetic ecology. We review the literature and find abundant evidence of phylogenetic signal in pathogens and plants for most traits involved in disease interactions. Plant nonhost resistance mechanisms and pathogen housekeeping functions are conserved at deeper phylogenetic levels, whereas molecular traits associated with rapid coevolutionary dynamics are more labile at branch tips. Horizontal gene transfer disrupts the phylogenetic signal for some microbial traits. Emergent traits, such as host range and disease severity, show clear phylogenetic signals. Therefore pathogen spread and disease impact are influenced by the phylogenetic structure of host assemblages. Phylogenetically rare species escape disease pressure. Phylogenetic tools could be used to develop predictive tools for phytosanitary risk analysis and reduce disease pressure in multispecies cropping systems.

  9. Nutritional status in sick children and adolescents is not accurately reflected by BMI-SDS.

    PubMed

    Fusch, Gerhard; Raja, Preeya; Dung, Nguyen Quang; Karaolis-Danckert, Nadina; Barr, Ronald; Fusch, Christoph

    2013-01-01

    Nutritional status provides helpful information about disease severity and treatment effectiveness. Body mass index standard deviation scores (BMI-SDS) provide an approximation of body composition and thus are frequently used to classify nutritional status of sick children and adolescents. However, the accuracy of estimating body composition in this population using BMI-SDS has not been assessed. Thus, this study aims to evaluate the accuracy of nutritional status classification in sick infants and adolescents using BMI-SDS, upon comparison to classification using percentage body fat (%BF) reference charts. BMI-SDS was calculated from anthropometric measurements and %BF was measured using dual-energy x-ray absorptiometry (DXA) for 393 sick children and adolescents (5 months-18 years). Subjects were classified by nutritional status (underweight, normal weight, overweight, and obese), using 2 methods: (1) BMI-SDS, based on age- and gender-specific percentiles, and (2) %BF reference charts (standard). Linear regression and a correlation analysis were conducted to compare agreement between both methods of nutritional status classification. %BF reference value comparisons were also made between 3 independent sources based on German, Canadian, and American study populations. Correlation between nutritional status classification by BMI-SDS and %BF agreed moderately (r² = 0.75 and 0.76 in boys and girls, respectively). The misclassification of nutritional status in sick children and adolescents using BMI-SDS was 27% when using German %BF references. Similar rates were observed when using Canadian and American %BF references (24% and 23%, respectively). Using BMI-SDS to determine nutritional status in a sick population is not considered an appropriate clinical tool for identifying individual underweight or overweight children or adolescents. However, BMI-SDS may be appropriate for longitudinal measurements or for screening purposes in large field studies. 
When accurate nutritional

  10. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates of the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  11. Branch classification: A new mechanism for improving branch predictor performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, P.Y.; Hao, E.; Patt, Y.

    There is wide agreement that one of the most significant impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Speculative execution is one solution to the branch problem, but speculative work is discarded if a branch is mispredicted. For it to be effective, speculative execution requires a very accurate branch predictor; 95% accuracy is not good enough. This paper proposes branch classification, a methodology for building more accurate branch predictors. Branch classification allows an individual branch instruction to be associated with the branch predictor best suited to predict its direction. Using this approach, a hybrid branch predictor can be constructed such that each component branch predictor predicts those branches for which it is best suited. To demonstrate the usefulness of branch classification, an example classification scheme is given and a new hybrid predictor is built based on this scheme, which achieves a higher prediction accuracy than any branch predictor previously reported in the literature.
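    The idea of routing each branch to the predictor that suits it can be sketched with a toy simulator. This is a hypothetical scheme for illustration, not the paper's classification: branches are profiled by their taken rate, strongly biased ones (the 0.95 threshold is an assumed value) get a fixed static prediction, and the rest use a per-branch 2-bit saturating counter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trace = List[Tuple[int, bool]]  # (branch address, taken?) in program order


def classify(profile: Trace, bias: float = 0.95) -> Dict[int, str]:
    """Assign each branch a predictor class from its profiled taken rate."""
    counts = defaultdict(lambda: [0, 0])  # addr -> [times taken, times executed]
    for addr, taken in profile:
        counts[addr][0] += taken
        counts[addr][1] += 1
    classes = {}
    for addr, (t, n) in counts.items():
        rate = t / n
        if rate >= bias:
            classes[addr] = "static_taken"
        elif rate <= 1 - bias:
            classes[addr] = "static_not_taken"
        else:
            classes[addr] = "dynamic"
    return classes


def hybrid_accuracy(trace: Trace, classes: Dict[int, str]) -> float:
    """Fraction of branches predicted correctly by the class-selected predictor."""
    counters = defaultdict(lambda: 2)  # 2-bit saturating counters; >= 2 means "taken"
    correct = 0
    for addr, taken in trace:
        cls = classes.get(addr, "dynamic")
        if cls == "static_taken":
            pred = True
        elif cls == "static_not_taken":
            pred = False
        else:
            pred = counters[addr] >= 2
            counters[addr] = min(3, counters[addr] + 1) if taken else max(0, counters[addr] - 1)
        correct += pred == taken
    return correct / len(trace)
```

Profiling and evaluating on the same trace, as in a quick experiment, gives an optimistic accuracy; a fairer test would classify on one input and measure on another.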

  12. Inferring Phylogenetic Networks Using PhyloNet.

    PubMed

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  13. [Definition and classification of pulmonary arterial hypertension].

    PubMed

    Nakanishi, Norifumi

    2008-11-01

    Pulmonary hypertension (PH) is a disorder that may occur either in the setting of a variety of underlying medical conditions or as a disease that uniquely affects the pulmonary vasculature. Because an accurate diagnosis of PH in a patient is essential to establish an effective treatment, a classification of PH has been helpful. The first classification, established at the WHO Symposium in 1973, classified PH into groups based on the known cause and defined primary pulmonary hypertension (PPH) as a separate entity of unknown cause. In 1998, the second World Symposium on PPH was held in Evian. The Evian classification introduced the concept of conditions that directly affected the pulmonary vasculature (i.e., PAH), which included PPH. In 2003, the third World Symposium on PAH convened in Venice. In the Venice classification, the term 'PPH' was abandoned in favor of 'idiopathic' within the group of diseases known as 'PAH'.

  14. Identification and classification of silks using infrared spectroscopy

    PubMed Central

    Boulet-Audet, Maxime; Vollrath, Fritz; Holland, Chris

    2015-01-01

    Lepidopteran silks number in the thousands and display a vast diversity of structures, properties and industrial potential. To map this remarkable biochemical diversity, we present an identification and screening method based on the infrared spectra of native silk feedstock and cocoons. Multivariate analysis of over 1214 infrared spectra obtained from 35 species allowed us to group silks into distinct hierarchies and a classification that agrees well with current phylogenetic data and taxonomies. This approach also provides information on the relative content of sericin, calcium oxalate, phenolic compounds, poly-alanine and poly(alanine-glycine) β-sheets. It emerged that the domesticated mulberry silkmoth Bombyx mori represents an outlier compared with other silkmoth taxa in terms of spectral properties. Interestingly, Epiphora bauhiniae was found to contain the highest amount of β-sheets reported to date for any wild silkmoth. We conclude that our approach provides a new route to determine cocoon chemical composition and in turn a novel, biological as well as material, classification of silks. PMID:26347557

  15. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  16. Spatial-spectral blood cell classification with microscopic hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Ran, Qiong; Chang, Lan; Li, Wei; Xu, Xiaofeng

    2017-10-01

    Microscopic hyperspectral images provide a new way to examine blood cells, and hyperspectral imagery can greatly facilitate the classification of different blood cells. In this paper, microscopic hyperspectral images are acquired by connecting a microscope to a hyperspectral imager and are then used for blood cell classification. To combine the spectral and spatial information provided by hyperspectral images, a spatial-spectral classification method is built on the classical extreme learning machine (ELM) by integrating spatial context into the classification task with a Markov random field (MRF) model. ELM, ELM-MRF, support vector machines (SVM) and SVM-MRF are compared. Results show that the spatial-spectral methods (ELM-MRF, SVM-MRF) perform better than the pixel-based methods (ELM, SVM), and that the proposed ELM-MRF achieves higher precision and locates cells more accurately.

  17. Behavior Based Social Dimensions Extraction for Multi-Label Classification

    PubMed Central

    Li, Le; Xu, Junyi; Xiao, Weidong; Ge, Bin

    2016-01-01

    Classification based on social dimensions is commonly used to handle the multi-label classification task in heterogeneous networks. However, traditional methods, which mostly rely on community detection algorithms to extract the latent social dimensions, perform poorly when community detection fails. In this paper, we propose a novel behavior-based social dimensions extraction method to improve classification performance in multi-label heterogeneous networks. In our method, nodes’ behavior features, instead of community memberships, are used to extract social dimensions. By introducing Latent Dirichlet Allocation (LDA) to model the network generation process, nodes’ connection behaviors with different communities can be extracted accurately and are then used as latent social dimensions for classification. Experiments on various public datasets show that the proposed method obtains satisfactory classification results compared with other state-of-the-art methods while using a smaller number of social dimensions. PMID:27049849

  18. Detecting taxonomic and phylogenetic signals in equid cheek teeth: towards new palaeontological and archaeological proxies

    NASA Astrophysics Data System (ADS)

    Cucchi, T.; Mohaseb, A.; Peigné, S.; Debue, K.; Orlando, L.; Mashkour, M.

    2017-04-01

    The Plio-Pleistocene evolution of Equus and the subsequent domestication of horses and donkeys remain poorly understood, due to the lack of phenotypic markers capable of tracing this evolutionary process in the palaeontological/archaeological record. Using images from 345 specimens, encompassing 15 extant taxa of equids, we quantified the occlusal enamel folding pattern in four mandibular cheek teeth with a single geometric morphometric protocol. We initially investigated the protocol accuracy by assigning each tooth to its correct anatomical position and taxonomic group. We then contrasted the phylogenetic signal present in each tooth shape with an exome-wide phylogeny from 10 extant equine species. We estimated the strength of the phylogenetic signal using a Brownian motion model of evolution with the multivariate K statistic, and mapped the dental shape along the molecular phylogeny using an approach based on squared-change parsimony. We found clear evidence for the relevance of dental phenotypes to accurately discriminate all modern members of the genus Equus and capture their phylogenetic relationships. These results are valuable for both palaeontologists and zooarchaeologists exploring the spatial and temporal dynamics of the evolutionary history of the horse family, up to the latest domestication trajectories of horses and donkeys.

  19. Detecting taxonomic and phylogenetic signals in equid cheek teeth: towards new palaeontological and archaeological proxies

    PubMed Central

    Mohaseb, A.; Peigné, S.; Debue, K.; Orlando, L.; Mashkour, M.

    2017-01-01

    The Plio–Pleistocene evolution of Equus and the subsequent domestication of horses and donkeys remain poorly understood, due to the lack of phenotypic markers capable of tracing this evolutionary process in the palaeontological/archaeological record. Using images from 345 specimens, encompassing 15 extant taxa of equids, we quantified the occlusal enamel folding pattern in four mandibular cheek teeth with a single geometric morphometric protocol. We initially investigated the protocol accuracy by assigning each tooth to its correct anatomical position and taxonomic group. We then contrasted the phylogenetic signal present in each tooth shape with an exome-wide phylogeny from 10 extant equine species. We estimated the strength of the phylogenetic signal using a Brownian motion model of evolution with the multivariate K statistic, and mapped the dental shape along the molecular phylogeny using an approach based on squared-change parsimony. We found clear evidence for the relevance of dental phenotypes to accurately discriminate all modern members of the genus Equus and capture their phylogenetic relationships. These results are valuable for both palaeontologists and zooarchaeologists exploring the spatial and temporal dynamics of the evolutionary history of the horse family, up to the latest domestication trajectories of horses and donkeys. PMID:28484618

  20. PhyloTreePruner: A Phylogenetic Tree-Based Approach for Selection of Orthologous Sequences for Phylogenomics.

    PubMed

    Kocot, Kevin M; Citarella, Mathew R; Moroz, Leonid L; Halanych, Kenneth M

    2013-01-01

    Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.

  1. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation

    PubMed Central

    Roger, Andrew J; Hug, Laura A

    2006-01-01

    Determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual. Recent phylogenetic analyses of multiple genes and the discovery of important molecular and ultrastructural phylogenetic characters have resolved eukaryotic diversity into six major hypothetical groups. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artefacts and endosymbiotic or lateral gene transfer among eukaryotes. Estimating the divergence dates between the major eukaryote lineages using molecular analyses is even more difficult than phylogenetic estimation. Error in such analyses comes from a myriad of sources including: (i) calibration fossil dates, (ii) the assumed phylogenetic tree, (iii) the nucleotide or amino acid substitution model, (iv) substitution number (branch length) estimates, (v) the model of how rates of evolution change over the tree, (vi) error inherent in the time estimates for a given model and (vii) how multiple gene data are treated. By reanalysing datasets from recently published molecular clock studies, we show that when errors from these various sources are properly accounted for, the confidence intervals on inferred dates can be very large. Furthermore, estimated dates of divergence vary hugely depending on the methods

  2. Charles Darwin, beetles and phylogenetics.

    PubMed

    Beutel, Rolf G; Friedrich, Frank; Leschen, Richard A B

    2009-11-01

    Here, we review Charles Darwin's relation to beetles and developments in coleopteran systematics in the last two centuries. Darwin was an enthusiastic beetle collector. He used beetles to illustrate different evolutionary phenomena in his major works, and astonishingly, an entire sub-chapter is dedicated to beetles in "The Descent of Man". During his voyage on the Beagle, Darwin was impressed by the high diversity of beetles in the tropics, and he remarked that, to his surprise, the majority of species were small and inconspicuous. However, despite his obvious interest in the group, he did not get involved in beetle taxonomy, and his theoretical work had little immediate impact on beetle classification. The development of taxonomy and classification in the late nineteenth and early twentieth century was mainly characterised by the exploration of new character systems (e.g. larval features and wing venation). In the mid-twentieth century, Hennig's new methodology to group lineages by derived characters revolutionised systematics of Coleoptera and other organisms. As envisioned by Darwin and Ernst Haeckel, the new Hennigian approach enabled systematists to establish classifications truly reflecting evolution. Roy A. Crowson and Howard E. Hinton, who both made tremendous contributions to coleopterology, had an ambivalent attitude towards the Hennigian ideas. The Mickoleit school combined detailed anatomical work with a classical Hennigian character evaluation, with stepwise tree building, comparatively few characters and a priori polarity assessment without explicit use of the outgroup comparison method. The rise of cladistic methods in the 1970s had a strong impact on beetle systematics. Cladistic computer programs facilitated parsimony analyses of large data matrices, mostly morphological characters not requiring detailed anatomical investigations. Molecular studies on beetle phylogeny started in the 1990s with modest taxon sampling and limited DNA data. This has

  3. Charles Darwin, beetles and phylogenetics

    NASA Astrophysics Data System (ADS)

    Beutel, Rolf G.; Friedrich, Frank; Leschen, Richard A. B.

    2009-11-01

    Here, we review Charles Darwin’s relation to beetles and developments in coleopteran systematics in the last two centuries. Darwin was an enthusiastic beetle collector. He used beetles to illustrate different evolutionary phenomena in his major works, and astonishingly, an entire sub-chapter is dedicated to beetles in “The Descent of Man”. During his voyage on the Beagle, Darwin was impressed by the high diversity of beetles in the tropics, and he remarked that, to his surprise, the majority of species were small and inconspicuous. However, despite his obvious interest in the group, he did not get involved in beetle taxonomy, and his theoretical work had little immediate impact on beetle classification. The development of taxonomy and classification in the late nineteenth and early twentieth century was mainly characterised by the exploration of new character systems (e.g. larval features and wing venation). In the mid-twentieth century, Hennig’s new methodology to group lineages by derived characters revolutionised systematics of Coleoptera and other organisms. As envisioned by Darwin and Ernst Haeckel, the new Hennigian approach enabled systematists to establish classifications truly reflecting evolution. Roy A. Crowson and Howard E. Hinton, who both made tremendous contributions to coleopterology, had an ambivalent attitude towards the Hennigian ideas. The Mickoleit school combined detailed anatomical work with a classical Hennigian character evaluation, with stepwise tree building, comparatively few characters and a priori polarity assessment without explicit use of the outgroup comparison method. The rise of cladistic methods in the 1970s had a strong impact on beetle systematics. Cladistic computer programs facilitated parsimony analyses of large data matrices, mostly morphological characters not requiring detailed anatomical investigations. Molecular studies on beetle phylogeny started in the 1990s with modest taxon sampling and limited DNA data

  4. A "TNM" classification system for cancer pain: the Edmonton Classification System for Cancer Pain (ECS-CP).

    PubMed

    Fainsinger, Robin L; Nekolaichuk, Cheryl L

    2008-06-01

    The purpose of this paper is to provide an overview of the development of a "TNM" cancer pain classification system for advanced cancer patients, the Edmonton Classification System for Cancer Pain (ECS-CP). Until we have a common international language to discuss cancer pain, understanding differences in clinical and research experience in opioid rotation and use remains problematic. The complexity of the cancer pain experience presents unique challenges for the classification of pain. To date, no universally accepted pain classification measure can accurately predict the complexity of pain management, particularly for patients with cancer pain that is difficult to treat. In response to this gap in clinical assessment, the Edmonton Staging System (ESS), a classification system for cancer pain, was developed. Difficulties in definitions and interpretation of some aspects of the ESS restricted acceptance and widespread use. Construct, inter-rater reliability, and predictive validity evidence have contributed to the development of the ECS-CP. The five features of the ECS-CP--Pain Mechanism, Incident Pain, Psychological Distress, Addictive Behavior and Cognitive Function--have demonstrated value in predicting pain management complexity. The development of a standardized classification system that is comprehensive, prognostic and simple to use could provide a common language for clinical management and research of cancer pain. An international study to assess the inter-rater reliability and predictive value of the ECS-CP is currently in progress.

  5. Consensus Classification Using Non-Optimized Classifiers.

    PubMed

    Brownfield, Brett; Lemos, Tony; Kalivas, John H

    2018-04-03

    Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available, some complex, such as neural networks, and others simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample is then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments in three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars (three classes) measured on 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
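    The conventional fusion step mentioned above, in which the labels that several already-trained classifiers assign to one sample are combined by majority vote, can be sketched as follows. This illustrates only that baseline, not the authors' non-optimized fusion method; the class labels are hypothetical, and breaking ties in favour of the label predicted first is an assumption of this sketch.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse the class labels that several classifiers gave one sample.

    Ties are broken in favour of the label that appears first in
    `predictions` (an assumption; other tie-breaking rules are valid).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def fuse_all(per_classifier):
    """per_classifier[j][i] is classifier j's label for sample i."""
    return [majority_vote(list(labels)) for labels in zip(*per_classifier)]


# Hypothetical labels from three classifiers for three samples.
preds = [
    ["authentic", "fake", "authentic"],  # classifier 1
    ["authentic", "authentic", "fake"],  # classifier 2
    ["fake", "fake", "authentic"],       # classifier 3
]
print(fuse_all(preds))  # ['authentic', 'fake', 'authentic']
```

    Majority vote needs each classifier to output a hard label, which is why the conventional ensembles described in the abstract first optimize every single classifier.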

  6. Undergraduate Students’ Initial Ability in Understanding Phylogenetic Tree

    NASA Astrophysics Data System (ADS)

    Sa'adah, S.; Hidayat, T.; Sudargo, Fransisca

    2017-04-01

    A phylogenetic tree is a visual representation that depicts a hypothesis about the evolutionary relationships among taxa. Evolutionary experts use this representation to evaluate the evidence for evolution. Phylogenetic trees are increasingly used across many disciplines in biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area of biology education research. The ability to understand and reason about phylogenetic trees (called tree thinking) is an important skill for biology students. However, research has shown that many students have difficulty interpreting, constructing, and comparing phylogenetic trees, and hold misconceptions about them. Students are often not taught how to reason about the evolutionary relationships depicted in the diagram, nor are they given information about the underlying theory and process of phylogenetics. This study aims to investigate the initial ability of undergraduate students to understand and reason about phylogenetic trees. A descriptive research method was used. Students were given multiple-choice and essay questions representing tree-thinking elements, and the correct answers were expressed as percentages. Each student was also given a questionnaire. The results showed that the undergraduate students’ initial ability to understand and reason about phylogenetic trees is low. Many students were not able to answer questions about phylogenetic trees. Only 19% of undergraduate students answered correctly on the indicator "evaluating the evolutionary relationships among taxa", 25% on "applying the concept of the clade", and 17% on "determining character evolution", and only a few undergraduate students could construct a phylogenetic tree.

  7. Phylogenetic relationships of Malaysia's long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences.

    PubMed

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia's long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of cytochrome b (Cyt b) sequences were used in the phylogenetic analysis, along with one sample each of M. nemestrina and M. arctoides as outgroups. Sequence character analysis reveals that the Cyt b locus is a highly conserved region, with only 23% parsimony-informative characters detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions: the Malay Peninsula versus insular Borneo, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. The phylogenetic trees (NJ, MP and Bayesian) show a consistent clustering pattern: Borneo's population was distinguished from the Peninsula's population (99% and 100% bootstrap values in NJ and MP, respectively, and 1.00 posterior probability in the Bayesian tree). The East Coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West Coast populations were divided into 2 clades: North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia's M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.
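    The "23% parsimony-informative characters" figure above uses the standard definition: an alignment column is parsimony-informative when at least two different character states each occur in at least two sequences. A minimal sketch of that count follows. The toy alignment is hypothetical (not the macaque Cyt b data), and skipping gaps and ambiguity codes is an assumption of this sketch.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if >= 2 nucleotide states each occur in >= 2 sequences.

    Only A, C, G, T are counted; gaps and ambiguity codes are ignored
    (an assumption of this sketch).
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def fraction_informative(alignment):
    """Fraction of alignment columns that are parsimony-informative."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    columns = ("".join(seq[i] for seq in alignment) for i in range(length))
    informative = sum(is_parsimony_informative(col) for col in columns)
    return informative / length


# Hypothetical four-sequence alignment: columns 0, 2 and 4 are informative.
aln = ["ACGTA",
       "ACGTT",
       "GCATA",
       "GCATT"]
print(fraction_informative(aln))  # 0.6
```

    Singleton differences (a state seen in only one sequence) are excluded because they can be placed on a terminal branch of any tree at the same cost, so they do not help discriminate between topologies.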

  8. Improved Maximum Parsimony Models for Phylogenetic Networks.

    PubMed

    Van Iersel, Leo; Jones, Mark; Scornavacca, Celine

    2018-05-01

    Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that makes it possible to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.
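    For orientation, the tree case that the hardwired and softwired network definitions extend is the classical small-parsimony score of a rooted binary tree, computed by Fitch's algorithm. The sketch below shows only that tree case; it is not the new network definition proposed in the article. Representing trees as nested tuples and the toy alignment are assumptions made for illustration.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a (left, right) tuple; `states` maps
    leaf names to observed states. Returns (candidate_set, cost).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Empty intersection: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, states):
    return fitch(tree, states)[1]


def tree_length(tree, alignment):
    """Sum the per-site Fitch scores; alignment maps leaf -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        parsimony_score(tree, {leaf: seq[i] for leaf, seq in alignment.items()})
        for i in range(length)
    )


# Hypothetical alignment and two candidate topologies.
aln = {"a": "ACGTA", "b": "ACGTT", "c": "GCATA", "d": "GCATT"}
tree1 = (("a", "b"), ("c", "d"))
tree2 = (("a", "c"), ("b", "d"))
print(tree_length(tree1, aln), tree_length(tree2, aln))  # 4 5
```

    Maximum parsimony prefers the topology with the smaller total (here `tree1`); the network definitions differ in how they count changes when a network displays several such trees at once.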

  9. Phylogenetic Characterization of Transport Protein Superfamilies: Superiority of SuperfamilyTree Programs over Those Based on Multiple Alignments

    PubMed Central

    Chen, Jonathan S.; Reddy, Vamsee; Chen, Joshua H.; Shlykov, Maksim A.; Zheng, Wei Hao; Cho, Jaehoon; Yen, Ming Ren; Saier, Milton H.

    2012-01-01

    Transport proteins function in the translocation of ions, solutes and macromolecules across cellular and organellar membranes. These integral membrane proteins fall into >600 families as tabulated in the Transporter Classification Database (www.tcdb.org). Recent studies, some of which are reported here, define distant phylogenetic relationships between families with the creation of superfamilies. Several of these are analyzed using a novel set of programs designed to allow reliable prediction of phylogenetic trees when sequence divergence is too great to allow the use of multiple alignments. These new programs, called SuperfamilyTree1 and 2 (SFT1 and 2), allow display of protein and family relationships, respectively, based on thousands of comparative BLAST scores rather than multiple alignments. Superfamilies analyzed include: (1) Aerolysins, (2) RTX Toxins, (3) Defensins, (4) Ion Transporters, (5) Bile/Arsenite/Riboflavin Transporters, (6) Cation:Proton Antiporters, and (7) the Glucose/Fructose/Lactose superfamily within the prokaryotic phosphoenolpyruvate-dependent Phosphotransferase System. In addition to defining the phylogenetic relationships of the proteins and families within these seven superfamilies, evidence is provided showing that the SFT programs outperform programs that are based on multiple alignments whenever sequence divergence of superfamily members is extensive. The SFT programs should be applicable to virtually any superfamily of proteins or nucleic acids. PMID:22286036

  10. Genomic Repeat Abundances Contain Phylogenetic Signal

    PubMed Central

    Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464

  11. Phylogenetic Factor Analysis.

    PubMed

    Tolkoff, Max R; Alfaro, Michael E; Baele, Guy; Lemey, Philippe; Suchard, Marc A

    2018-05-01

    Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over 3-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.

  12. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignment (MSA). While MSA is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity when comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences by Discrete Fourier Transform (DFT), but it still had some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After even scaling, the spectra have the same dimensionality in Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with computational performance increased by the 2D numerical representation, is applicable to DNA sequences in any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets. The method yields accurate and reliable phylogenetic trees
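    The general pipeline described above (numerical mapping, DFT power spectrum, rescaling spectra to a common length, Euclidean distance) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it uses a hypothetical one-dimensional complex mapping (A=1, T=-1, C=i, G=-i) instead of their 2D mapping, a naive O(n^2) DFT, per-length normalisation of the spectrum, and plain linear interpolation as a stand-in for their even scaling algorithm.

```python
import cmath
import math

# Hypothetical numerical mapping, for illustration only (not the 2D mapping).
MAPPING = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def power_spectrum(seq):
    """Naive DFT power spectrum |X_k|^2 / n of the mapped sequence.

    Dividing by n keeps spectra of different lengths on a comparable
    scale (an assumption of this sketch).
    """
    x = [MAPPING[b] for b in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        xk = sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        spectrum.append(abs(xk) ** 2 / n)
    return spectrum


def rescale(spectrum, m):
    """Stretch a spectrum to length m by linear interpolation (stand-in
    for the authors' even scaling)."""
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    if n == 1:
        return [spectrum[0]] * m
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1 - frac) + spectrum[hi] * frac)
    return out


def dft_distance(a, b):
    """Euclidean distance between rescaled power spectra of two sequences."""
    sa, sb = power_spectrum(a), power_spectrum(b)
    m = max(len(sa), len(sb))
    sa, sb = rescale(sa, m), rescale(sb, m)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(sa, sb)))


print(dft_distance("ACGTACGTAC", "ACGTTCGAC"))
```

    Because the power spectrum discards phase, circular shifts of the same sequence get distance (numerically) zero; a pairwise matrix of such distances can then be passed to a hierarchical clustering method to build a tree.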

  13. Application of unweighted pair group methods with arithmetic average (UPGMA) for identification of kinship types and spreading of ebola virus through establishment of phylogenetic tree

    NASA Astrophysics Data System (ADS)

    Andriani, Tri; Irawan, Mohammad Isa

    2017-08-01

    Ebola Virus Disease (EVD) is a disease caused by a virus of the genus Ebolavirus (EBOV), family Filoviridae. Ebola virus is classified into five types, namely Zaire ebolavirus (ZEBOV), Sudan ebolavirus (SEBOV), Bundibugyo ebolavirus (BEBOV), Tai Forest ebolavirus, also known as Cote d'Ivoire ebolavirus (CIEBOV), and Reston ebolavirus (REBOV). The kinship between Ebola virus types can be identified using phylogenetic trees. In this study, the phylogenetic tree was constructed by the UPGMA method from a multiple alignment computed with a progressive method. The resulting tree shows that Tai Forest ebolavirus is closely related to Bundibugyo ebolavirus, even though the locations where their epidemics spread are far apart. The genetic distance between Bundibugyo ebolavirus and Tai Forest ebolavirus is 0.3725. The similarity of Tai Forest ebolavirus to Bundibugyo ebolavirus is not influenced by the geographic proximity of the areas where the Ebola epidemics spread.
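    UPGMA, the tree-building step named above, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the old distances. A minimal sketch follows. It returns only the topology as a Newick string (no branch lengths), and the four-taxon distance matrix in the usage note is a hypothetical toy example, not data from the study.

```python
def upgma(labels, dist):
    """Cluster taxa by UPGMA and return the topology as a Newick string."""
    # Active clusters: id -> (newick_text, number_of_taxa)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Pairwise distances between active clusters, keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Arithmetic mean over all taxon pairs = size-weighted average
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (f"({ti},{tj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

    With the toy matrix dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]] and labels ["A", "B", "C", "D"], A and B merge first, then C joins them, giving "(D,(C,(A,B)));".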

  14. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    PubMed Central

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction: Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the smallest number of clinical measures required to accurately classify participants as healthy older adults, mild cognitive impairment (MCI), or dementia using a suite of classification techniques. Methods: Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results: No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection, only 2 – 9 variables were required for classification, and these varied between datasets in a clinically meaningful way. Conclusions: The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
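    The abstract reports a geometric mean alongside accuracy, sensitivity, and specificity. A common definition, assumed here because the abstract does not give the formula, is the geometric mean of per-class recalls. This metric stays low if any class (such as the hard-to-classify MCI group) is poorly detected. The sketch below computes these quantities from a confusion matrix cm[true][predicted]. For two classes, the two recalls are the sensitivity and specificity.

```python
import math


def accuracy(cm):
    """Fraction of correctly classified participants; cm[true][predicted]."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_recall(cm):
    """Recall (sensitivity) for each true class."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def geometric_mean(cm):
    """Geometric mean of per-class recalls; low if any class is poorly detected."""
    recalls = per_class_recall(cm)
    return math.prod(recalls) ** (1 / len(recalls))
```

    For a hypothetical two-class matrix [[8, 2], [1, 9]], accuracy is 0.85 and the geometric mean is sqrt(0.8 × 0.9). The same functions apply unchanged to a three-class healthy/MCI/dementia matrix.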

  15. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teaching statistical models, to facilitating communication among phylogeneticists, and to developing generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
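    The core idea above is that a graphical model factorizes a joint distribution into conditional distributions, one per node given its parents. A plate replicates a node over many conditionally independent copies. The sketch below assumes a hypothetical two-level model: a rate λ ~ Exponential(1), and a plate of branch lengths with each b_i ~ Exponential(λ). It is a plain plate, not the paper's tree plate.

```python
import math


def exp_logpdf(x, rate):
    """Log-density of an Exponential(rate) distribution at x."""
    return math.log(rate) - rate * x if x >= 0 else -math.inf


def joint_log_density(lam, branch_lengths, hyper_rate=1.0):
    """log p(lam, b_1..b_n) = log p(lam) + sum_i log p(b_i | lam)."""
    # Top-level stochastic node: lam ~ Exponential(hyper_rate)
    total = exp_logpdf(lam, hyper_rate)
    # Plate over branches: each b_i depends only on its parent node lam,
    # so the b_i are conditionally independent given lam.
    total += sum(exp_logpdf(b, lam) for b in branch_lengths)
    return total
```

    Because the joint density is a sum of per-node terms, a Metropolis–Hastings move on λ only needs to re-evaluate the terms that involve λ. This locality is what makes graph-based model descriptions convenient for inference software.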

  16. Fast and accurate phylogeny reconstruction using filtered spaced-word matches

    PubMed Central

    Sohrabi-Jahromi, Salma; Morgenstern, Burkhard

    2017-01-01

    Motivation: Word-based or ‘alignment-free’ algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don’t-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don’t-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don’t-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. Availability and Implementation: The program source code for FSWM, including documentation, as well as the software that we used to generate artificial genome sequences, are freely available at http://fswm.gobics.de/ Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073754
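    The FSWM steps described above can be sketched as a short pipeline. It finds spaced-word matches under a binary pattern, filters matches by their score at the don't-care positions, and turns the mismatch rate at those positions into a substitution distance. This is a simplified illustration, not the FSWM implementation. The +1/−1 don't-care scoring (the tool uses a nucleotide substitution score), the short default pattern, and the Jukes-Cantor correction are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def spaced_word(seq, i, pattern):
    """Characters of seq at the match ('1') positions of pattern starting at i."""
    return "".join(seq[i + k] for k, c in enumerate(pattern) if c == "1")


def fswm_distance(s1, s2, pattern="1101011", threshold=0):
    """Estimate substitutions per site from filtered spaced-word matches."""
    L = len(pattern)
    dont_care = [k for k, c in enumerate(pattern) if c == "0"]
    # Index every spaced word of s2 by its match-position characters.
    index = defaultdict(list)
    for j in range(len(s2) - L + 1):
        index[spaced_word(s2, j, pattern)].append(j)
    mismatches = compared = 0
    for i in range(len(s1) - L + 1):
        for j in index.get(spaced_word(s1, i, pattern), []):
            pairs = [(s1[i + k], s2[j + k]) for k in dont_care]
            # Hypothetical score: +1 per identical pair, -1 per mismatch.
            score = sum(1 if a == b else -1 for a, b in pairs)
            if score < threshold:  # filter out likely spurious matches
                continue
            mismatches += sum(a != b for a, b in pairs)
            compared += len(pairs)
    if compared == 0:
        return math.inf
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # Jukes-Cantor correction is undefined here
    return -0.75 * math.log(1 - 4 * p / 3)
```

    Only the don't-care positions contribute to the mismatch count, because the match positions agree by construction. This is why the abstract describes the estimate as being based on the nucleotides aligned at the don't-care positions.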

  17. Fast and accurate phylogeny reconstruction using filtered spaced-word matches.

    PubMed

    Leimeister, Chris-André; Sohrabi-Jahromi, Salma; Morgenstern, Burkhard

    2017-04-01

    Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don't-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don't-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don't-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. The program source code for FSWM, including documentation, as well as the software that we used to generate artificial genome sequences, are freely available at http://fswm.gobics.de/. chris.leimeister@stud.uni-goettingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  18. Accurate Classification of Diminutive Colorectal Polyps Using Computer-Aided Analysis.

    PubMed

    Chen, Peng-Jen; Lin, Meng-Chiung; Lai, Mei-Ju; Lin, Jung-Chun; Lu, Henry Horng-Shing; Tseng, Vincent S

    2018-02-01

    Narrow-band imaging is an image-enhanced form of endoscopy used to observe microstructures and capillaries of the mucosal epithelium, which allows for real-time prediction of histologic features of colorectal polyps. However, narrow-band imaging expertise is required to differentiate hyperplastic from neoplastic polyps with high levels of accuracy. We developed and tested a system of computer-aided diagnosis with a deep neural network (DNN-CAD) to analyze narrow-band images of diminutive colorectal polyps. We collected 1476 images of neoplastic polyps and 681 images of hyperplastic polyps, obtained from the picture archiving and communications system database in a tertiary hospital in Taiwan. Histologic findings from the polyps were also collected and used as the reference standard. The images and data were used to train the DNN. A test set of images (96 hyperplastic and 188 neoplastic polyps, smaller than 5 mm), obtained from patients who underwent colonoscopies from March 2017 through August 2017, was then used to test the diagnostic ability of the DNN-CAD vs endoscopists (2 expert and 4 novice), who were asked to classify the images of the test set as neoplastic or hyperplastic. Their classifications were compared with findings from histologic analysis. The primary outcome measures were diagnostic accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic time. The accuracy, sensitivity, specificity, PPV, NPV, and diagnostic time were compared among DNN-CAD, the novice endoscopists, and the expert endoscopists. The study was designed to detect a difference of 10% in accuracy by a 2-sided McNemar test. In the test set, the DNN-CAD identified neoplastic or hyperplastic polyps with 96.3% sensitivity, 78.1% specificity, a PPV of 89.6%, and an NPV of 91.5%. Fewer than half of the novice endoscopists classified polyps with an NPV of 90% (their NPVs ranged from 73.9% to 84.0%). DNN-CAD classified polyps as
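    The outcome measures and the paired comparison named above can be made concrete with a short sketch. It computes PPV and NPV from confusion-matrix counts, treating neoplastic as the positive class. It also computes a two-sided McNemar statistic from the discordant pairs of two readers classifying the same images. The continuity-correction option and the example counts in the tests are illustrative assumptions, not the study's data or settings.

```python
import math


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts, neoplastic = positive."""
    return tp / (tp + fp), tn / (tn + fn)


def mcnemar(b, c, continuity=False):
    """Two-sided McNemar test on the discordant pair counts b and c.

    b: images reader 1 classified correctly and reader 2 incorrectly;
    c: the reverse. Returns (chi-square statistic, p-value), 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

    McNemar's test uses only the discordant pairs. Images that both readers classify correctly, or that both misclassify, carry no information about which reader is more accurate.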

  19. Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

    PubMed

    Mirzaei, Sajad; Wu, Yufeng

    2016-01-01

    Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and are sometimes slow. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. It also produces more parsimonious results than a previous method on many simulated datasets as well as on a real biological dataset. We also show that our method produces topologically more accurate networks for many datasets.

  20. Accurate Classification of RNA Structures Using Topological Fingerprints

    PubMed Central

    Li, Kejie; Gribskov, Michael

    2016-01-01

    While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. Although the exact size and spacing of base-paired regions vary, functionally similar RNAs show pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity) and are only partially correct or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Because it includes pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint. PMID:27755571
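    The fingerprint idea above (a structure becomes a graph of stems, and the graph is summarized by the set of its subgraphs) can be illustrated with a heavily simplified, hypothetical sketch. It is not the XIOS implementation. Each stem is reduced to a (start, end) span. Every pair of stems is labeled serial, nested, or overlapping (pseudoknotted), and the fingerprint is the set of labeled 2-stem and 3-stem subgraphs. Fingerprints are compared with Jaccard similarity, a similarity choice assumed here for illustration.

```python
from itertools import combinations


def relation(a, b):
    """Topological relation of stem b to stem a, where a starts first.

    Stems are (start, end) spans from the first 5' base to the last 3' base.
    's' = serial, 'i' = b nested inside a, 'o' = overlapping (pseudoknot).
    """
    if a[1] < b[0]:
        return "s"
    if b[1] < a[1]:
        return "i"
    return "o"


def fingerprint(stems):
    """Set of labeled 2-stem and 3-stem subgraphs (a simplified fingerprint)."""
    stems = sorted(stems)  # combinations() then keeps 5'-to-3' order
    fp = {(relation(a, b),) for a, b in combinations(stems, 2)}
    fp |= {(relation(a, b), relation(a, c), relation(b, c))
           for a, b, c in combinations(stems, 3)}
    return fp


def jaccard(fp1, fp2):
    """Jaccard similarity of two fingerprints (1.0 when both are empty)."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 1.0
```

    Because only the relations between stems are recorded, two structures with different stem sizes and spacing but the same arrangement get identical fingerprints. This mirrors the topological argument in the abstract.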

  1. Phylogenetic relationships of South American lizards of the genus Stenocercus (Squamata: Iguania): A new approach using a general mixture model for gene sequence data.

    PubMed

    Torres-Carvajal, Omar; Schulte, James A; Cadle, John E

    2006-04-01

    The South American iguanian lizard genus Stenocercus includes 54 species occurring mostly in the Andes and adjacent lowland areas from northern Venezuela and Colombia to central Argentina at elevations of 0–4000 m. Small taxon or character sampling has characterized all phylogenetic analyses of Stenocercus, which has long been recognized as sister taxon to the Tropidurus Group. In this study, we use mtDNA sequence data to perform phylogenetic analyses that include 32 species of Stenocercus and 12 outgroup taxa. Monophyly of this genus is strongly supported by maximum parsimony and Bayesian analyses. Evolutionary relationships within Stenocercus are further analyzed with a Bayesian implementation of a general mixture model, which accommodates variability in the pattern of evolution across sites. These analyses indicate a basal split of Stenocercus into two clades, one of which receives very strong statistical support. In addition, we test previous hypotheses using non-parametric and parametric statistical methods, and provide a phylogenetic classification for Stenocercus.

  2. A Higher Level Classification of All Living Organisms

    PubMed Central

    Ruggiero, Michael A.; Gordon, Dennis P.; Orrell, Thomas M.; Bailly, Nicolas; Bourgoin, Thierry; Brusca, Richard C.; Cavalier-Smith, Thomas; Guiry, Michael D.; Kirk, Paul M.

    2015-01-01

    We present a consensus classification of life to embrace the more than 1.6 million species already provided by more than 3,000 taxonomists’ expert opinions in a unified and coherent, hierarchically ranked system known as the Catalogue of Life (CoL). The intent of this collaborative effort is to provide a hierarchical classification serving not only the needs of the CoL’s database providers but also the diverse public-domain user community, most of whom are familiar with the Linnaean conceptual system of ordering taxon relationships. This classification is neither phylogenetic nor evolutionary but instead represents a consensus view that accommodates taxonomic choices and practical compromises among diverse expert opinions, public usages, and conflicting evidence about the boundaries between taxa and the ranks of major taxa, including kingdoms. Certain key issues, some not fully resolved, are addressed in particular. Beyond its immediate use as a management tool for the CoL and ITIS (Integrated Taxonomic Information System), it is immediately valuable as a reference for taxonomic and biodiversity research, as a tool for societal communication, and as a classificatory “backbone” for biodiversity databases, museum collections, libraries, and textbooks. Such a modern comprehensive hierarchy has not previously existed at this level of specificity. PMID:25923521

  3. Multi-locus phylogenetic analysis reveals the pattern and tempo of bony fish evolution

    PubMed Central

    Broughton, Richard E.; Betancur-R., Ricardo; Li, Chenhong; Arratia, Gloria; Ortí, Guillermo

    2013-01-01

    Over half of all vertebrates are “fishes”, which exhibit enormous diversity in morphology, physiology, behavior, reproductive biology, and ecology. Investigation of fundamental areas of vertebrate biology depends critically on a robust phylogeny of fishes, yet evolutionary relationships among the major actinopterygian and sarcopterygian lineages have not been conclusively resolved. Although a consensus phylogeny of teleosts has been emerging recently, it has been based on analyses of various subsets of actinopterygian taxa, but not on a full sample of all bony fishes. Here we conducted a comprehensive phylogenetic study on a broad taxonomic sample of 61 actinopterygian and sarcopterygian lineages (with a chondrichthyan outgroup) using a molecular data set of 21 independent loci. These data yielded a resolved phylogenetic hypothesis for extant Osteichthyes, including 1) reciprocally monophyletic Sarcopterygii and Actinopterygii, as currently understood, with polypteriforms as the first diverging lineage within Actinopterygii; 2) a monophyletic group containing gars and bowfin (= Holostei) as sister group to teleosts; and 3) the earliest diverging lineage among teleosts being Elopomorpha, rather than Osteoglossomorpha. Relaxed-clock dating analysis employing a set of 24 newly applied fossil calibrations reveals divergence times that are more consistent with paleontological estimates than previous studies. Establishing a new phylogenetic pattern with accurate divergence dates for bony fishes illustrates several areas where the fossil record is incomplete and provides critical new insights on diversification of this important vertebrate group. PMID:23788273

  4. Rearrangement moves on rooted phylogenetic networks

    PubMed Central

    Gambette, Philippe; van Iersel, Leo; Jones, Mark; Scornavacca, Celine

    2017-01-01

    Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves, named rNNI and rSPR, reduce to the best-known moves on rooted phylogenetic trees: nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results (separating the contributions of horizontal and vertical moves), we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a
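    On trees, the rNNI move described above reduces to rooted nearest-neighbor interchange. For every internal edge below the root, this move swaps the sibling subtree of the lower node with one of that node's two children, so each such edge yields two neighbors. A minimal sketch for rooted binary trees follows. Trees are written as nested 2-tuples with string leaves, a representation chosen for this illustration, and the code does not deduplicate trees that differ only in child order.

```python
def nni_neighbors(tree):
    """All trees one rooted NNI move away from `tree` (nested 2-tuples, str leaves)."""
    if isinstance(tree, str):
        return []
    left, right = tree
    result = []
    # Moves across the edge from this node to each internal child.
    for child, other, child_is_left in ((left, right, True), (right, left, False)):
        if isinstance(child, tuple):
            a, b = child
            for kept, swapped in ((a, b), (b, a)):
                # Exchange the sibling subtree `other` with grandchild `swapped`.
                new_child = (kept, other)
                result.append((new_child, swapped) if child_is_left
                              else (swapped, new_child))
    # Moves strictly inside each subtree.
    result += [(t, right) for t in nni_neighbors(left)]
    result += [(left, t) for t in nni_neighbors(right)]
    return result
```

    For the four-leaf caterpillar (((A,B),C),D), the two internal non-root edges give 2 × 2 = 4 neighbors, including (((A,B),D),C) and (((A,C),B),D).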

  5. Automated classification of Acid Rock Drainage potential from Corescan drill core imagery

    NASA Astrophysics Data System (ADS)

    Cracknell, M. J.; Jackson, L.; Parbhakar-Fox, A.; Savinova, K.

    2017-12-01

    Classification of the acid-forming potential of waste rock is important for managing environmental hazards associated with mining operations. Current methods for the classification of acid rock drainage (ARD) potential usually involve labour-intensive and subjective assessment of drill core and/or hand specimens. Manual methods are subject to operator bias and human error, and the amount of material that can be assessed within a given time frame is limited. The automated classification of ARD potential documented here is based on the ARD Index developed by Parbhakar-Fox et al. (2011). This ARD Index involves the combination of five indicators: A - sulphide content; B - sulphide alteration; C - sulphide morphology; D - primary neutraliser content; and E - sulphide mineral association. Several components of the ARD Index require accurate identification of sulphide minerals. This is achieved by classifying Corescan Red-Green-Blue true colour images into the presence or absence of sulphide minerals using supervised classification. Subsequently, sulphide classification images are processed and combined with Corescan SWIR-based mineral classifications to obtain information on sulphide content, indices representing sulphide textures (disseminated versus massive and degree of veining), and spatially associated minerals. This information is combined to calculate ARD Index indicator values that feed into the classification of ARD potential. Automated ARD potential classifications of drill core samples associated with a porphyry Cu-Au deposit are compared to manually derived classifications and those obtained by standard static geochemical testing and X-ray diffractometry analyses. Results indicate a high degree of similarity between automated and manual ARD potential classifications. Major differences between approaches are observed in sulphide and neutraliser mineral percentages, likely due to the subjective nature of manual estimates of mineral content. The automated approach

  6. Fourier transform inequalities for phylogenetic trees.

    PubMed

    Matsen, Frederick A

    2009-01-01

    Phylogenetic invariants are not the only constraints on site-pattern frequency vectors for phylogenetic trees. A mutation matrix, by its definition, is the exponential of a matrix with non-negative off-diagonal entries; this positivity requirement implies non-trivial constraints on the site-pattern frequency vectors. We call these additional constraints "edge-parameter inequalities". In this paper, we first motivate the edge-parameter inequalities by considering a pathological site-pattern frequency vector corresponding to a quartet tree with a negative internal edge. This site-pattern frequency vector nevertheless satisfies all of the constraints described up to now in the literature. We next describe two complete sets of edge-parameter inequalities for the group-based models; these constraints are square-free monomial inequalities in the Fourier transformed coordinates. These inequalities, along with the phylogenetic invariants, form a complete description of the set of site-pattern frequency vectors corresponding to bona fide trees. Said in mathematical language, this paper explicitly presents two finite lists of inequalities in Fourier coordinates of the form "monomial ≤ 1", each list characterizing the phylogenetically relevant semialgebraic subsets of the phylogenetic varieties.

  7. Phylogenetic search through partial tree mixing

    PubMed Central

    2012-01-01

    Background: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results: Compared with popular phylogenetic search algorithms, these algorithms find better trees much more quickly for large data sets. They are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda. Conclusions: The use of Partial Tree Mixing in a partition-based tree space allows the algorithm to quickly converge on near-optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449

  8. Phylogenetic Analysis of Mitochondrial Outer Membrane β-Barrel Channels

    PubMed Central

    Wojtkowska, Małgorzata; Jąkalski, Marcin; Pieńkowska, Joanna R.; Stobienia, Olgierd; Karachitos, Andonis; Przytycka, Teresa M.; Weiner, January; Kmita, Hanna; Makałowski, Wojciech

    2012-01-01

    Transport of molecules across the mitochondrial outer membrane is pivotal for proper mitochondrial function. The transport pathways across the membrane are formed by ion channels that participate in metabolite exchange between mitochondria and cytoplasm (voltage-dependent anion-selective channel, VDAC) as well as in the import of proteins encoded by nuclear genes (Tom40 and Sam50/Tob55). VDAC, Tom40, and Sam50/Tob55 are present in all eukaryotic organisms, encoded in the nuclear genome, and have β-barrel topology. We have compiled data sets of these protein sequences and studied their phylogenetic relationships with a special focus on the position of Amoebozoa. Additionally, we identified these protein-coding genes in Acanthamoeba castellanii and Dictyostelium discoideum to complement our data set and verify the phylogenetic position of these model organisms. Our analyses show that mitochondrial β-barrel channels from Archaeplastida (plants) and Opisthokonta (animals and fungi) experienced many duplication events that resulted in multiple paralogous isoforms, and that they form well-defined monophyletic clades that match the current model of eukaryotic evolution. However, in representatives of Amoebozoa, Chromalveolata, and Excavata (former Protista), they do not form clearly distinguishable clades, although they are located basally to the plant and algae branches. In most cases, they do not possess paralogs, and their sequences appear to have evolved quickly or degenerated. Consequently, the obtained phylogenies of mitochondrial outer membrane β-channels do not entirely reflect the recent eukaryotic classification system involving the six supergroups: Chromalveolata, Excavata, Archaeplastida, Rhizaria, Amoebozoa, and Opisthokonta. PMID:22155732

  9. How does cognition evolve? Phylogenetic comparative psychology

    PubMed Central

    Matthews, Luke J.; Hare, Brian A.; Nunn, Charles L.; Anderson, Rindy C.; Aureli, Filippo; Brannon, Elizabeth M.; Call, Josep; Drea, Christine M.; Emery, Nathan J.; Haun, Daniel B. M.; Herrmann, Esther; Jacobs, Lucia F.; Platt, Michael L.; Rosati, Alexandra G.; Sandel, Aaron A.; Schroepfer, Kara K.; Seed, Amanda M.; Tan, Jingzhi; van Schaik, Carel P.; Wobber, Victoria

    2014-01-01

    Now more than ever, animal studies have the potential to test hypotheses regarding how cognition evolves. Comparative psychologists have developed new techniques to probe the cognitive mechanisms underlying animal behavior, and they have become increasingly skillful at adapting methodologies to test multiple species. Meanwhile, evolutionary biologists have generated quantitative approaches to investigate the phylogenetic distribution and function of phenotypic traits, including cognition. In particular, phylogenetic methods can quantitatively (1) test whether specific cognitive abilities are correlated with life history (e.g., lifespan), morphology (e.g., brain size), or socio-ecological variables (e.g., social system), (2) measure how strongly phylogenetic relatedness predicts the distribution of cognitive skills across species, and (3) estimate the ancestral state of a given cognitive trait using measures of cognitive performance from extant species. Phylogenetic methods can also be used to guide the selection of species comparisons that offer the strongest tests of a priori predictions of cognitive evolutionary hypotheses (i.e., phylogenetic targeting). Here, we explain how an integration of comparative psychology and evolutionary biology will answer a host of questions regarding the phylogenetic distribution and history of cognitive traits, as well as the evolutionary processes that drove their evolution. PMID:21927850
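    Point (3) above, estimating the ancestral state of a continuous trait from extant species, is often done under a Brownian-motion model. Under that model, Felsenstein-style pruning gives the maximum-likelihood estimate at the root. Each internal node takes the inverse-variance-weighted mean of its children. The node's own branch then lengthens by the variance of that estimate. The sketch below assumes this model, which the abstract does not specify. It estimates the root state only, and it represents a tip as a number and an internal node as a list of (child, branch_length) pairs. Branch lengths must be positive.

```python
def prune(node):
    """Brownian-motion pruning: return (estimated state, extra variance) for node.

    A tip is a number (the observed trait value); an internal node is a list
    of (child, branch_length) pairs. Branch lengths must be positive.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0  # observed tip value, no extra variance
    weights, weighted_sum = 0.0, 0.0
    for child, length in node:
        x, extra = prune(child)
        v = length + extra  # effective branch length after pruning the child
        weights += 1.0 / v
        weighted_sum += x / v
    # Inverse-variance weighted mean; its variance extends the parent branch.
    return weighted_sum / weights, 1.0 / weights
```

    For example, with a hypothetical tree [([(2.0, 1.0), (4.0, 1.0)], 0.5), (9.0, 1.0)], the inner node is estimated at 3.0 with extra variance 0.5. Both root branches then have effective length 1.0, so the root estimate is (3.0 + 9.0) / 2 = 6.0.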

  10. Phylogenetic diversity measures based on Hill numbers.

    PubMed

    Chao, Anne; Chiu, Chun-Huo; Jost, Lou

    2010-11-27

    We propose a parametric class of phylogenetic diversity (PD) measures that are sensitive to both species abundance and species taxonomic or phylogenetic distances. This work extends the conventional parametric species-neutral approach (based on 'effective number of species' or Hill numbers) to take into account species relatedness, and also generalizes the traditional phylogenetic approach (based on 'total phylogenetic length') to incorporate species abundances. The proposed measure quantifies 'the mean effective number of species' over any time interval of interest, or the 'effective number of maximally distinct lineages' over that time interval. The product of the measure and the interval length quantifies the 'branch diversity' of the phylogenetic tree during that interval. The new measures generalize and unify many existing measures and lead to a natural definition of taxonomic diversity as a special case. The replication principle (or doubling property), an important requirement for species-neutral diversity, is generalized to PD. The widely used Rao's quadratic entropy and the phylogenetic entropy do not satisfy this essential property, but a simple transformation converts each to our measures, which do satisfy the property. The proposed approach is applied to forest data for interpreting the effects of thinning.
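    For an ultrametric tree of depth T, the mean effective number of species of order q can be written as qD̄(T) = [Σ_i (L_i/T) a_i^q]^(1/(1−q)). The sum runs over branches, where L_i is a branch length and a_i is the total relative abundance descending from that branch. The q = 1 case is the limit exp(−Σ_i (L_i/T) a_i ln a_i). The branch diversity is T × qD̄(T). The sketch below implements these expressions as recalled from the Hill-number framework; verify them against the paper before relying on the sketch. For a star tree the formula reduces to ordinary Hill numbers, and for q = 0 the branch diversity equals Faith's total branch length, which the tests check.

```python
import math


def mean_effective_species(branches, T, q):
    """qD(T) for an ultrametric tree; branches = [(L_i, a_i), ...]."""
    terms = [(L / T, a) for L, a in branches if a > 0]  # skip empty branches
    if q == 1:
        # Limit q -> 1 of the general formula (Shannon-type case).
        return math.exp(-sum(w * a * math.log(a) for w, a in terms))
    return sum(w * a ** q for w, a in terms) ** (1 / (1 - q))


def branch_diversity(branches, T, q):
    """Phylogenetic diversity of order q over the interval [0, T]."""
    return T * mean_effective_species(branches, T, q)
```

    In a hypothetical tree ((A:1,B:1):1,C:2) with equal abundances, the branches are (1, 1/3) for A and B, (1, 2/3) for the AB stem, and (2, 1/3) for C. With q = 0 and T = 2, branch_diversity returns 5.0, the total branch length.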

  11. How does cognition evolve? Phylogenetic comparative psychology.

    PubMed

    MacLean, Evan L; Matthews, Luke J; Hare, Brian A; Nunn, Charles L; Anderson, Rindy C; Aureli, Filippo; Brannon, Elizabeth M; Call, Josep; Drea, Christine M; Emery, Nathan J; Haun, Daniel B M; Herrmann, Esther; Jacobs, Lucia F; Platt, Michael L; Rosati, Alexandra G; Sandel, Aaron A; Schroepfer, Kara K; Seed, Amanda M; Tan, Jingzhi; van Schaik, Carel P; Wobber, Victoria

    2012-03-01

    Now more than ever animal studies have the potential to test hypotheses regarding how cognition evolves. Comparative psychologists have developed new techniques to probe the cognitive mechanisms underlying animal behavior, and they have become increasingly skillful at adapting methodologies to test multiple species. Meanwhile, evolutionary biologists have generated quantitative approaches to investigate the phylogenetic distribution and function of phenotypic traits, including cognition. In particular, phylogenetic methods can quantitatively (1) test whether specific cognitive abilities are correlated with life history (e.g., lifespan), morphology (e.g., brain size), or socio-ecological variables (e.g., social system), (2) measure how strongly phylogenetic relatedness predicts the distribution of cognitive skills across species, and (3) estimate the ancestral state of a given cognitive trait using measures of cognitive performance from extant species. Phylogenetic methods can also be used to guide the selection of species comparisons that offer the strongest tests of a priori predictions of cognitive evolutionary hypotheses (i.e., phylogenetic targeting). Here, we explain how an integration of comparative psychology and evolutionary biology will answer a host of questions regarding the phylogenetic distribution and history of cognitive traits, as well as the evolutionary processes that drove their evolution.
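Point (3) above, estimating the ancestral state of a cognitive trait from extant species, can be illustrated under a Brownian-motion model of trait evolution. The sketch below is one standard approach (Felsenstein-style pruning), not necessarily the method the authors have in mind. Each internal node takes the inverse-variance weighted mean of its children, and the extra variance of that estimate is added to the parent branch; at the root this gives the maximum-likelihood ancestral value. The tree encoding, the function name, and the example scores are hypothetical; all branch lengths must be positive.

```python
from typing import List, Tuple, Union

# A tree is either a leaf trait value or a list of (subtree, branch_length) children.
Tree = Union[float, List[Tuple["Tree", float]]]


def bm_root_estimate(tree: Tree) -> Tuple[float, float]:
    """Return (estimate, extra_variance) for the root of `tree` under Brownian motion."""
    if isinstance(tree, (int, float)):
        return float(tree), 0.0  # observed value at a tip, no extra variance
    weight_sum = 0.0
    weighted = 0.0
    for child, length in tree:
        x, extra = bm_root_estimate(child)
        v = length + extra  # effective variance along this branch
        weighted += x / v
        weight_sum += 1.0 / v
    # Inverse-variance weighted mean; its variance is 1 / sum(1 / v_i).
    return weighted / weight_sum, 1.0 / weight_sum


# Hypothetical cognitive-test scores for four species on ((s1:1,s2:1):1,(s3:1,s4:1):1).
tree = [([(1.0, 1.0), (3.0, 1.0)], 1.0), ([(5.0, 1.0), (7.0, 1.0)], 1.0)]
root_value, _ = bm_root_estimate(tree)  # 4.0 for this symmetric tree
```

Because closer relatives contribute correlated information, species on long branches get less weight; with the asymmetric pair `[(0.0, 1.0), (4.0, 3.0)]` the estimate is 1.0, not the plain mean of 2.0.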

  12. Changing Patient Classification System for Hospital Reimbursement in Romania

    PubMed Central

    Radu, Ciprian-Paul; Chiriac, Delia Nona; Vladescu, Cristian

    2010-01-01

    Aim: To evaluate the effects of the change in the diagnosis-related group (DRG) system on patient morbidity and hospital financial performance in the Romanian public health care system. Methods: Three variables were assessed before and after the classification switch in July 2007: clinical outcomes, the case mix index, and hospital budgets, using the database of the National School of Public Health and Health Services Management, which contains data regularly received from hospitals reimbursed through the Romanian DRG scheme (291 in 2009). Results: Because Romania lacked its own system for calculating cost-weights, an imported system had to be used, which some clinicians criticized for not accurately reflecting resource consumption in Romanian hospitals. The new DRG classification system allowed a more accurate clinical classification. However, it also exposed physicians' limited knowledge of diagnosing and coding procedures, which led to incorrect coding. Consequently, the reported hospital morbidity changed after the DRG switch, reflecting an increase in the national case mix index of 25% in 2009 (compared with 2007). Since hospitals received the same reimbursement over the first two years after the classification switch, the new DRG system sometimes led them to change patients' diagnoses in order to receive more funding. Conclusion: Lack of oversight of hospital coding and reporting to the national reimbursement scheme allowed the increase in the case mix index. The complexity of the new classification system requires more resources (human and financial), better monitoring and evaluation, and improved legislation in order to achieve better hospital resource allocation and more efficient patient care. PMID:20564769
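The case mix index referred to above is conventionally the mean relative cost weight of the DRGs assigned to a hospital's cases, so recoding alone can raise it without any change in the patients treated. A minimal sketch, assuming made-up DRG labels and weights (not the weights of the Romanian scheme):

```python
from typing import Dict, Sequence


def case_mix_index(drg_codes: Sequence[str], cost_weights: Dict[str, float]) -> float:
    """Mean relative cost weight over all coded cases."""
    return sum(cost_weights[code] for code in drg_codes) / len(drg_codes)


# Hypothetical DRGs and relative cost weights, for illustration only.
weights = {"DRG_X": 0.8, "DRG_Y": 1.0, "DRG_Z": 2.0}

# The same four patients, before and after one case is recoded to a heavier DRG.
before = case_mix_index(["DRG_X", "DRG_X", "DRG_Y", "DRG_Y"], weights)  # about 0.9
after = case_mix_index(["DRG_Z", "DRG_X", "DRG_Y", "DRG_Y"], weights)   # about 1.2
relative_change = (after - before) / before                              # about +33%
```

A single upcoded case out of four moves the index by a third here, which shows why unmonitored coding can drive large national case-mix increases.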

  13. Changing patient classification system for hospital reimbursement in Romania.

    PubMed

    Radu, Ciprian-Paul; Chiriac, Delia Nona; Vladescu, Cristian

    2010-06-01

    To evaluate the effects of the change in the diagnosis-related group (DRG) system on patient morbidity and hospital financial performance in the Romanian public health care system. Three variables were assessed before and after the classification switch in July 2007: clinical outcomes, the case mix index, and hospital budgets, using the database of the National School of Public Health and Health Services Management, which contains data regularly received from hospitals reimbursed through the Romanian DRG scheme (291 in 2009). Because Romania lacked its own system for calculating cost-weights, an imported system had to be used, which some clinicians criticized for not accurately reflecting resource consumption in Romanian hospitals. The new DRG classification system allowed a more accurate clinical classification. However, it also exposed physicians' limited knowledge of diagnosing and coding procedures, which led to incorrect coding. Consequently, the reported hospital morbidity changed after the DRG switch, reflecting an increase in the national case-mix index of 25% in 2009 (compared with 2007). Since hospitals received the same reimbursement over the first two years after the classification switch, the new DRG system sometimes led them to change patients' diagnoses in order to receive more funding. Lack of oversight of hospital coding and reporting to the national reimbursement scheme allowed the increase in the case-mix index. The complexity of the new classification system requires more resources (human and financial), better monitoring and evaluation, and improved legislation in order to achieve better hospital resource allocation and more efficient patient care.

  14. A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes

    PubMed Central

    2013-01-01

    Background: The extant squamates (>9400 known species of lizards and snakes) are one of the most diverse and conspicuous radiations of terrestrial vertebrates, but no studies have attempted to reconstruct a phylogeny for the group with large-scale taxon sampling. Such an estimate is invaluable for comparative evolutionary studies and for addressing their classification. Here, we present the first large-scale phylogenetic estimate for Squamata. Results: The estimated phylogeny contains 4161 species, representing all currently recognized families and subfamilies. The analysis is based on up to 12896 base pairs of sequence data per species (average = 2497 bp) from 12 genes, including seven nuclear loci (BDNF, c-mos, NT3, PDC, R35, RAG-1, and RAG-2), and five mitochondrial genes (12S, 16S, cytochrome b, ND2, and ND4). The tree provides important confirmation for recent estimates of higher-level squamate phylogeny based on molecular data (but with more limited taxon sampling), estimates that are very different from previous morphology-based hypotheses. The tree also includes many relationships that differ from previous molecular estimates and many that differ from traditional taxonomy. Conclusions: We present a new large-scale phylogeny of squamate reptiles that should be a valuable resource for future comparative studies. We also present a revised classification of squamates at the family and subfamily level to bring the taxonomy more in line with the new phylogenetic hypothesis. This classification includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes. PMID:23627680

  15. IRIS COLOUR CLASSIFICATION SCALES--THEN AND NOW.

    PubMed

    Grigore, Mariana; Avram, Alina

    2015-01-01

    Eye colour is one of the most obvious phenotypic traits of an individual. Since the first documented classification scale was developed in 1843, there have been numerous attempts to classify iris colour. In past centuries, iris colour classification scales had various colour categories and mostly relied on comparing an individual's eye with painted glass eyes. Once photography techniques were refined, standard iris photographs replaced painted eyes, but this did not solve the problem of painted/printed colour variability over time. Early clinical scales were easy to use, but lacked objectivity and were not standardised or statistically tested for reproducibility. The era of automated iris colour classification systems came with technological development. Spectrophotometry, digital analysis of high-resolution iris images, hyperspectral analysis of the real human iris, and dedicated iris colour analysis software have all achieved objective, accurate iris colour classification, but they are quite expensive and limited to research environments. Iris colour classification systems have evolved continuously because of their use in a wide range of studies, especially in anthropology, epidemiology and genetics. Despite the wide range of existing scales, there is still no generally accepted iris colour classification scale.

  16. Phylogenetic relationships, character evolution, and taxonomic implications within the slipper lobsters (Crustacea: Decapoda: Scyllaridae).

    PubMed

    Yang, Chien-Hui; Bracken-Grissom, Heather; Kim, Dohyup; Crandall, Keith A; Chan, Tin-Yam

    2012-01-01

    The slipper lobsters belong to the family Scyllaridae, which contains a total of 20 genera and 89 species distributed across four subfamilies (Arctidinae, Ibacinae, Scyllarinae, and Theninae). We have collected nucleotide sequence data from regions of five different genes (16S, 18S, COI, 28S, H3) to estimate phylogenetic relationships among 54 species from the Scyllaridae, with a focus on the species-rich subfamily Scyllarinae. We have included in our analyses at least one representative from all 20 genera in the Scyllaridae and 35 of the 52 species within the Scyllarinae. Our resulting phylogenetic estimate shows the subfamilies are monophyletic, except for Ibacinae, which has paraphyletic relationships among genera. Many of the genera within the Scyllarinae form non-monophyletic groups, while the genera from all other subfamilies form well-supported clades. We discuss the implications of this history for the evolution of morphological characters and ecological transitions (nearshore vs. offshore) within the slipper lobsters. Finally, we identify, through ancestral state character reconstructions, key morphological features diagnostic of the major clades of diversity within the Scyllaridae and relate this character evolution to current taxonomy and classification. Copyright © 2011 Elsevier Inc. All rights reserved.
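Ancestral state reconstruction of a discrete character, such as the nearshore vs. offshore transition mentioned above, is often introduced through Fitch parsimony. The sketch below is the bottom-up (downpass) half of Fitch's algorithm: it returns the set of most-parsimonious states at the root and the minimum number of changes. It is a generic textbook method, not necessarily the reconstruction approach used in the study; interior-node assignments would need the second, top-down pass. Taxon names and codings are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_downpass(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (candidate state set at this node, minimum changes within the subtree)."""
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_steps = fitch_downpass(left, states)
    right_set, right_steps = fitch_downpass(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_steps + right_steps
    # Disjoint sets: take the union and count one state change.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical taxa coded for habitat.
habitat = {"t1": "nearshore", "t2": "nearshore", "t3": "offshore", "t4": "offshore"}
root_set, steps = fitch_downpass((("t1", "t2"), ("t3", "t4")), habitat)
```

Here a single transition explains the data and either habitat is equally parsimonious at the root; rearranging the tree to `(("t1", "t3"), ("t2", "t4"))` raises the minimum to two changes.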

  17. Nuclear and cpDNA sequences combined provide strong inference of higher phylogenetic relationships in the phlox family (Polemoniaceae).

    PubMed

    Johnson, Leigh A; Chan, Lauren M; Weese, Terri L; Busby, Lisa D; McMurry, Samuel

    2008-09-01

    Members of the phlox family (Polemoniaceae) serve as useful models for studying various evolutionary and biological processes. Despite its biological importance, no family-wide phylogenetic estimate based on multiple DNA regions with complete generic sampling is available. Here, we analyze one nuclear and five chloroplast DNA sequence regions (nuclear ITS, chloroplast matK, trnL intron plus trnL-trnF intergenic spacer, and the trnS-trnG, trnD-trnT, and psbM-trnD intergenic spacers) using parsimony and Bayesian methods, as well as assessments of congruence and long branch attraction, to explore phylogenetic relationships among 84 ingroup species representing all currently recognized Polemoniaceae genera. Relationships inferred from the ITS and concatenated chloroplast regions are similar overall. A combined analysis provides strong support for the monophyly of Polemoniaceae and subfamilies Acanthogilioideae, Cobaeoideae, and Polemonioideae. Relationships among subfamilies, and thus for the precise root of Polemoniaceae, remain poorly supported. Within the largest subfamily, Polemonioideae, four clades corresponding to tribes Polemonieae, Phlocideae, Gilieae, and Loeselieae receive strong support. The monogeneric Polemonieae appears sister to Phlocideae. Relationships within Polemonieae, Phlocideae, and Gilieae are mostly consistent between analyses and data permutations. Many relationships within Loeselieae remain uncertain. Overall, inferred phylogenetic relationships support a higher-level classification for Polemoniaceae proposed in 2000.

  18. Overview of classification systems in peripheral artery disease.

    PubMed

    Hardman, Rulon L; Jazaeri, Omid; Yi, J; Smith, M; Gupta, Rajan

    2014-12-01

    Peripheral artery disease (PAD), secondary to atherosclerotic disease, is currently the leading cause of morbidity and mortality in the western world. While PAD is common, it is estimated that the majority of patients with PAD are undiagnosed and undertreated. The challenge in treating PAD is to accurately diagnose the symptoms and determine the appropriate treatment for each patient. The varied presentations of peripheral vascular disease have led to numerous classification schemes throughout the literature. Consistent grading of patients provides both objective criteria for treatment and a baseline for clinical follow-up. Reproducible classification systems are also important in clinical trials and when comparing medical, surgical, and endovascular treatment paradigms. This article reviews the various classification systems for PAD and the advantages of each system.

  19. Comparative evolutionary diversity and phylogenetic structure across multiple forest dynamics plots: a mega-phylogeny approach

    PubMed Central

    Erickson, David L.; Jones, Frank A.; Swenson, Nathan G.; Pei, Nancai; Bourg, Norman A.; Chen, Wenna; Davies, Stuart J.; Ge, Xue-jun; Hao, Zhanqing; Howe, Robert W.; Huang, Chun-Lin; Larson, Andrew J.; Lum, Shawn K. Y.; Lutz, James A.; Ma, Keping; Meegaskumbura, Madhava; Mi, Xiangcheng; Parker, John D.; Fang-Sun, I.; Wright, S. Joseph; Wolf, Amy T.; Ye, W.; Xing, Dingliang; Zimmerman, Jess K.; Kress, W. John

    2014-01-01

    Forest dynamics plots, which now span longitudes, latitudes, and habitat types across the globe, offer unparalleled insights into the ecological and evolutionary processes that determine how species are assembled into communities. Understanding phylogenetic relationships among species in a community has become an important component of assessing assembly processes. However, the application of evolutionary information to questions in community ecology has been limited in large part by the lack of accurate estimates of phylogenetic relationships among individual species found within communities, and is particularly limiting in comparisons between communities. Therefore, streamlining and maximizing the information content of these community phylogenies is a priority. To test the viability and advantage of a multi-community phylogeny, we constructed a multi-plot mega-phylogeny of 1347 species of trees across 15 forest dynamics plots in the ForestGEO network using DNA barcode sequence data (rbcL, matK, and psbA-trnH) and compared community phylogenies for each individual plot with respect to support for topology and branch lengths, which affect evolutionary inference of community processes. The levels of taxonomic differentiation across the phylogeny were examined by quantifying the frequency of resolved nodes throughout. In addition, three phylogenetic distance (PD) metrics that are commonly used to infer assembly processes were estimated for each plot [PD, Mean Phylogenetic Distance (MPD), and Mean Nearest Taxon Distance (MNTD)]. Lastly, we examine the partitioning of phylogenetic diversity among community plots through quantification of inter-community MPD and MNTD. Overall, evolutionary relationships were highly resolved across the DNA barcode-based mega-phylogeny, and phylogenetic resolution for each community plot was improved when estimated within the context of the mega-phylogeny. Likewise, when compared with phylogenies for individual plots, estimates of
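Two of the metrics named above, Mean Phylogenetic Distance (MPD) and Mean Nearest Taxon Distance (MNTD), can be computed directly from a matrix of pairwise phylogenetic distances among the species present in a plot. The sketch below uses the unweighted (presence/absence) forms and assumes the distance matrix, for example patristic distances taken from the mega-phylogeny, is already available; the example values are made up.

```python
from itertools import combinations
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # symmetric, zero diagonal


def mpd(dist: Matrix) -> float:
    """Mean phylogenetic distance over all species pairs in the community."""
    pairs = list(combinations(range(len(dist)), 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def mntd(dist: Matrix) -> float:
    """Mean distance from each species to its closest relative in the community."""
    n = len(dist)
    return sum(min(dist[i][j] for j in range(n) if j != i) for i in range(n)) / n


# Hypothetical plot with two close congeners and one distant lineage.
plot = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 6.0],
    [6.0, 6.0, 0.0],
]
plot_mpd = mpd(plot)    # (2 + 6 + 6) / 3
plot_mntd = mntd(plot)  # (2 + 2 + 6) / 3
```

MPD reflects the overall depth of relationships in a plot, whereas MNTD is driven by the tips, which is why both are reported: resolving shallow nodes in a mega-phylogeny mainly changes MNTD.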

  20. Phylogenetic relationships of Malaysia’s long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences

    PubMed Central

    Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir

    2014-01-01

    Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis, yielding 383 bp of cytochrome b (Cyt b) sequence, were used in the phylogenetic analysis, along with one sample each of M. nemestrina and M. arctoides as outgroups. Sequence character analysis reveals that the Cyt b locus is a highly conserved region, with only 23% of characters parsimony-informative among the ingroups. Further analysis indicates a clear separation between populations originating from different regions: the Malay Peninsula versus insular Borneo, the East Coast versus the West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) show a consistent clustering pattern, in which Borneo’s population was distinguished from the Peninsula’s population (99% and 100% bootstrap values in NJ and MP, respectively, and 1.00 posterior probability in the Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations
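The 23% figure above refers to parsimony-informative characters: alignment columns with at least two different states, each present in at least two sequences. Only such columns can favour one tree over another under parsimony. A minimal sketch of that count, assuming an aligned list of equal-length sequences and treating gaps and ambiguity codes as missing data (the toy alignment is made up):

```python
from collections import Counter
from typing import Sequence


def parsimony_informative_fraction(alignment: Sequence[str], ignore: str = "-?N") -> float:
    """Fraction of alignment columns that are parsimony-informative."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    informative = 0
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] not in ignore)
        # Informative: at least two states that each occur in at least two sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return informative / length


# Toy ingroup alignment: column 0 is constant, columns 1-3 each split the taxa 2 vs 2.
toy = ["AAGT", "AAGC", "ACTT", "ACTC"]
fraction = parsimony_informative_fraction(toy)  # 0.75
```

A column where one sequence differs from all the others (a singleton) is variable but not informative, which is why a conserved locus such as Cyt b can show many variable sites yet a low informative fraction.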

  1. Molecular Phylogeny of the Cliff Ferns (Woodsiaceae: Polypodiales) with a Proposed Infrageneric Classification

    PubMed Central

    Zhang, Xianchun; Xiang, Qiaoping

    2015-01-01

    The cliff fern family Woodsiaceae has experienced frequent taxonomic changes at the familial and generic ranks since its establishment. The bulk of its species were placed in Woodsia, while Cheilanthopsis, Hymenocystis, Physematium, and Protowoodsia are segregates recognized by some authors. Phylogenetic relationships among the genera of Woodsiaceae remain unclear because of the extreme morphological diversity and inadequate taxon sampling in phylogenetic studies to date. In this study, we carry out comprehensive phylogenetic analyses of Woodsiaceae using molecular evidence from four chloroplast DNA markers (atpA, matK, rbcL and trnL–F) and covering over half the currently recognized species. Our results show three main clades in Woodsiaceae corresponding to Physematium (clade I), Cheilanthopsis–Protowoodsia (clade II) and Woodsia s.s. (clade III). In the interest of preserving monophyly and taxonomic stability, a broadly defined Woodsia including the other segregates is proposed, which is characterized by the distinctive indument and inferior indusia. Therefore, we present a new subgeneric classification of the redefined Woodsia based on phylogenetic and ancestral state reconstructions to better reflect the morphological variation, geographic distribution pattern, and evolutionary history of the genus. Our analyses of the cytological character evolution support multiple aneuploidy events that have resulted in the reduction of chromosome base number from 41 to 33, 37, 38, 39 and 40 during the evolutionary history of the cliff ferns. PMID:26348852

  2. Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

    PubMed Central

    Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni

    2016-01-01

    We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170
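The genetic-algorithm half of the GARF idea can be sketched as a search over bit masks (1 = gene kept in the panel). In the published method the fitness is presumably tied to random-forest classification performance; here the fitness function is a pluggable assumption, and the toy one below simply rewards three hypothetical "informative" genes while penalizing panel size. Truncation selection, one-point crossover, bit-flip mutation and elitism are generic choices, not the authors' settings, and `ga_select` is a hypothetical name.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = gene kept in the panel, 0 = gene dropped


def ga_select(n_genes: int, fitness: Callable[[Mask], float],
              pop_size: int = 30, generations: int = 40,
              mutation_rate: float = 0.05, seed: int = 0) -> Tuple[Mask, List[float]]:
    """Search gene subsets with a simple GA; requires n_genes >= 2 and pop_size >= 4.

    Returns the best mask found and the best fitness in each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]     # truncation selection
        children = [ranked[0][:]]             # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each position toggles with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, history


# Hypothetical stand-in for a classifier-accuracy fitness.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best, history = ga_select(10, toy_fitness)
```

Elitism guarantees that the best fitness never decreases between generations, so the returned panel is at least as good as any panel seen earlier in the run.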

  3. Molecular classification based on apomorphic amino acids (Arthropoda, Hexapoda): Integrative taxonomy in the era of phylogenomics.

    PubMed

    Wu, Hao-Yang; Wang, Yan-Hui; Xie, Qiang; Ke, Yun-Ling; Bu, Wen-Jun

    2016-06-17

    With the great development of sequencing technologies and systematic methods, our understanding of evolutionary relationships at deeper levels within the tree of life has greatly improved over the last decade. However, the current taxonomic methodology is insufficient to describe the growing levels of diversity in both a standardised and general way due to the limitations of using only morphological traits to describe clades. Herein, we propose the idea of a molecular classification based on hierarchical and discrete amino acid characters. Clades are classified based on the results of phylogenetic analyses and described using amino acids with group specificity in phylograms. Practices based on the recently published phylogenomic datasets of insects together with 15 de novo sequenced transcriptomes in this study demonstrate that such a methodology can accommodate various higher ranks of taxonomy. Such an approach has the advantage of describing organisms in a standard and discrete way within a phylogenetic framework, thereby facilitating the recognition of clades from the view of the whole lineage, as indicated by PhyloCode. By combining identification keys and phylogenies, the molecular classification based on hierarchical and discrete characters may greatly boost the progress of integrative taxonomy.

  4. Molecular classification based on apomorphic amino acids (Arthropoda, Hexapoda): Integrative taxonomy in the era of phylogenomics

    PubMed Central

    Wu, Hao-Yang; Wang, Yan-Hui; Xie, Qiang; Ke, Yun-Ling; Bu, Wen-Jun

    2016-01-01

    With the great development of sequencing technologies and systematic methods, our understanding of evolutionary relationships at deeper levels within the tree of life has greatly improved over the last decade. However, the current taxonomic methodology is insufficient to describe the growing levels of diversity in both a standardised and general way due to the limitations of using only morphological traits to describe clades. Herein, we propose the idea of a molecular classification based on hierarchical and discrete amino acid characters. Clades are classified based on the results of phylogenetic analyses and described using amino acids with group specificity in phylograms. Practices based on the recently published phylogenomic datasets of insects together with 15 de novo sequenced transcriptomes in this study demonstrate that such a methodology can accommodate various higher ranks of taxonomy. Such an approach has the advantage of describing organisms in a standard and discrete way within a phylogenetic framework, thereby facilitating the recognition of clades from the view of the whole lineage, as indicated by PhyloCode. By combining identification keys and phylogenies, the molecular classification based on hierarchical and discrete characters may greatly boost the progress of integrative taxonomy. PMID:27312960

  5. Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests

    PubMed Central

    Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications, which have required the use of an a priori spatial unit (e.g. a catchment) that necessarily results in the loss of variability known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km² of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km², demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of

  6. Hydrologic landscape regionalisation using deductive classification and random forests.

    PubMed

    Brown, Stuart C; Lester, Rebecca E; Versace, Vincent L; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications, which have required the use of an a priori spatial unit (e.g. a catchment) that necessarily results in the loss of variability known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km² of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km², demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of

  7. How reliable and accurate is the AO/OTA comprehensive classification for adult long-bone fractures?

    PubMed

    Meling, Terje; Harboe, Knut; Enoksen, Cathrine H; Aarflot, Morten; Arthursson, Astvaldur J; Søreide, Kjetil

    2012-07-01

    Reliable classification of fractures is important for treatment allocation and study comparisons. The overall accuracy of scoring applied to a general population of fractures is little known. This study aimed to investigate the accuracy and reliability of the comprehensive Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association classification for adult long-bone fractures and identify factors associated with poor coding agreement. Adults (>16 years) with long-bone fractures coded in a Fracture and Dislocation Registry at the Stavanger University Hospital during the fiscal year 2008 were included. An unblinded reference code dataset was generated for the overall accuracy assessment by two experienced orthopedic trauma surgeons. Blinded analysis of intrarater reliability was performed by rescoring and of interrater reliability by recoding of a randomly selected fracture sample. Proportion of agreement (PA) and kappa (κ) statistics are presented. Uni- and multivariate logistic regression analyses of factors predicting accuracy were performed. During the study period, 949 fractures were included and coded by 26 surgeons. For the intrarater analysis, overall agreements were κ = 0.67 (95% confidence interval [CI]: 0.64-0.70) and PA 69%. For interrater assessment, κ = 0.67 (95% CI: 0.62-0.72) and PA 69%. The accuracy of surgeons' blinded recoding was κ = 0.68 (95% CI: 0.65-0.71) and PA 68%. Fracture type, frequency of the fracture, and segment fractured significantly influenced accuracy whereas the coder's experience did not. Both the reliability and accuracy of the comprehensive Arbeitsgemeinschaft für Osteosynthesefragen/Orthopedic Trauma Association classification for long-bone fractures ranged from substantial to excellent. Variations in coding accuracy seem to be related more to the fracture itself than the surgeon. Diagnostic study, level I.
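The two agreement statistics reported above relate as follows: the proportion of agreement (PA) is the raw fraction of fractures given identical codes, and Cohen's kappa corrects it for the agreement expected by chance from each rater's code frequencies, κ = (PA − p_e) / (1 − p_e). A minimal two-rater sketch with made-up codes (not real AO/OTA codes); it is undefined when p_e = 1, i.e. when both raters use one single code.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement(r1: Sequence[str], r2: Sequence[str]) -> Tuple[float, float]:
    """Return (proportion of agreement, Cohen's kappa) for two raters."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must code the same non-empty set of cases")
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal frequency, summed over codes.
    pe = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return po, (po - pe) / (1 - pe)


# Hypothetical codes assigned to four fractures by two surgeons.
surgeon_1 = ["A1", "A1", "B2", "B2"]
surgeon_2 = ["A1", "B2", "B2", "B2"]
pa, kappa = agreement(surgeon_1, surgeon_2)  # PA 0.75, kappa 0.5
```

Because chance agreement rises when a few codes dominate, kappa is usually lower than PA, which is why the study reports both.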

  8. Molecular evolution of ependymin and the phylogenetic resolution of early divergences among euteleost fishes.

    PubMed

    Ortí, G; Meyer, A

    1996-04-01

    The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time
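    The abstract contrasts base composition at third codon positions with that of introns. As a minimal sketch (not the authors' pipeline), the two quantities can be computed as follows, assuming an in-frame, gap-free coding sequence; the function names and toy sequences are hypothetical.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    seq = cds.upper()
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    third = [seq[3 * i + 2] for i in range(n_codons)]
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)


def gc_content(seq: str) -> float:
    """Fraction of G or C over all bases, e.g. for an intron."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical toy sequences, not ependymin data.
exon = "ATGGCCAAGCTGTTC"     # codons ATG GCC AAG CTG TTC: every third base is G or C
intron = "GTAAGTATTTTAATAG"  # AT-rich: 3 of 16 bases are G or C
gc3 = gc3_content(exon)      # 1.0
gc_intron = gc_content(intron)  # 0.1875
```

    A GC-rich third-position fraction alongside AT-rich introns is the contrast the authors use to argue that the codon bias may not be explained by mutational bias alone.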

  9. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDT algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  10. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDT algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
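    The AUC metric used above has a convenient interpretation: it is the probability that a randomly chosen positive (for example, a supernova of the target type) receives a higher classifier score than a randomly chosen negative. A minimal sketch computing it that way, with ties counted as one half; the function name is hypothetical and the quadratic pairwise loop is meant only for small examples.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney U) identity.

    labels: 1 for positives, 0 for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

    Read this way, an AUC of 0.98 means that about 98% of positive/negative pairs are ranked in the correct order by the classifier's score.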

  11. [Research progress in molecular classification of gastric cancer].

    PubMed

    Zhou, Menglong; Li, Guichao; Zhang, Zhen

    2016-09-25

    Gastric cancer (GC) is a highly heterogeneous malignancy. The histopathological classifications currently in wide use have gradually failed to meet the needs of individualized diagnosis and treatment. The development of technologies such as microarrays and next-generation sequencing (NGS) has allowed GC to be studied at the molecular level. The mechanisms of GC tumorigenesis and progression can be elucidated in terms of gene mutations, chromosomal alterations, and transcriptional and epigenetic changes, on the basis of which GC can be divided into several subtypes. The Tan, Lei, TCGA, and ACRG classifications are relatively comprehensive. The TCGA and ACRG classifications in particular have large sample sizes and abundant molecular profiling data, so the genomic characteristics of GC can be depicted more accurately. However, significant differences between the two classifications remain, so neither can substitute for the other. So far there is no widely accepted molecular classification of GC. Compared with the TCGA classification, the ACRG system may have more clinical significance for Chinese GC patients, since its samples are mostly from an Asian population and show a better association with prognosis. Molecular classifications of GC may provide a theoretical and experimental basis for early diagnosis, prediction of therapeutic efficacy, and treatment stratification, although their clinical application is still limited. Future work should apply molecular classifications in clinical settings to improve the medical management of GC.

  12. Iris Image Classification Based on Hierarchical Visual Codebook.

    PubMed

    Zhenan Sun; Hui Zhang; Tieniu Tan; Jianyu Wang

    2014-06-01

    Iris recognition, a reliable method for personal identification, has been well studied, with the objective of assigning each iris image the class label of a unique subject. In contrast, iris image classification aims to assign an iris image to an application-specific category, e.g., iris liveness detection (classification of genuine and fake iris images), race classification (e.g., classification of iris images of Asian and non-Asian subjects), and coarse-to-fine iris identification (classification of all iris images in the central database into multiple categories). This paper proposes a general framework for iris image classification based on texture analysis. A novel texture pattern representation method called Hierarchical Visual Codebook (HVC) is proposed to encode the texture primitives of iris images. The proposed HVC method integrates two existing Bag-of-Words models, namely Vocabulary Tree (VT) and Locality-constrained Linear Coding (LLC). The HVC adopts a coarse-to-fine visual coding strategy and takes advantage of both VT and LLC for accurate and sparse representation of iris texture. Extensive experimental results demonstrate that the proposed iris image classification method achieves state-of-the-art performance for iris liveness detection, race classification, and coarse-to-fine iris identification. A comprehensive fake iris image database simulating four types of iris spoof attacks is developed as the benchmark for research on iris liveness detection.
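    The coarse-to-fine coding idea borrowed from the Vocabulary Tree can be illustrated with a two-level quantizer that first picks the nearest coarse codeword and then the nearest fine codeword inside that cell, followed by a bag-of-words histogram. This is a simplified sketch assuming pre-trained codebooks (e.g. from k-means); it omits the LLC soft-coding step, and all names are hypothetical rather than the paper's HVC implementation.

```python
import numpy as np


def quantize_two_level(desc, coarse, fine):
    """Assign each descriptor to a flat (coarse, fine) visual-word index.

    desc:   (n, d) local texture descriptors
    coarse: (k1, d) coarse codebook centres
    fine:   (k1, k2, d) fine centres nested under each coarse centre
    Returns an (n,) integer array with values in [0, k1 * k2).
    """
    # Nearest coarse centre for every descriptor (squared Euclidean distance).
    d_coarse = ((desc[:, None, :] - coarse[None, :, :]) ** 2).sum(-1)
    c = d_coarse.argmin(axis=1)
    # Within the chosen coarse cell, find the nearest fine centre.
    sub = fine[c]                                     # (n, k2, d)
    d_fine = ((desc[:, None, :] - sub) ** 2).sum(-1)  # (n, k2)
    f = d_fine.argmin(axis=1)
    return c * fine.shape[1] + f


def bow_histogram(words, n_words):
    """L1-normalised bag-of-words histogram for one image."""
    h = np.bincount(words, minlength=n_words).astype(float)
    return h / max(h.sum(), 1.0)
```

    The tree structure means each descriptor is compared with k1 + k2 centres instead of k1 * k2, which is the usual motivation for hierarchical codebooks.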

  13. California desert resource inventory using multispectral classification of digitally mosaicked Landsat frames

    NASA Technical Reports Server (NTRS)

    Bryant, N. A.; Mcleod, R. G.; Zobrist, A. L.; Johnson, H. B.

    1979-01-01

    Procedures for adjusting brightness values between frames and for digitally mosaicking Landsat frames to standard map projections are developed to provide a continuous database for multispectral thematic classification. A combination of local terrain variations in the Californian deserts and a global sampling strategy based on transects provided the framework for accurate classification throughout the entire geographic region.
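    One common way to adjust brightness between adjacent frames, assumed here for illustration because the abstract does not give the procedure, is to fit a linear gain and offset on the pixels where two frames overlap and apply it to the whole target frame, band by band. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def fit_gain_offset(ref_overlap, tgt_overlap):
    """Least-squares gain a and offset b such that a * tgt + b approximates ref.

    Both inputs hold the same overlap pixels (one spectral band) from the
    reference frame and from the frame being adjusted.
    """
    x = np.asarray(tgt_overlap, dtype=float).ravel()
    y = np.asarray(ref_overlap, dtype=float).ravel()
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b


def adjust_frame(frame, a, b):
    """Apply the fitted linear brightness correction to a whole frame."""
    return a * np.asarray(frame, dtype=float) + b
```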

  14. Centrifuge: rapid and sensitive classification of metagenomic sequences

    PubMed Central

    Song, Li; Breitwieser, Florian P.

    2016-01-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649
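    The FM index named above supports counting pattern occurrences by backward search over the Burrows-Wheeler transform. The toy sketch below builds the BWT from sorted rotations and counts matches; it assumes "$" is a unique sentinel that sorts before every other symbol and uses naive occurrence counting, whereas Centrifuge itself uses a compressed, sampled index with additional machinery for taxonomic assignment. Function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + "$"  # sentinel, assumed absent from text and smallest in sort order
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern using FM-index backward search."""
    # C[ch]: number of characters in the text that sort before ch.
    C, total = {}, 0
    for ch in sorted(set(bwt_str)):
        C[ch] = total
        total += bwt_str.count(ch)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real FM indexes use sampled tables.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # half-open interval of matching sorted suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

    For example, the BWT of "banana" is "annb$aa", and backward search on it finds 2 occurrences of "ana".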

  15. Scanning electron microscope automatic defect classification of process induced defects

    NASA Astrophysics Data System (ADS)

    Wolfe, Scott; McGarvey, Steve

    2017-03-01

    With the integration of high-speed Scanning Electron Microscope (SEM) based Automated Defect Redetection (ADR) in both high-volume semiconductor manufacturing and Research and Development (R and D), the need for reliable SEM Automated Defect Classification (ADC) has grown tremendously in the past few years. In many high-volume manufacturing facilities and R and D operations, defect inspection is performed on EBeam (EB), Bright Field (BF) or Dark Field (DF) defect inspection equipment. Both the patterned and the non-patterned defect inspection tools create a comma-separated value (CSV) file. This defect inspection result file lists the anomalies detected during the inspection tools' examination of each structure, or of an entire wafer's surface for non-patterned applications. The file is imported into the Defect Review Scanning Electron Microscope (DRSEM). After the import, the DRSEM automatically moves the wafer to each defect coordinate and performs ADR. During ADR the DRSEM operates in a reference mode, capturing an SEM image at the exact position of each anomaly's coordinates and capturing an SEM image of a reference location in the center of the wafer. A defect reference image is created by subtracting the defect image from the reference image. The exact coordinates of the defect are calculated from the calculated defect position and the anomaly's stage coordinate, determined when the high-magnification SEM defect image is captured. The captured SEM image is processed through DRSEM ADC binning, exported to a Yield Analysis System (YAS), or both. Process Engineers, Yield Analysis Engineers or Failure Analysis Engineers then manually review the captured images to ensure that the YAS defect binning or the DRSEM defect binning is classifying the defects accurately. This paper is an exploration of the feasibility of the
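    The reference-mode step can be sketched as an image difference followed by a peak search, plus a conversion from pixel to stage coordinates. This minimal NumPy sketch assumes the two images are already registered and that the stage position corresponds to the image centre; the threshold, pixel size, axis conventions and function names are hypothetical and would be tool-specific in practice.

```python
import numpy as np


def locate_defect(defect_img, reference_img, threshold):
    """Return (row, col) of the strongest difference pixel, or None.

    Both images are registered 2-D grayscale arrays of the same shape.
    """
    # Reference minus defect, taken as an absolute difference.
    diff = np.abs(reference_img.astype(float) - defect_img.astype(float))
    if diff.max() < threshold:
        return None  # nothing stands out from the reference
    row, col = np.unravel_index(np.argmax(diff), diff.shape)
    return int(row), int(col)


def pixel_to_stage(row, col, shape, stage_xy, pixel_size):
    """Convert an image pixel to stage coordinates.

    Assumes the stage position is the image centre, x grows with column and
    y grows with row; real tools may use different conventions.
    """
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return (stage_xy[0] + (col - cx) * pixel_size,
            stage_xy[1] + (row - cy) * pixel_size)
```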

  16. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods for making sense of biological data has been not only necessary but also productive. Part of this trend is the search for more efficient in silico methods and tools for the analysis of promoters, the regions of DNA involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in function depending on their nucleotide sequence and on the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be largely driven by the selective presence and/or arrangement of these motifs. Here, we explore the computational classification of promoter sequences based on the pattern of motif distributions, as such classification can open new avenues for the functional analysis of promoters and the discovery of functionally crucial motifs. We use Position Specific Motif Matrix (PSMM) features to explore whether promoter sequences can be classified accurately with some popular classification techniques. Classification results on the complete feature set are poor, perhaps because of the huge number of features. We propose two ways of reducing the features. Our test results show improved classification after feature reduction. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some popular feature transformation methods such as PCA and SVD. The proposed methods are also as accurate as MRMR (a feature selection method) but much faster. Such methods could be useful for categorizing new promoters and exploring the regulatory mechanisms of gene expression in complex eukaryotic species.
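    The abstract does not define the PSMM features, so the sketch below shows only a related, simpler idea: counting exact motif hits in positional bins along the promoter, giving one feature per (motif, bin) pair. The motifs, bin count and function name are hypothetical; real motif scanning typically uses position weight matrices, and the resulting vectors would then go through feature reduction before classification.

```python
def motif_position_features(seq, motifs, n_bins):
    """Count exact motif hits per positional bin along a promoter sequence.

    seq:    promoter sequence over A, C, G, T
    motifs: list of exact motif strings (hypothetical examples)
    n_bins: number of equal-width bins along the sequence
    Returns a flat list of len(motifs) * n_bins counts.
    """
    seq = seq.upper()
    L = len(seq)
    features = []
    for motif in motifs:
        m = motif.upper()
        counts = [0] * n_bins
        for start in range(L - len(m) + 1):
            if seq[start:start + len(m)] == m:
                # Map the hit's start position to a bin index.
                b = min(start * n_bins // L, n_bins - 1)
                counts[b] += 1
        features.extend(counts)
    return features
```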

  17. When proglottids and scoleces conflict: phylogenetic relationships and a family-level classification of the Lecanicephalidea (Platyhelminthes: Cestoda).

    PubMed

    Jensen, Kirsten; Caira, Janine N; Cielocha, Joanna J; Littlewood, D Timothy J; Waeschenbach, Andrea

    2016-05-01

    This study presents the first comprehensive phylogenetic analysis of the interrelationships of the morphologically diverse elasmobranch-hosted tapeworm order Lecanicephalidea, based on molecular sequence data. With almost half of current generic diversity having been erected or resurrected within the last decade, an apparent conflict between scolex morphology and proglottid anatomy has hampered the assignment of many of these genera to families. Maximum likelihood and Bayesian analyses of two nuclear markers (D1-D3 of lsrDNA and complete ssrDNA) and two mitochondrial markers (partial rrnL and partial cox1) for 61 lecanicephalidean species representing 22 of the 25 valid genera were conducted; new sequence data were generated for 43 species and 11 genera, including three undescribed genera. The monophyly of the order was confirmed in all but the analyses based on cox1 data alone. Sesquipedalapex placed among species of Anteropora and was thus synonymized with the latter genus. Based on analyses of the concatenated dataset, eight major groups emerged which are herein formally recognised at the familial level. Existing family names (i.e., Lecanicephalidae, Polypocephalidae, Tetragonocephalidae, and Cephalobothriidae) are maintained for four of the eight clades, and new families are proposed for the remaining four groups (Aberrapecidae n. fam., Eniochobothriidae n. fam., Paraberrapecidae n. fam., and Zanobatocestidae n. fam.). The four new families and the Tetragonocephalidae are monogeneric, while the Cephalobothriidae, Lecanicephalidae and Polypocephalidae comprise seven, eight and four genera, respectively. As a result of their unusual morphologies, the three genera not included here (i.e., Corrugatocephalum, Healyum and Quadcuspibothrium) are considered incertae sedis within the order until their familial affinities can be examined in more detail. All eight families are newly circumscribed based on morphological features and a key to the families is provided

  18. Evolution and Classification of Myosins, a Paneukaryotic Whole-Genome Approach

    PubMed Central

    Sebé-Pedrós, Arnau; Grau-Bové, Xavier; Richards, Thomas A.; Ruiz-Trillo, Iñaki

    2014-01-01

    Myosins are key components of the eukaryotic cytoskeleton, providing motility for a broad diversity of cargoes. Therefore, understanding the origin and evolutionary history of myosin classes is crucial to address the evolution of eukaryote cell biology. Here, we revise the classification of myosins using an updated taxon sampling that includes newly or recently sequenced genomes and transcriptomes from key taxa. We performed a survey of eukaryotic genomes and phylogenetic analyses of the myosin gene family, reconstructing the myosin toolkit at different key nodes in the eukaryotic tree of life. We also identified the phylogenetic distribution of myosin diversity in terms of number of genes, associated protein domains and number of classes in each taxon. Our analyses show that new classes (i.e., paralogs) and domain architectures were continuously generated throughout eukaryote evolution, with a significant expansion of myosin abundance and domain architectural diversity at the stem of Holozoa, predating the origin of animal multicellularity. Indeed, single-celled holozoans have the most complex myosin complement among eukaryotes, with paralogs of most myosins previously considered animal specific. We recover a dynamic evolutionary history, with several lineage-specific expansions (e.g., the myosin III-like gene family diversification in choanoflagellates), convergence in protein domain architectures (e.g., fungal and animal chitin synthase myosins), and important secondary losses. Overall, our evolutionary scheme demonstrates that the ancestral eukaryote likely had a complex myosin repertoire that included six genes with different protein domain architectures. Finally, we provide an integrative and robust classification, useful for future genomic and functional studies on this crucial eukaryotic gene family. PMID:24443438

  19. Flying insect detection and classification with inexpensive sensors.

    PubMed

    Chen, Yanping; Why, Adena; Batista, Gustavo; Mafra-Neto, Agenor; Keogh, Eamonn

    2014-10-15

    An inexpensive, noninvasive system that could accurately classify flying insects would have important implications for entomological research, and allow for the development of many useful applications in vector and pest control for both medical and agricultural entomology. Given this, the last sixty years have seen many research efforts devoted to this task. To date, however, none of this research has had a lasting impact. In this work, we show that pseudo-acoustic optical sensors can produce superior data; that additional features, both intrinsic and extrinsic to the insect's flight behavior, can be exploited to improve insect classification; that a Bayesian classification approach allows efficient learning of classification models that are very robust to over-fitting; and that a general classification framework makes it easy to incorporate an arbitrary number of features. We demonstrate the findings with large-scale experiments that dwarf all previous works combined, as measured by the number of insects and the number of species considered.

  20. Flying Insect Detection and Classification with Inexpensive Sensors

    PubMed Central

    Chen, Yanping; Why, Adena; Batista, Gustavo; Mafra-Neto, Agenor; Keogh, Eamonn

    2014-01-01

    An inexpensive, noninvasive system that could accurately classify flying insects would have important implications for entomological research, and allow for the development of many useful applications in vector and pest control for both medical and agricultural entomology. Given this, the last sixty years have seen many research efforts devoted to this task. To date, however, none of this research has had a lasting impact. In this work, we show that pseudo-acoustic optical sensors can produce superior data; that additional features, both intrinsic and extrinsic to the insect’s flight behavior, can be exploited to improve insect classification; that a Bayesian classification approach allows efficient learning of classification models that are very robust to over-fitting; and that a general classification framework makes it easy to incorporate an arbitrary number of features. We demonstrate the findings with large-scale experiments that dwarf all previous works combined, as measured by the number of insects and the number of species considered. PMID:25350921
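    To illustrate how a Bayesian classifier can absorb an arbitrary number of features, here is a minimal Gaussian naive Bayes sketch in which each feature (for example a wingbeat frequency or a time-of-day value, both hypothetical here) contributes one independent likelihood term. This is a textbook simplification, not the authors' model; a cyclic feature such as time of day would be better served by a circular or histogram-based likelihood.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """One Gaussian per (class, feature); features treated as independent."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(y)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small variance floor avoids division by zero for constant features.
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.params[label]
            # Adding a feature just adds one more term to this sum.
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(row, means, variances))
        return max(self.params, key=log_posterior)
```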

  1. Raster Vs. Point Cloud LiDAR Data Classification

    NASA Astrophysics Data System (ADS)

    El-Ashmawy, N.; Shaker, A.

    2014-09-01

    Airborne Laser Scanning systems with light detection and ranging (LiDAR) technology are among the fast and accurate techniques for acquiring 3D point data. Generating accurate digital terrain and/or surface models (DTM/DSM) is the main application of collected LiDAR range data. Recently, LiDAR range and intensity data have been used for land cover classification applications. Range and intensity data (intensity being the strength of the backscattered signal measured by the LiDAR system) are affected by the flying height, the ground elevation, the scanning angle and the physical characteristics of the object's surface. These effects may lead to an uneven distribution of the point cloud or to gaps that may affect the classification process. Researchers have investigated the conversion of LiDAR range point data to raster images for terrain modelling. Interpolation techniques have been used to achieve the best representation of surfaces and to fill the gaps between the LiDAR footprints. Interpolation methods have also been investigated for generating LiDAR range and intensity image data for land cover classification applications. In this paper, a different approach is followed to classify the LiDAR data (range and intensity) for land cover mapping. The methodology classifies the point cloud data based on their range and intensity and then converts the classified points into a raster image. The gaps in the data are filled based on the classes of the nearest neighbour. Land cover maps are produced using two approaches: (a) conventional raster image data based on point interpolation; and (b) the proposed point data classification. A study area covering an urban district in Burnaby, British Columbia, Canada, is selected to compare the results of the two approaches. Five different land cover classes can be distinguished in that area: buildings, roads and parking areas, trees, low vegetation (grass), and bare soil. The results show that an improvement of around 10 % in the
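    The classify-then-rasterize idea can be sketched as follows: each occupied cell takes the majority class of the points inside it, and each empty cell takes the class of the nearest point. This NumPy sketch assumes non-negative integer class labels, coordinates shifted so the grid starts at (0, 0), row indices that grow with y (no north-up flip), and a brute-force nearest-neighbour search; a k-d tree would be used for real point clouds. The function name is hypothetical.

```python
import numpy as np


def rasterize_classes(xy, labels, cell, shape):
    """Grid classified points, filling empty cells from the nearest point.

    xy:     (n, 2) point coordinates relative to the grid origin
    labels: (n,) non-negative integer class per point
    cell:   cell size in the same units as xy
    shape:  (rows, cols) of the output raster
    """
    xy = np.asarray(xy, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rows, cols = shape
    out = np.full(shape, -1, dtype=int)  # -1 marks a gap
    r = np.clip((xy[:, 1] // cell).astype(int), 0, rows - 1)
    c = np.clip((xy[:, 0] // cell).astype(int), 0, cols - 1)
    # Occupied cells: majority class of the points that fall inside.
    for i in range(rows):
        for j in range(cols):
            inside = labels[(r == i) & (c == j)]
            if inside.size:
                out[i, j] = np.bincount(inside).argmax()
    # Gap cells: class of the point nearest to the cell centre.
    gi, gj = np.nonzero(out == -1)
    if gi.size:
        centres = np.column_stack([(gj + 0.5) * cell, (gi + 0.5) * cell])
        d2 = ((centres[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
        out[gi, gj] = labels[d2.argmin(axis=1)]
    return out
```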

  2. Annotation and Classification of CRISPR-Cas Systems

    PubMed Central

    Makarova, Kira S.; Koonin, Eugene V.

    2018-01-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. Classification of CRISPR-Cas variants that would capture their evolutionary relationships to the maximum possible extent is essential for comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed over the 3 years elapsed since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. In addition, most of the cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods. PMID:25981466

  3. Annotation and Classification of CRISPR-Cas Systems.

    PubMed

    Makarova, Kira S; Koonin, Eugene V

    2015-01-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. Classification of CRISPR-Cas variants that would capture their evolutionary relationships to the maximum possible extent is essential for comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed over the 3 years elapsed since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. In addition, most of the cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods.
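    One ingredient of the approach described above, comparing gene repertoires between loci, can be illustrated with a Jaccard similarity and a nearest-reference assignment. The reference repertoires in this sketch are simplified and hypothetical (built around well-known signature genes such as cas3, cas9 and cas10), the function names are illustrative, and the sketch ignores the phylogenetic and gene-order evidence that the actual classification also relies on.

```python
# Hypothetical, heavily simplified reference repertoires (illustration only).
REFERENCE_REPERTOIRES = {
    "type I": {"cas1", "cas2", "cas3", "cas5", "cas7"},
    "type II": {"cas1", "cas2", "cas9"},
    "type III": {"cas1", "cas2", "cas7", "cas10"},
}


def jaccard(a, b):
    """Jaccard similarity of two gene repertoires (sets of gene family names)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_reference(locus, references):
    """Name of the reference repertoire most similar to the locus."""
    return max(references, key=lambda name: jaccard(locus, references[name]))
```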

  4. Evaluation of the Unified Compensation and Classification Plan.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL. Office of Educational Accountability.

    The Unified Classification and Compensation Plan of the Dade County (Florida) Public Schools consists of four interdependent activities that include: (1) developing and maintaining accurate job descriptions, (2) conducting evaluations that recommend job worth and grade, (3) developing and maintaining rates of compensation for job values, and (4)…

  5. Bayesian models for comparative analysis integrating phylogenetic uncertainty.

    PubMed

    de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P

    2012-06-28

    Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses
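    The paper integrates over trees inside an MCMC model fitted in OpenBUGS and JAGS. A much cruder approximation that conveys the idea is to refit a phylogenetic generalized least squares (GLS) regression on every tree of a posterior sample and inspect the spread of the estimates. In this NumPy sketch the per-tree covariance matrices (for instance, Brownian-motion matrices of shared branch lengths) are assumed to be supplied by the caller, and the function names are hypothetical.

```python
import numpy as np


def gls_coefficients(x, y, V):
    """GLS intercept and slope for y ~ x under trait covariance matrix V."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    Vi = np.linalg.inv(V)
    # beta = (X' V^-1 X)^-1 X' V^-1 y
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)


def slopes_over_trees(x, y, covariances):
    """Refit the regression on each tree's covariance matrix; return the slopes."""
    return np.array([gls_coefficients(x, y, V)[1] for V in covariances])
```

    The spread of the returned slopes gives a rough sense of how much the estimate depends on tree uncertainty, which a single consensus tree would hide.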

  6. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    PubMed Central

    2012-01-01

    Background: Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods: We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results: We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions: Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for

  7. Phylogenetic relationship among East Asian species of the Stegana genus group (Diptera, Drosophilidae).

    PubMed

    Li, Tong; Gao, Jian-jun; Lu, Jin-ming; Ji, Xing-lai; Chen, Hong-wei

    2013-01-01

    The phylogenetic relationship among 27 East Asian species of the Stegana genus group was reconstructed using DNA sequences of mitochondrial (COI and ND2) and nuclear (28S) genes. The results lent support to the current generic/subgeneric taxonomic classification in the genus group with the exceptions of the paraphyly of the genus Parastegana and the subgenus Oxyphortica in the genus Stegana. The ancestral areas and divergence times in the genus group were reconstructed/estimated, and accordingly, the biogeographical history of this important clade was discussed. It was proposed that the evolution of the plant family Fagaceae, especially Quercus, may have played a role in facilitating the diversification of the Stegana genus group. Copyright © 2012 Elsevier Inc. All rights reserved.

  8. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  9. Classification of Aerial Photogrammetric 3d Point Clouds

    NASA Astrophysics Data System (ADS)

    Becker, C.; Häni, N.; Rosinskaya, E.; d'Angelo, E.; Strecha, C.

    2017-05-01

    We present a powerful method to extract per-point semantic class labels from aerial photogrammetry data. Labelling this kind of data is important for tasks such as environmental modelling, object classification and scene understanding. Unlike previous point cloud classification methods that rely exclusively on geometric features, we show that incorporating color information yields a significant increase in accuracy in detecting semantic classes. We test our classification method on three real-world photogrammetry datasets that were generated with Pix4Dmapper Pro, and with varying point densities. We show that off-the-shelf machine learning techniques coupled with our new features allow us to train highly accurate classifiers that generalize well to unseen data, processing point clouds containing 10 million points in less than 3 minutes on a desktop computer.

  10. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

    DOE PAGES

    Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.; ...

    2018-01-01

    The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions over some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. Two-dimensional maps (designated here as fingerprints) of the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms, without any protein sequence comparison or alignment. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.

  11. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.

    The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions over some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. Two-dimensional maps (designated here as fingerprints) of the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms, without any protein sequence comparison or alignment. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.

  12. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

    PubMed Central

    Ma, Yue; Tuskan, Gerald A.

    2018-01-01

    The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions over some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. Two-dimensional maps (designated here as fingerprints) of the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms, without any protein sequence comparison or alignment. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level. PMID:29686995
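
    Here is a minimal Python sketch of the fingerprint idea. It assumes each protein is summarised as a (length L, disorder fraction ID) pair. The sketch bins proteins into a normalised 2-D histogram over (ln L, ID) and compares two fingerprints with an L1 distance. The bin counts, the ln L range, the L1 metric and the toy protein values are illustrative assumptions. They are not the authors' published parameters.

```python
import math
from typing import List, Tuple

Protein = Tuple[int, float]  # (length L in residues, intrinsic-disorder fraction ID in [0, 1])


def fingerprint(proteins: List[Protein], n_len_bins: int = 10,
                n_id_bins: int = 10, max_ln_len: float = 10.0) -> List[List[float]]:
    """Normalised 2-D histogram of proteins over (ln L, ID).

    Assumed ranges: ln L in [0, max_ln_len) and ID in [0, 1]; values past
    the upper edge are clamped into the last bin.
    """
    grid = [[0.0] * n_id_bins for _ in range(n_len_bins)]
    for length, disorder in proteins:
        i = min(int(math.log(length) / max_ln_len * n_len_bins), n_len_bins - 1)
        j = min(int(disorder * n_id_bins), n_id_bins - 1)
        grid[i][j] += 1.0
    total = sum(sum(row) for row in grid)
    return [[count / total for count in row] for row in grid]


def l1_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Hypothetical fingerprint distance: sum of absolute density differences."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Toy proteomes (made-up values, not data from the paper).
proteome_a = [(120, 0.05), (350, 0.10), (800, 0.40)]
proteome_b = [(130, 0.06), (340, 0.12), (2000, 0.70)]
fp_a, fp_b = fingerprint(proteome_a), fingerprint(proteome_b)
print(round(l1_distance(fp_a, fp_b), 3))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method. Which method the authors used is not stated here.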

  13. Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics.

    PubMed

    Herbei, Radu; Kubatko, Laura

    2013-03-26

    Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
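
    The abstract does not give the estimator itself. The following Python sketch therefore only illustrates the general idea on a toy 3-state chain where the exact answer is available. It uses the identity TV(p_t, pi) = E_{X~pi}[max(0, 1 - p_t(X)/pi(X))], which holds whenever pi(x) > 0 for every state. It also assumes that p_t(x) can be evaluated pointwise, which is the hard part on real tree spaces. This is a generic sampling estimator, not necessarily Herbei and Kubatko's method, and it does not use a GPU.

```python
import random
from typing import List

Matrix = List[List[float]]


def step(dist: List[float], P: Matrix) -> List[float]:
    """Propagate a distribution one step: returns dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(p0: List[float], P: Matrix, t: int) -> List[float]:
    """Exact distribution after t steps (feasible only for tiny state spaces)."""
    dist = list(p0)
    for _ in range(t):
        dist = step(dist, P)
    return dist


def exact_tv(p: List[float], q: List[float]) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mc_tv(p_t: List[float], pi: List[float], n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of TV(p_t, pi) from states sampled from pi."""
    draws = rng.choices(range(len(pi)), weights=pi, k=n_samples)
    return sum(max(0.0, 1.0 - p_t[x] / pi[x]) for x in draws) / n_samples


# Toy chain on {0, 1, 2}; its stationary distribution is (1/4, 1/2, 1/4).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
p3 = distribution_at([1.0, 0.0, 0.0], P, t=3)
print(exact_tv(p3, pi), mc_tv(p3, pi, 200_000, random.Random(0)))
```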

  14. Comparative genomic analysis and phylogenetic position of Theileria equi

    PubMed Central

    2012-01-01

    role of the EMA family in persistence. T. equi has lost the putative genes for host cell transformation, or the genes were acquired by T. parva and T. annulata after divergence from T. equi. Our analysis identified 50 genes that will be useful for definitive phylogenetic classification of T. equi and closely related organisms. PMID:23137308

  15. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived from Bayesian decision theory. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4, and Breiman et al.'s CART show that the full Bayesian algorithm is consistently as good as, or more accurate than, these other approaches, though at a computational price.
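
    For reference, here is a Python sketch of Quinlan's information gain splitting rule, which the abstract says the Bayesian splitting rule resembles. This is the classical entropy-based rule, not Buntine's Bayesian rule. The four-example data set is made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the class-label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels: Sequence[Hashable], values: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting `labels` on the attribute `values`."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for label, value in zip(labels, values):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(labels: Sequence[Hashable], attributes: Dict[str, Sequence[Hashable]]) -> str:
    """Name of the attribute with the highest information gain."""
    return max(attributes, key=lambda name: information_gain(labels, attributes[name]))


labels = ["yes", "yes", "no", "no"]
attributes = {"outlook": ["sun", "sun", "rain", "rain"],
              "windy": [True, False, True, False]}
print(best_split(labels, attributes))  # outlook
```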

  16. Automated Decision Tree Classification of Corneal Shape

    PubMed Central

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose The volume and complexity of data produced during videokeratography examinations present an interpretive challenge. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operating characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed as well as or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes to four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification

  17. Ensemble Sparse Classification of Alzheimer’s Disease

    PubMed Central

    Liu, Manhua; Zhang, Daoqiang; Shen, Dinggang

    2012-01-01

    High-dimensional pattern classification methods, e.g., support vector machines (SVM), have been widely investigated for the analysis of structural and functional brain images (such as magnetic resonance imaging (MRI)) to assist the diagnosis of Alzheimer’s disease (AD), including its prodromal stage, mild cognitive impairment (MCI). Most existing classification methods extract features from neuroimaging data and then construct a single classifier to perform classification. However, because neuroimaging data are noisy and sample sizes are small, it is challenging to train a single global classifier that is robust enough to achieve good classification performance. In this paper, instead of building a single global classifier, we propose a local patch-based subspace ensemble method. It builds multiple individual classifiers based on different subsets of local patches and then combines them for more accurate and robust classification. Specifically, to capture the local spatial consistency, each brain image is partitioned into a number of local patches, and a subset of patches is randomly selected from the patch pool to build a weak classifier. Here, the sparse representation-based classification (SRC) method, which has been shown to be effective for classification of image data (e.g., faces), is used to construct each weak classifier. Then, multiple weak classifiers are combined to make the final decision. We evaluate our method on 652 subjects (including 198 AD patients, 225 MCI and 229 normal controls) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using MR images. For AD classification, our method achieves an accuracy of 90.8% and an area under the ROC curve (AUC) of 94.86%. For MCI classification, it achieves an accuracy of 87.85% and an AUC of 92.90%. These results demonstrate very promising performance compared with state-of-the-art methods for AD/MCI classification using MR images. PMID:22270352

  18. Object-based forest classification to facilitate landscape-scale conservation in the Mississippi Alluvial Valley

    USGS Publications Warehouse

    Mitchell, Michael; Wilson, R. Randy; Twedt, Daniel J.; Mini, Anne E.; James, J. Dale

    2016-01-01

    The Mississippi Alluvial Valley is a floodplain along the southern extent of the Mississippi River extending from southern Missouri to the Gulf of Mexico. This area once encompassed nearly 10 million ha of floodplain forests, most of which have been converted to agriculture over the past two centuries. Conservation programs in this region revolve around protection of existing forest and reforestation of converted lands. Therefore, an accurate and up-to-date classification of forest cover is essential for conservation planning, including efforts that prioritize areas for conservation activities. We used object-based image analysis with Random Forest classification to quickly and accurately classify forest cover. We used Landsat band, band ratio, and band index statistics to identify and define similar objects as our training sets instead of selecting individual training points. This provided a single rule-set that was used to classify each of the 11 Landsat 5 Thematic Mapper scenes that encompassed the Mississippi Alluvial Valley. We classified 3,307,910±85,344 ha (32% of this region) as forest. Our overall classification accuracy was 96.9% with a Kappa statistic of 0.96. Because this method of forest classification is rapid and accurate, assessment of forest cover can be regularly updated and progress toward forest habitat goals identified in conservation plans can be periodically evaluated.
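
    The Kappa statistic corrects overall accuracy for agreement expected by chance. Below is a minimal Python sketch of Cohen's kappa computed from a confusion matrix. The two-class (forest/non-forest) counts are hypothetical and are not the study's accuracy-assessment data.

```python
from typing import List


def overall_accuracy(confusion: List[List[int]]) -> float:
    n = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / n


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa; rows are reference classes, columns are mapped classes."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, given the marginal class frequencies.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical forest / non-forest validation counts.
confusion = [[45, 5],
             [5, 45]]
print(overall_accuracy(confusion), cohens_kappa(confusion))
```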

  19. Effective classification of the prevalence of Schistosoma mansoni.

    PubMed

    Mitchell, Shira A; Pagano, Marcello

    2012-12-01

    To present an effective classification method based on the prevalence of Schistosoma mansoni in the community. We created decision rules, defined by cut-offs for the number of positive slides, that account for imperfect sensitivity. We did this in two ways: a simple adjustment that assumes fixed sensitivity, and a more complex adjustment that allows sensitivity to change with prevalence. To reduce screening costs while maintaining accuracy, we propose a pooled classification method. To estimate sensitivity, we use the De Vlas model for worm and egg distributions. We compare the proposed method with the standard method to investigate differences in efficiency, measured by the number of slides read, and accuracy, measured by the probability of correct classification. Modelling varying sensitivity lowers the lower cut-off more than the upper cut-off. As a result, regions are correctly classified as moderate rather than low and thus receive life-saving treatment. The pooled method classifies directly on the basis of positive pools, so sensitivity does not need to be known in order to estimate prevalence. For model parameter values describing worm and egg distributions among children, the pooled method with 25 slides achieves an expected 89.9% probability of correct classification, whereas the standard method with 50 slides achieves 88.7%. Among children, it is more efficient and more accurate to use the pooled method for classification of S. mansoni prevalence than the current standard method. © 2012 Blackwell Publishing Ltd.
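
    The cut-off decision rule with fixed sensitivity (the simple adjustment) can be sketched in Python as follows. The sketch rests on several assumptions:
    - one slide per child;
    - no false-positive slides;
    - the number of positive slides follows Binomial(n, sensitivity × prevalence);
    - prevalence categories are low (<10%), moderate (10–50%) and high (≥50%).
    The slide cut-offs are hypothetical. Neither the De Vlas model nor pooling is implemented.

```python
from math import comb


def binom_pmf(k: int, n: int, q: float) -> float:
    return comb(n, k) * q ** k * (1.0 - q) ** (n - k)


def true_category(prevalence: float, low: float = 0.10, high: float = 0.50) -> str:
    """Assumed prevalence categories (thresholds are illustrative)."""
    if prevalence < low:
        return "low"
    return "moderate" if prevalence < high else "high"


def classify(positives: int, lower_cut: int, upper_cut: int) -> str:
    """Decision rule: hypothetical cut-offs on the number of positive slides."""
    if positives < lower_cut:
        return "low"
    return "moderate" if positives < upper_cut else "high"


def prob_correct(prevalence: float, n_slides: int, sensitivity: float,
                 lower_cut: int, upper_cut: int) -> float:
    """Probability that the rule assigns the community to its true category."""
    q = sensitivity * prevalence  # chance that a given slide reads positive
    target = true_category(prevalence)
    return sum(binom_pmf(k, n_slides, q) for k in range(n_slides + 1)
               if classify(k, lower_cut, upper_cut) == target)


# With imperfect sensitivity, lowering the lower cut-off helps a moderate-prevalence community.
print(prob_correct(0.2, 50, 0.6, lower_cut=5, upper_cut=25),
      prob_correct(0.2, 50, 0.6, lower_cut=3, upper_cut=25))
```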

  20. IRIS COLOUR CLASSIFICATION SCALES – THEN AND NOW

    PubMed Central

    Grigore, Mariana; Avram, Alina

    2015-01-01

    Eye colour is one of the most obvious phenotypic traits of an individual. Since the first documented classification scale was developed in 1843, there have been numerous attempts to classify iris colour. In past centuries, iris colour classification scales had various colour categories and mostly relied on comparing an individual’s eye with painted glass eyes. Once photographic techniques were refined, standard iris photographs replaced painted eyes, but this did not solve the problem of painted/printed colour variability over time. Early clinical scales were easy to use, but they lacked objectivity and were not standardised or statistically tested for reproducibility. The era of automated iris colour classification systems came with technological development. Spectrophotometry, digital analysis of high-resolution iris images, hyperspectral analysis of the real human iris and dedicated iris colour analysis software all achieve objective, accurate iris colour classification. However, they are quite expensive and limited in use to research environments. Iris colour classification systems have evolved continuously due to their use in a wide range of studies, especially in the fields of anthropology, epidemiology and genetics. Despite the wide range of existing scales, there is still no generally accepted iris colour classification scale. PMID:27373112

  1. Phylogenetic Properties of RNA Viruses

    PubMed Central

    Pompei, Simone; Loreto, Vittorio; Tria, Francesca

    2012-01-01

    A new word, phylodynamics, was coined to emphasize the interconnection between phylogenetic properties, as observed for instance in a phylogenetic tree, and the epidemic dynamics of viruses, where selection, mediated by the host immune response, and transmission play a crucial role. The challenges faced when investigating the evolution of RNA viruses call for a virtuous loop of data collection, data analysis and modeling. This loop has already resulted both in the collection of massive sequence databases and in hypotheses on the main mechanisms that drive qualitative differences in the (reconstructed) evolutionary patterns of different RNA viruses. Qualitatively, it has been observed that selection driven by the host immune response induces an uneven survival ability among co-existing strains. As a consequence, the imbalance of the phylogenetic tree is markedly more pronounced than when the interaction with the host immune system does not play a central role in the evolutionary dynamics. While many imbalance metrics have been introduced, reliable methods to discriminate quantitatively between different levels of imbalance are still lacking. In our work, we reconstruct and analyze the phylogenetic trees of six RNA viruses, with a special emphasis on the human Influenza A virus. This virus is relevant for vaccine preparation and also poses theoretical challenges because of its peculiar evolutionary dynamics. We focus in particular on topological properties. We point out the limitations of standard imbalance metrics, and we introduce a new methodology that assigns the correct imbalance level to the phylogenetic trees, in agreement with the phylodynamics of the viruses. Our thorough quantitative analysis allows for a deeper understanding of the evolutionary dynamics of the considered RNA viruses, which is crucial in order to provide a valuable framework for a quantitative assessment of theoretical
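
    The abstract does not name the standard imbalance metrics it critiques. The Colless index is one widely used example. The Python sketch below computes it for a rooted binary tree encoded as nested pairs, which is an assumed encoding. It illustrates a standard metric only, not the authors' new methodology.

```python
from typing import Tuple, Union

# A rooted binary tree: a leaf label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def colless_index(tree: Tree) -> int:
    """Sum over internal nodes of |leaves(left) - leaves(right)|."""
    def walk(node: Tree) -> Tuple[int, int]:
        # Returns (number of leaves below node, Colless sum within the subtree).
        if isinstance(node, str):
            return 1, 0
        n_left, c_left = walk(node[0])
        n_right, c_right = walk(node[1])
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)
    return walk(tree)[1]


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(colless_index(balanced), colless_index(caterpillar))  # 0 3
```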

  2. Visualizing Phylogenetic Treespace Using Cartographic Projections

    NASA Astrophysics Data System (ADS)

    Sundberg, Kenneth; Clement, Mark; Snell, Quinn

    Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger datasets.
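
    A standard combinatorial result makes this growth concrete: the number of distinct unrooted, fully resolved trees on n labelled taxa is the double factorial (2n - 5)!!. The following short Python sketch computes it.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """(2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5) unrooted binary trees for n >= 3 taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_binary_trees(n))
```

    With only 10 taxa there are already 2,027,025 candidate topologies, and the count grows faster than exponentially in n.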

  3. Compression-based distance (CBD): a simple, rapid, and accurate method for microbiota composition comparison

    PubMed Central

    2013-01-01

    Background Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data sets rapidly grow in size, simpler analysis methods are needed to meet the growing computational burden of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. Results We create a metric, called compression-based distance (CBD), for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while reducing the computational time required compared with similar tools, without expert user intervention. Conclusion CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets. PMID:23617892

  4. Compression-based distance (CBD): a simple, rapid, and accurate method for microbiota composition comparison.

    PubMed

    Yang, Fang; Chia, Nicholas; White, Bryan A; Schook, Lawrence B

    2013-04-23

    Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data sets rapidly grow in size, simpler analysis methods are needed to meet the growing computational burden of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. We create a metric, called compression-based distance (CBD), for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while reducing the computational time required compared with similar tools, without expert user intervention. CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.
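
    The abstract does not give CBD's exact formula. To illustrate the general idea, the Python sketch below computes the closely related normalized compression distance (NCD) with zlib. Two choices are assumptions: each sample is treated as the concatenation of its tag sequences, and the data are toy values. The sketch is not the published CBD definition.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for shared content, near 1 for unrelated data."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy "samples": concatenated 16S-like tags (made-up sequences).
rng = random.Random(1)
sample_a = b"ACGTACGGTTCA" * 100
sample_b = b"ACGTACGGTTCA" * 90 + b"TTGACCATGGAA" * 10
sample_c = bytes(rng.choice(b"ACGT") for _ in range(1200))
print(round(ncd(sample_a, sample_b), 2), round(ncd(sample_a, sample_c), 2))
```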

  5. HHsvm: fast and accurate classification of profile–profile matches identified by HHsearch

    PubMed Central

    Dlakić, Mensur

    2009-01-01

    Motivation: Recently developed profile–profile methods rival structural comparisons in their ability to detect homology between distantly related proteins. Despite this tremendous progress, many genuine relationships between protein families cannot be recognized, because comparisons of their profiles yield scores that are not statistically significant. Results: Using known evolutionary relationships among protein superfamilies in the SCOP database, support vector machines were trained on four sets of discriminatory features derived from the output of HHsearch. Upon validation, the automatic classification of all profile–profile matches was shown to be superior to fixed threshold-based annotation in terms of both sensitivity and specificity. The effectiveness of this approach was demonstrated by annotating several domains of unknown function from the Pfam database. Availability: Programs and scripts implementing the methods described in this manuscript are freely available from http://hhsvm.dlakiclab.org/. Contact: mdlakic@montana.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19773335

  6. The research on medical image classification algorithm based on PLSA-BOW model.

    PubMed

    Cao, C H; Cao, H L

    2016-04-29

    With the rapid development of modern medical imaging technology, medical image classification has become more important for medical diagnosis and treatment. To address the problem of polysemous words and synonyms, this study combines the bag-of-words model with PLSA (Probabilistic Latent Semantic Analysis) and proposes the PLSA-BOW (Probabilistic Latent Semantic Analysis-Bag of Words) model. In this paper we transfer the bag-of-words model from the text domain to the image domain and build a visual bag-of-words model. The method further improves the accuracy of bag-of-words-based classification. The experimental results show that the PLSA-BOW model yields more accurate medical image classification.

  7. Neuromuscular disease classification system

    NASA Astrophysics Data System (ADS)

    Sáez, Aurora; Acha, Begoña; Montero-Sánchez, Adoración; Rivas, Eloy; Escudero, Luis M.; Serrano, Carmen

    2013-06-01

    Diagnosis of neuromuscular diseases is based on subjective visual assessment of patient biopsies by a specialist pathologist. A system for objective analysis and classification of muscular dystrophies and neurogenic atrophies through fluorescence microscopy images of muscle biopsies is presented. The procedure starts with an accurate segmentation of the muscle fibers using mathematical morphology and a watershed transform. Feature extraction is carried out in two parts. The first set comprises 24 features that pathologists take into account to diagnose the diseases. The second comprises 58 structural features that the human eye cannot see; these are obtained by modelling the biopsy as a graph in which each fiber is a node and two nodes are connected if the corresponding fibers are adjacent. Both feature sets are then subjected to feature selection using sequential forward selection and sequential backward selection, to classification using a Fuzzy ARTMAP neural network, and to a study of severity grading. A database of 91 images was used: 71 images for training and 20 for testing. A classification error of 0% was obtained. It is concluded that adding features undetectable by human visual inspection improves the categorization of atrophic patterns.

  8. Phylogenetic studies in Psathyrella focusing on sections Pennatae and Spadiceae--new evidence for the paraphyly of the genus.

    PubMed

    Vasutová, Martina; Antonín, Vladimír; Urban, Alexander

    2008-10-01

    The sections Pennatae and Spadiceae were chosen to test the agreement of current infrageneric classifications of Psathyrella (Psathyrellaceae, Agaricales) with molecular phylogenetic data and to evaluate the systematic significance of relevant morphological characters. The ITS and partial LSU regions of nu-rDNA from 53 specimens representing 34 species of Psathyrella were sequenced and analysed with parsimony-based and model-based phylogenetic methods. According to our analyses, the sections Pennatae and Spadiceae are polyphyletic and distributed across the family Psathyrellaceae, which is divided into at least five major groups. The first one comprises most of the included Psathyrella species and, probably, the whole genus Coprinellus. The second group is made up of Psathyrella gossypina and P. delineata. The third clade consists of the genus Coprinopsis and includes Psathyrella aff. huronensis and P. marcescibilis. The fourth clade is composed of two sister groups, the subgenus Homophron and the genus Lacrymaria, and the fifth group represents the genus Parasola including Psathyrella conopilus. These results agree neither with the current circumscription of the two subgenera, Psathyra and Psathyrella, nor with the present disposition of the Psathyrellaceae. Taxonomically important morphological characters in the genus Psathyrella show a high degree of homoplasy. Although these characters are useful for species delimitation, and in some cases for the circumscription of sections, they appear insufficient for a phylogenetically correct generic concept.

  9. Comparison of Neural Networks and Tabular Nearest Neighbor Encoding for Hyperspectral Signature Classification in Unresolved Object Detection

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Ritter, G.; Key, R.

    Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present a comparison of two emerging technologies for nonimaging spectral signature classification. The first is a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4]; the second is a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allow a TNE programmer or user to determine classification accuracy. They also make it possible to characterize in detail the signatures for which TNE did not obtain classification matches, and why such mismatches occurred. In this study, we compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false

  10. Folding and unfolding phylogenetic trees and networks.

    PubMed

    Huber, Katharina T; Moulton, Vincent; Steel, Mike; Wu, Taoyang

    2016-12-01

    Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be "unfolded" to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be "folded" to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.
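    The unfolding idea can be sketched by copying the part of the network below each vertex once for every directed path that reaches it, so a leaf below a reticulation appears several times in the result. The sketch below assumes the network is a rooted DAG stored as a vertex-to-children dictionary (a hypothetical representation). It keeps out-degree-1 vertices as 1-tuples and does not suppress them, so it illustrates the path-copying construction rather than the paper's exact definition of U(N).

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Network = Dict[str, List[str]]         # vertex -> children; leaves may be omitted
Tree = Union[str, Tuple["Tree", ...]]  # a leaf label, or a tuple of subtrees


def unfold(net: Network, v: str) -> Tree:
    """Copy the DAG below v once for every directed path that reaches each vertex."""
    children = net.get(v, [])
    if not children:
        return v  # a leaf keeps its label
    return tuple(unfold(net, c) for c in children)


def leaf_multiplicities(t: Tree) -> Counter:
    """How many times each leaf label occurs in the (multi-labelled) tree."""
    if isinstance(t, str):
        return Counter([t])
    total: Counter = Counter()
    for sub in t:
        total.update(leaf_multiplicities(sub))
    return total


# Hypothetical network: the reticulation vertex h has two parents, a and b.
net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}
U = unfold(net, "r")
print(U)  # (('x', ('y',)), (('y',), 'z'))
print(leaf_multiplicities(U)["y"])  # 2
```

    Note that the size of the unfolding can grow exponentially with the number of stacked reticulations, since every root-to-vertex path produces its own copy.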

  11. BEASTling: A software tool for linguistic phylogenetics using BEAST 2.

    PubMed

    Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D

    2017-01-01

    We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.

  12. Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations.

    PubMed

    Fabelo, Himar; Ortega, Samuel; Ravi, Daniele; Kiran, B Ravi; Sosa, Coralia; Bulters, Diederik; Callicó, Gustavo M; Bulstrode, Harry; Szolna, Adam; Piñeiro, Juan F; Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O'Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

    2018-01-01

    Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration of these tumors into the surrounding normal brain makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method that takes into account the spatial and spectral characteristics of hyperspectral images to help neurosurgeons accurately determine the tumor boundaries in surgical time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The proposed algorithm is a hybrid framework that combines supervised and unsupervised machine learning methods. First, a supervised pixel-wise classification is performed using a Support Vector Machine classifier. The generated classification map is spatially homogenized using a one-band representation of the HS cube, obtained with the Fixed Reference t-Stochastic Neighbors Embedding dimensionality reduction algorithm, followed by K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering with a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five in vivo hyperspectral images of the surface of the brain affected by glioblastoma tumor, from five different patients, have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising
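    The majority-voting fusion step described above can be sketched as follows. Each unsupervised cluster is assigned the supervised class that occurs most often among its pixels, and every pixel in the cluster is then relabelled with that class. The flat per-pixel lists and integer class codes below are hypothetical simplifications, and tie-breaking (first class seen wins) is an assumption, not a detail taken from the study.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fuse_by_majority_vote(pixel_classes: Sequence[int], cluster_ids: Sequence[int]) -> List[int]:
    """Relabel every pixel with the most frequent supervised class in its cluster."""
    if len(pixel_classes) != len(cluster_ids):
        raise ValueError("need one supervised label and one cluster id per pixel")
    votes: Dict[int, Counter] = {}
    for label, cid in zip(pixel_classes, cluster_ids):
        votes.setdefault(cid, Counter())[label] += 1
    # Counter.most_common keeps first-seen order among tied counts.
    winner = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return [winner[cid] for cid in cluster_ids]


# Hypothetical codes: 0 = normal tissue, 1 = tumor, 2 = blood vessel.
svm_map = [0, 0, 1, 1, 1, 2]
clusters = [0, 0, 0, 1, 1, 1]
print(fuse_by_majority_vote(svm_map, clusters))  # [0, 0, 0, 1, 1, 1]
```

    The cluster boundaries come from the unsupervised stage and the class identities come from the supervised stage, which is why isolated misclassified pixels inside a cluster get absorbed.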

  13. Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

    PubMed Central

    Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O’Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

    2018-01-01

    Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration of these tumors into the surrounding normal brain makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method that takes into account the spatial and spectral characteristics of hyperspectral images to help neurosurgeons accurately determine the tumor boundaries in surgical time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The proposed algorithm is a hybrid framework that combines supervised and unsupervised machine learning methods. First, a supervised pixel-wise classification is performed using a Support Vector Machine classifier. The generated classification map is spatially homogenized using a one-band representation of the HS cube, obtained with the Fixed Reference t-Stochastic Neighbors Embedding dimensionality reduction algorithm, followed by K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering with a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five in vivo hyperspectral images of the surface of the brain affected by glioblastoma tumor, from five different patients, have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising

  14. Worldwide Phylogenetic Relationship of Avian Poxviruses

    PubMed Central

    Foster, Jeffrey T.; Dán, Ádám; Ip, Hon S.; Egstad, Kristina F.; Parker, Patricia G.; Higashiguchi, Jenni M.; Skinner, Michael A.; Höfle, Ursula; Kreizinger, Zsuzsa; Dorrestein, Gerry M.; Solt, Szabolcs; Sós, Endre; Kim, Young Jun; Uhart, Marcela; Pereda, Ariel; González-Hein, Gisela; Hidalgo, Hector; Blanco, Juan-Manuel; Erdélyi, Károly

    2013-01-01

    Poxvirus infections have been found in 230 species of wild and domestic birds worldwide in both terrestrial and marine environments. This ubiquity raises the question of how infection has been transmitted and globally dispersed. We present a comprehensive global phylogeny of 111 novel poxvirus isolates in addition to all available sequences from GenBank. Phylogenetic analysis of the Avipoxvirus genus has traditionally relied on one gene region (4b core protein). In this study we expanded the analyses to include a second locus (DNA polymerase gene), allowing for a more robust phylogenetic framework, finer genetic resolution within specific groups, and the detection of potential recombination. Our phylogenetic results reveal several major features of avipoxvirus evolution and ecology and propose an updated avipoxvirus taxonomy, including three novel subclades. The characterization of poxviruses from 57 species of birds in this study extends the current knowledge of their host range and provides the first evidence of the phylogenetic effect of genetic recombination of avipoxviruses. The repeated occurrence of avian family or order-specific grouping within certain clades (e.g., starling poxvirus, falcon poxvirus, raptor poxvirus, etc.) indicates a marked role of host adaptation, while the sharing of poxvirus species within prey-predator systems emphasizes the capacity for cross-species infection and limited host adaptation. Our study provides a broad and comprehensive phylogenetic analysis of the Avipoxvirus genus, an ecologically and environmentally important viral group, to formulate a genome sequencing strategy that will clarify avipoxvirus taxonomy. PMID:23408635

  15. Worldwide phylogenetic relationship of avian poxviruses

    USGS Publications Warehouse

    Gyuranecz, Miklós; Foster, Jeffrey T.; Dán, Ádám; Ip, Hon S.; Egstad, Kristina F.; Parker, Patricia G.; Higashiguchi, Jenni M.; Skinner, Michael A.; Höfle, Ursula; Kreizinger, Zsuzsa; Dorrestein, Gerry M.; Solt, Szabolcs; Sós, Endre; Kim, Young Jun; Uhart, Marcela; Pereda, Ariel; González-Hein, Gisela; Hidalgo, Hector; Blanco, Juan-Manuel; Erdélyi, Károly

    2013-01-01

    Poxvirus infections have been found in 230 species of wild and domestic birds worldwide in both terrestrial and marine environments. This ubiquity raises the question of how infection has been transmitted and globally dispersed. We present a comprehensive global phylogeny of 111 novel poxvirus isolates in addition to all available sequences from GenBank. Phylogenetic analysis of the Avipoxvirus genus has traditionally relied on one gene region (4b core protein). In this study we have expanded the analyses to include a second locus (DNA polymerase gene), allowing for a more robust phylogenetic framework, finer genetic resolution within specific groups and the detection of potential recombination. Our phylogenetic results reveal several major features of avipoxvirus evolution and ecology and propose an updated avipoxvirus taxonomy, including three novel subclades. The characterization of poxviruses from 57 species of birds in this study extends the current knowledge of their host range and provides the first evidence of the phylogenetic effect of genetic recombination of avipoxviruses. The repeated occurrence of avian family or order-specific grouping within certain clades (e.g. starling poxvirus, falcon poxvirus, raptor poxvirus, etc.) indicates a marked role of host adaptation, while the sharing of poxvirus species within prey-predator systems emphasizes the capacity for cross-species infection and limited host adaptation. Our study provides a broad and comprehensive phylogenetic analysis of the Avipoxvirus genus, an ecologically and environmentally important viral group, to formulate a genome sequencing strategy that will clarify avipoxvirus taxonomy.

  16. World reclassification of the Cardiophorinae (Coleoptera, Elateridae), based on phylogenetic analyses of morphological characters

    PubMed Central

    Douglas, Hume B.

    2017-01-01

    The prior genus-level classification of Cardiophorinae had never been assessed phylogenetically and had not been revised since 1906. A phylogeny for Cardiophorinae and Negastriinae is inferred by Bayesian analyses of 163 adult morphological characters to revise the generic classification. Parsimony analysis is also performed to assess the sensitivity of the Bayesian results to the choice of optimality criterion. Bayesian hypothesis testing rejected monophyly for: Negastriinae; Cardiophorinae (but monophyletic after addition of four taxa); Cardiophorini; cardiophorine genera Aphricus LeConte, 1853; Aptopus Eschscholtz, 1829; Cardiophorus Eschscholtz, 1829; Cardiotarsus Eschscholtz, 1836; Paracardiophorus Schwarz, 1895; Phorocardius Fleutiaux, 1931; Dicronychus sensu Platia, 1994; Dicronychus sensu Méquignon, 1931; Craspedostethus sensu Schwarz, 1906 (i.e., including Tropidiplus Fleutiaux, 1903); Paracardiophorus sensu Cobos, 1970, although well-supported alternative classifications were available for only some. Based on taxonomic interpretation of phylogenetic results: Nyctorini is syn. n. of Cardiophorini; Globothorax Fleutiaux, 1891 (Physodactylinae), Margogastrius Schwarz, 1903 (Physodactylinae), and Pachyelater Lesne, 1897 (Dendrometrinae) are transferred to Cardiophorinae. The following changes are proposed for cardiophorine genera: Aptopus Eschscholtz, 1829 is redefined to exclude Horistonotus-like species; Coptostethus Wollaston, 1854 is a subgenus of Cardiophorus; Dicronychus Brullé, 1832 and Diocarphus Fleutiaux, 1947, Metacardiophorus Gurjeva, 1966, Platynychus Motschulsky, 1858, and Zygocardiophorus Iablokoff-Khnzorian and Mardjanian, 1981 are placed at genus rank; Paracardiophorus Schwarz, 1895 is redefined based on North American and Eurasian species only; Horistonotus Candèze, 1860 is redefined to include species with multiple apices on each side of their tarsal claws; Patriciella Van Zwaluwenburg, 1953 is syn. n. of Aphricus LeConte, 1853; Teslasena
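    The abstract reports Bayesian hypothesis tests of monophyly. One simple ingredient of such tests is the fraction of posterior trees (for example, an MCMC sample) in which a given set of taxa forms a clade. The sketch below computes that frequency for rooted trees written as nested tuples. The tree encoding and function names are hypothetical, and the study itself may have used a different test (such as Bayes factors) rather than this frequency.

```python
from typing import FrozenSet, Iterable, Set, Union

Tree = Union[str, tuple]  # a tip label, or a tuple of subtrees (rooted)


def _collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the tip set below `tree`, recording every internal node's tip set in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.add(tips)
    return tips


def monophyly_frequency(trees: Iterable[Tree], taxa: Iterable[str]) -> float:
    """Fraction of sampled rooted trees in which `taxa` form a clade."""
    target = frozenset(taxa)
    hits = total = 0
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        _collect_clades(tree, found)
        hits += target in found
        total += 1
    if total == 0:
        raise ValueError("need at least one tree")
    return hits / total


sample = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "D"), ("B", "C")),
]
print(monophyly_frequency(sample, ["A", "B"]))  # 0.5
```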

  17. World reclassification of the Cardiophorinae (Coleoptera, Elateridae), based on phylogenetic analyses of morphological characters.

    PubMed

    Douglas, Hume B

    2017-01-01

    The prior genus-level classification of Cardiophorinae had never been assessed phylogenetically and had not been revised since 1906. A phylogeny for Cardiophorinae and Negastriinae is inferred by Bayesian analyses of 163 adult morphological characters to revise the generic classification. Parsimony analysis is also performed to assess the sensitivity of the Bayesian results to the choice of optimality criterion. Bayesian hypothesis testing rejected monophyly for: Negastriinae; Cardiophorinae (but monophyletic after addition of four taxa); Cardiophorini; cardiophorine genera Aphricus LeConte, 1853; Aptopus Eschscholtz, 1829; Cardiophorus Eschscholtz, 1829; Cardiotarsus Eschscholtz, 1836; Paracardiophorus Schwarz, 1895; Phorocardius Fleutiaux, 1931; Dicronychus sensu Platia, 1994; Dicronychus sensu Méquignon, 1931; Craspedostethus sensu Schwarz, 1906 (i.e., including Tropidiplus Fleutiaux, 1903); Paracardiophorus sensu Cobos, 1970, although well-supported alternative classifications were available for only some. Based on taxonomic interpretation of phylogenetic results: Nyctorini is syn. n. of Cardiophorini; Globothorax Fleutiaux, 1891 (Physodactylinae), Margogastrius Schwarz, 1903 (Physodactylinae), and Pachyelater Lesne, 1897 (Dendrometrinae) are transferred to Cardiophorinae. The following changes are proposed for cardiophorine genera: Aptopus Eschscholtz, 1829 is redefined to exclude Horistonotus-like species; Coptostethus Wollaston, 1854 is a subgenus of Cardiophorus; Dicronychus Brullé, 1832 and Diocarphus Fleutiaux, 1947, Metacardiophorus Gurjeva, 1966, Platynychus Motschulsky, 1858, and Zygocardiophorus Iablokoff-Khnzorian and Mardjanian, 1981 are placed at genus rank; Paracardiophorus Schwarz, 1895 is redefined based on North American and Eurasian species only; Horistonotus Candèze, 1860 is redefined to include species with multiple apices on each side of their tarsal claws; Patriciella Van Zwaluwenburg, 1953 is syn. n. of Aphricus LeConte, 1853; Teslasena Fleutiaux

  18. Advanced eddy current test signal analysis for steam generator tube defect classification and characterization

    NASA Astrophysics Data System (ADS)

    McClanahan, James Patrick

    Eddy Current Testing (ECT) is a Non-Destructive Examination (NDE) technique that is widely used in power generating plants (both nuclear and fossil) to test the integrity of heat exchanger (HX) and steam generator (SG) tubing. Specifically for this research, laboratory-generated, flawed tubing data were examined. The purpose of this dissertation is to develop and implement an automated method for the classification and an advanced characterization of defects in HX and SG tubing. These two improvements enhanced the robustness of characterization compared with traditional bobbin-coil ECT data analysis methods. A more robust classification and characterization of the tube flaw in situ (while the SG is on-line but not when the plant is operating) should provide valuable information to the power industry. The following conclusions were reached from this research. A feature extraction program acquiring relevant information from the mixed, absolute and differential data was successfully implemented. The continuous wavelet transform (CWT) was utilized to extract more information from the mixed, complex differential data. Image-processing techniques, used to extract the information contained in the generated CWT, classified the data with a high success rate. The data were accurately classified using the compressed feature vector and a Bayes classification system. An estimate of the upper bound on the probability of error, based on the Bhattacharyya distance, was successfully applied to the Bayesian classification. The classified data were separated according to flaw type (classification) to enhance characterization. The characterization routine used dedicated, flaw-type-specific artificial neural networks (ANNs) that made the characterization of the tube flaw more robust. The inclusion of outliers may help complete the feature space so that classification accuracy is increased. Given that the eddy current test signals appear very similar, there may not be sufficient information to make an extremely accurate (>95
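    The Bhattacharyya error bound mentioned above can be illustrated for two Gaussian classes. Assuming, for illustration only, diagonal covariance matrices, the Bhattacharyya distance D_B is a sum of per-feature terms, and the two-class Bayes error with priors P1 and P2 = 1 - P1 is bounded by sqrt(P1 P2) exp(-D_B). The function names and the diagonal-covariance assumption are ours and are not taken from the dissertation.

```python
import math
from typing import Sequence


def bhattacharyya_distance_diag(mu1: Sequence[float], var1: Sequence[float],
                                mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all parameter vectors must have the same length")
    d = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        if v1 <= 0 or v2 <= 0:
            raise ValueError("variances must be positive")
        v = 0.5 * (v1 + v2)  # averaged variance for this feature
        d += (m1 - m2) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return d


def bhattacharyya_error_bound(p1: float, d_b: float) -> float:
    """Upper bound on the two-class Bayes error: sqrt(P1 * P2) * exp(-D_B)."""
    return math.sqrt(p1 * (1.0 - p1)) * math.exp(-d_b)


d = bhattacharyya_distance_diag([0.0], [1.0], [2.0], [1.0])
print(d)                                             # 0.5
print(round(bhattacharyya_error_bound(0.5, d), 4))   # 0.3033
```

    For this one-feature example the true Bayes error is about 0.159, so the bound (about 0.303) is valid but loose, which is typical of Bhattacharyya bounds.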

  19. Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

    PubMed

    Kovács, Endre R; Benko, Mária

    2009-03-01

    Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product implied that a new siadenovirus type was present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.

  20. The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

    PubMed

    Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

    2015-11-04

    In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
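    The core idea of finding a data-driven gap in sorted pairwise distances can be illustrated with a deliberately simplified heuristic. It computes p-distances between aligned sequences, cuts at the midpoint of the largest jump between consecutive sorted distances, and joins pairs below that cut by single linkage. This is not the published Gap Procedure, whose details differ. The distance choice, the single-largest-gap rule and all names are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gap_clusters(seqs: Dict[str, str]) -> List[Set[str]]:
    """Cluster sequences by cutting sorted pairwise distances at their largest gap."""
    names = sorted(seqs)
    pairs = [(p_distance(seqs[a], seqs[b]), a, b) for a, b in combinations(names, 2)]
    dists = sorted(d for d, _, _ in pairs)
    if len(dists) < 2:
        return [{n} for n in names]  # no gap to inspect
    # Index of the largest jump between consecutive sorted distances.
    i = max(range(len(dists) - 1), key=lambda k: dists[k + 1] - dists[k])
    threshold = (dists[i] + dists[i + 1]) / 2

    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Single linkage: join every pair closer than the threshold.
    for d, a, b in pairs:
        if d < threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)


toy = {"s1": "AAAAAAAAAA", "s2": "AAAAAAAAAC", "s3": "GGGGGGGGGG", "s4": "GGGGGGGGGT"}
print(gap_clusters(toy))
```

    On this toy input the within-pair distances are 0.1 and the between-pair distances are 1.0, so the cut falls at 0.55 and the result is the two clusters {s1, s2} and {s3, s4}, with no threshold supplied by the user.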

  1. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution

    PubMed Central

    Kendall, Michelle; Colijn, Caroline

    2016-01-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287
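    To make "a metric-based method for comparing trees" concrete, one simple vector-based distance records, for every pair of tips, how many edges separate the root from their most recent common ancestor (MRCA), and then takes the Euclidean distance between two trees' vectors. The sketch below is an illustrative, topology-only assumption, not necessarily the exact metric defined in the paper. It assumes rooted trees with unique tip labels written as nested tuples.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]  # a tip label, or a tuple of subtrees


def leaf_index_paths(tree: Tree, path: Tuple[int, ...] = ()) -> Dict[str, Tuple[int, ...]]:
    """Map each tip to the sequence of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: path}
    result: Dict[str, Tuple[int, ...]] = {}
    for i, child in enumerate(tree):
        result.update(leaf_index_paths(child, path + (i,)))
    return result


def mrca_depths(tree: Tree) -> Tuple[List[str], List[int]]:
    """For each pair of tips (sorted order), the number of edges from the root to their MRCA."""
    paths = leaf_index_paths(tree)
    tips = sorted(paths)
    depths = []
    for a, b in combinations(tips, 2):
        depth = 0
        for step_a, step_b in zip(paths[a], paths[b]):
            if step_a != step_b:
                break
            depth += 1
        depths.append(depth)
    return tips, depths


def tree_distance(t1: Tree, t2: Tree) -> float:
    """Euclidean distance between the MRCA-depth vectors of two trees on the same tips."""
    tips1, v1 = mrca_depths(t1)
    tips2, v2 = mrca_depths(t2)
    if tips1 != tips2:
        raise ValueError("trees must have the same tip labels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(tree_distance(t1, t2))  # 2.0
```

    Because each tree becomes a point in a common vector space, a collection of trees (for example, one per gene) can be compared pairwise and then projected to reveal groups of similar topologies.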

  2. Marker-Based Hierarchical Segmentation and Classification Approach for Hyperspectral Imagery

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Tilton, James C.; Benediktsson, Jon Atli; Chanussot, Jocelyn

    2011-01-01

    The Hierarchical SEGmentation (HSEG) algorithm, which is a combination of hierarchical step-wise optimization and spectral clustering, has performed well in hyperspectral image analysis. This technique produces as its output a hierarchical set of image segmentations, so the automated selection of a single segmentation level is often necessary. We propose and investigate the use of automatically selected markers for this purpose. In this paper, a novel Marker-based HSEG (M-HSEG) method for spectral-spatial classification of hyperspectral images is proposed. First, pixelwise classification is performed and the most reliably classified pixels are selected as markers, with the corresponding class labels. Then, a novel constrained marker-based HSEG algorithm is applied, resulting in a spectral-spatial classification map. The experimental results show that the proposed approach yields accurate segmentation and classification maps, and thus is attractive for hyperspectral image analysis.
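    The marker-selection step ("the most reliably classified pixels are selected as markers") can be sketched with one possible reliability criterion. The sketch keeps a pixel only when its highest class probability from the pixelwise classifier reaches a threshold. The probability inputs, the 0.9 threshold and the function name are hypothetical, and the paper's actual reliability measure may differ.

```python
from typing import List, Sequence, Tuple


def select_markers(probabilities: Sequence[Sequence[float]],
                   threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (pixel_index, class_label) for pixels whose top class probability >= threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    markers = []
    for idx, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            markers.append((idx, best))
    return markers


pixel_probs = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]
print(select_markers(pixel_probs))  # [(0, 0), (2, 1)]
```

    The ambiguous middle pixel is left unlabelled. In a marker-based scheme, such pixels would later receive labels from the region-growing or segmentation stage rather than from the pixelwise classifier alone.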

  3. The Diaporthe sojae species complex: Phylogenetic re-assessment of pathogens associated with soybean, cucurbits and other field crops.

    PubMed

    Udayanga, Dhanushka; Castlebury, Lisa A; Rossman, Amy Y; Chukeatirote, Ekachai; Hyde, Kevin D

    2015-05-01

    Phytopathogenic species of Diaporthe are associated with a number of soybean diseases, including seed decay, pod and stem blight and stem canker, and lead to considerable crop production losses worldwide. Accurate morphological identification of the species that cause these diseases has been difficult. In this study, we determined the phylogenetic relationships and species boundaries of Diaporthe longicolla, Diaporthe phaseolorum, Diaporthe sojae and closely related taxa. Species boundaries for this complex were determined based on combined phylogenetic analysis of five gene regions: partial sequences of calmodulin (CAL), beta-tubulin (TUB), histone-3 (HIS), translation elongation factor 1-α (EF1-α), and the nuclear ribosomal internal transcribed spacers (ITS). Phylogenetic analyses revealed that this large complex of taxa comprises soybean pathogens as well as species associated with herbaceous field crops and weeds. Diaporthe arctii, Diaporthe batatas, D. phaseolorum and D. sojae are epitypified. The seed decay pathogen D. longicolla was determined to be distinct from D. sojae. D. phaseolorum, originally associated with stem and leaf blight of Lima bean, was not found to be associated with soybean. A new species, Diaporthe ueckerae on Cucumis melo, is introduced with description and illustrations. Published by Elsevier Ltd.

  4. The 7th lung cancer TNM classification and staging system: Review of the changes and implications.

    PubMed

    Mirsadraee, Saeed; Oswal, Dilip; Alizadeh, Yalda; Caulo, Andrea; van Beek, Edwin

    2012-04-28

    Lung cancer is the most common cause of death from cancer in males, accounting for more than 1.4 million deaths in 2008. It is a growing concern in China, Asia and Africa as well. Accurate staging of the disease is an important part of management, as it provides an estimate of the patient's prognosis and identifies treatment strategies. It also helps to build a database for future staging projects. A major revision of lung cancer staging has been announced with effect from January 2010. The new classification is based on a larger surgical and non-surgical cohort of patients, and is thus more accurate in terms of outcome prediction than the previous classification. Several original papers on this new classification give a comprehensive description of the methodology, the changes in staging and the statistical analysis. This overview is a simplified description of the changes in the new classification and their potential impact on patients' treatment and prognosis.

  5. Evaluation of Hydrometeor Classification for Winter Mixed-Phase Precipitation Events

    NASA Astrophysics Data System (ADS)

    Hickman, B.; Troemel, S.; Ryzhkov, A.; Simmer, C.

    2016-12-01

    Hydrometeor classification algorithms (HCL) typically discriminate radar echoes into several classes, including rain (light, medium, heavy), hail, dry snow, wet snow, ice crystals, graupel and rain-hail mixtures. Despite the strength of HCL for precipitation dominated by a single phase, especially in warm-season classification, shortcomings exist for mixed-phase precipitation classification. Properly identifying mixed-phase precipitation can lead to more accurate precipitation estimates and better forecasts for aviation weather and ground warnings. Cold-season precipitation classification is also highly important because of its potentially high impact on society (e.g. black ice, ice accumulation, snow loads), but because of the varying nature of the hydrometeors (density, dielectric constant, shape), reliable classification via radar alone is not possible. With the addition of thermodynamic information about the atmosphere, either from weather models or sounding data, classification has increasingly been extended to wintertime precipitation events. Yet inaccuracies still exist in separating the more benign (ice pellets) from the more hazardous (freezing rain) events. We have investigated winter mixed-phase precipitation cases, including freezing rain, ice pellets, and rain-snow transitions, from several events in Germany in order to move towards reliable nowcasting of winter precipitation, in the hope of providing faster, more accurate wintertime warnings. All events have been confirmed by ground reports to have had the specified precipitation. Classification of the events is achieved by feeding a combination of inputs from a bulk microphysics numerical weather prediction model and the German dual-polarimetric C-band radar network into a 1D spectral bin microphysical model (SBC), which explicitly treats the processes of melting, refreezing, and ice nucleation to predict four near-surface precipitation types: rain, snow, freezing rain, ice pellets, rain/snow mixture, and freezing rain

  6. Predicting rates of interspecific interaction from phylogenetic trees.

    PubMed

    Nuismer, Scott L; Harmon, Luke J

    2015-01-01

    Integrating phylogenetic information can potentially improve our ability to explain species' traits, patterns of community assembly, the network structure of communities, and ecosystem function. In this study, we use mathematical models to explore the ecological and evolutionary factors that modulate the explanatory power of phylogenetic information for communities of species that interact within a single trophic level. We find that phylogenetic relationships among species can influence trait evolution and rates of interaction among species, but only under particular models of species interaction. For example, when interactions within communities are mediated by a mechanism of phenotype matching, phylogenetic trees make specific predictions about trait evolution and rates of interaction. In contrast, if interactions within a community depend on a mechanism of phenotype differences, phylogenetic information has little, if any, predictive power for trait evolution and interaction rate. Together, these results make clear and testable predictions for when and how evolutionary history is expected to influence contemporary rates of species interaction. © 2014 John Wiley & Sons Ltd/CNRS.

  7. Rough set classification based on quantum logic

    NASA Astrophysics Data System (ADS)

    Hassan, Yasser F.

    2017-11-01

    By combining the advantages of quantum computing and soft computing, the paper shows that rough sets can be used with quantum logic for classification and recognition systems. We suggest a new definition of rough set theory as a quantum logic theory. Rough approximations are essential elements in rough set theory; the quantum rough set model for set-valued data presented here directly constructs set approximations based on a kind of quantum similarity relation. Theoretical analyses demonstrate that the new model for quantum rough sets has a new type of decision rule with less redundancy, which can be used to give accurate classification using principles of quantum superposition and non-linear quantum relations. To our knowledge, this is the first attempt to define rough sets in terms of a quantum representation rather than classical logic or sets. Experiments on data sets have demonstrated that the proposed model is more accurate than traditional rough sets in finding optimal classifications.
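    As background for the quantum model, the sketch below computes the classical (Pawlak) rough approximations that the abstract builds on. Objects with identical attribute values are indiscernible. The lower approximation of a target set is the union of indiscernibility blocks lying entirely inside it, and the upper approximation is the union of blocks that intersect it. This covers only the classical construction, not the paper's quantum similarity relation. The table and names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Table = Dict[str, Tuple[Hashable, ...]]  # object -> tuple of attribute values


def indiscernibility_classes(table: Table) -> List[Set[str]]:
    """Group objects that have identical attribute values."""
    blocks: Dict[Tuple[Hashable, ...], Set[str]] = {}
    for obj, attrs in table.items():
        blocks.setdefault(attrs, set()).add(obj)
    return list(blocks.values())


def rough_approximations(table: Table, target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Lower approximation: blocks wholly inside target; upper: blocks that touch target."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in indiscernibility_classes(table):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


table = {"o1": ("high", "yes"), "o2": ("high", "yes"), "o3": ("low", "no"), "o4": ("low", "yes")}
lower, upper = rough_approximations(table, {"o1", "o3"})
print(sorted(lower), sorted(upper))  # ['o3'] ['o1', 'o2', 'o3']
```

    The boundary region (upper minus lower, here o1 and o2) holds the objects whose membership the available attributes cannot decide.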

  8. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification.

    PubMed

    Gupta, Radhey S

    2016-07-01

    Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships, and these analyses have led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling the discovery of numerous molecular markers (synapomorphies), such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, which exhibit a high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades, providing reliable information regarding their hierarchical relationships, and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics and therapeutics, and for functional studies that provide important insights regarding prokaryotic taxa. © FEMS 2016. All rights reserved.
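    To illustrate what makes an indel a "signature" for a clade, the sketch below checks one aligned region. It asks whether every in-group sequence shares one indel state (gap or no gap) and every out-group sequence shares the opposite state. Real CSI identification also requires conserved flanking regions and broad taxon sampling, which this hypothetical check ignores. The toy alignment, names and gap convention ('-') are assumptions.

```python
from typing import Dict, Iterable


def region_is_gap(seq: str, start: int, end: int) -> bool:
    """True if alignment columns [start, end) of seq are all gap characters ('-')."""
    region = seq[start:end]
    return bool(region) and set(region) == {"-"}


def is_candidate_csi(alignment: Dict[str, str], ingroup: Iterable[str],
                     start: int, end: int) -> bool:
    """True if in-group and out-group are each uniform, and opposite, in indel state."""
    ingroup = set(ingroup)
    outgroup = set(alignment) - ingroup
    if not ingroup or not outgroup or not ingroup <= set(alignment):
        raise ValueError("need a non-empty in-group and out-group drawn from the alignment")
    in_states = {region_is_gap(alignment[n], start, end) for n in ingroup}
    out_states = {region_is_gap(alignment[n], start, end) for n in outgroup}
    return len(in_states) == 1 and len(out_states) == 1 and in_states != out_states


aln = {"t1": "MKV---LLA", "t2": "MKV---LIA", "o1": "MKVGSTLLA", "o2": "MKIGSALLA"}
print(is_candidate_csi(aln, ["t1", "t2"], 3, 6))  # True
print(is_candidate_csi(aln, ["t1", "t2"], 0, 3))  # False
```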

  9. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    would represent one sunspot’s classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S, a set of examples not used in training that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: $\frac{1}{|S|}\sum_{(x,y)\in S} I(h(x) \neq y)$, where I(v) is the indicator function: it returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model and use these to determine how accurate the model is, that is, the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later; for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. It can be used to classify previously unseen examples of sunspots. For example, if a new sunspot's inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type "E," whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C." In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to
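    The test-error formula can be sketched in Python as follows. The model h here is a hypothetical fragment of the sunspot decision tree that encodes only the two paths described in the text (all other inputs fall through to a placeholder label), and the feature keys and test examples are invented for illustration.

```python
def h(x):
    """Hypothetical fragment of the sunspot decision tree (only the two described paths)."""
    length = x.get("group_length")  # None stands in for "NULL"
    if length is not None and 10 <= length <= 15:
        return "E"
    if (length is None and x.get("magnetic_type") == "bipolar"
            and x.get("penumbra") == "rudimentary"):
        return "C"
    return "other"  # placeholder for branches the text does not describe

def empirical_error(model, test_set):
    """(1/|S|) * sum over (x, y) in S of I(model(x) != y)."""
    if not test_set:
        raise ValueError("test set must be non-empty")
    return sum(model(x) != y for x, y in test_set) / len(test_set)

# Invented test examples (x, y); the third one is misclassified by h.
S = [
    ({"group_length": 12}, "E"),
    ({"group_length": None, "magnetic_type": "bipolar", "penumbra": "rudimentary"}, "C"),
    ({"group_length": 14}, "C"),
    ({"group_length": 3}, "other"),
]
err = empirical_error(h, S)
print(err, 1 - err)  # 0.25 0.75
```

    Accuracy on the test set, the fraction classified correctly, is simply one minus this error.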

  10. An accurate method of extracting fat droplets in liver images for quantitative evaluation

    NASA Astrophysics Data System (ADS)

    Ishikawa, Masahiro; Kobayashi, Naoki; Komagata, Hideki; Shinoda, Kazuma; Yamaguchi, Masahiro; Abe, Tokiya; Hashiguchi, Akinori; Sakamoto, Michiie

    2015-03-01

    Steatosis in liver pathological tissue images is a promising indicator of nonalcoholic fatty liver disease (NAFLD) and of the possible risk of hepatocellular carcinoma (HCC). The resulting values are also important for ensuring the automatic and accurate classification of HCC images, because the presence of many fat droplets is likely to introduce errors in quantifying the morphological features used in the process. In this study, we propose a method that can automatically detect and exclude regions with many fat droplets by using feature values of color, shape, and the arrangement of cell nuclei. We implement the method and confirm that it can accurately detect fat droplets and quantify the fat droplet ratio of actual images. This investigation also clarifies the characteristics that contribute effectively to accurate detection.

  11. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    PubMed

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.
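    To illustrate what a BWT/FM index provides, here is a toy Python sketch of the Burrows-Wheeler transform and FM-index backward search, which counts pattern occurrences by narrowing a range of sorted suffixes one character at a time. It is a textbook illustration, not Centrifuge's implementation: it builds the BWT by sorting rotations and scans the BWT naively for occurrence counts, whereas a practical index uses suffix-array construction and sampled rank tables to reach compact sizes like those quoted above.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    text += "$"  # sentinel sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_count(bwt_str, pattern):
    """Count occurrences of pattern with FM-index backward search."""
    # first[c] = number of characters in the text smaller than c
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c, i):
        # occurrences of c in bwt_str[:i]; a real index uses sampled rank tables
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

b = bwt("ACGTACGA")
print(b, fm_count(b, "ACG"), fm_count(b, "TT"))  # AGT$AACCG 2 0
```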

  12. Classification of cardiac patient states using artificial neural networks

    PubMed Central

    Kannathal, N; Acharya, U Rajendra; Lim, Choo Min; Sadasivan, PK; Krishnan, SM

    2003-01-01

    The electrocardiogram (ECG) is a nonstationary signal; therefore, disease indicators may occur at random on the time scale. This may require that the patient be kept under observation for long intervals in the intensive care unit for accurate diagnosis. The present study examined the classification of the states of patients with certain diseases in the intensive care unit using their ECG and an artificial neural network (ANN) classification system. The states were classified as normal, abnormal and life-threatening. Seven significant features extracted from the ECG were fed as input parameters to the ANN for classification. Three neural network techniques, namely back propagation, self-organizing maps and radial basis functions, were used to classify the patient states. The ANN classifier was correct in approximately 99% of the test cases. This result was further improved by using 13 ECG features as input to the ANN classifier. PMID:19649222

  13. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    PubMed

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and of other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms, including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/), and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers, including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
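    The window-by-window assignment can be sketched as follows. This is a simplified illustration, not the Virus-Genotyping Tools code: it assumes the per-window bootstrap support for each reference genotype has already been computed by a phylogenetic analysis (the scores below are invented), applies only the >70% bootstrap cut-off, and omits the bootscanning criterion.

```python
def sliding_windows(length, window, step):
    """(start, end) coordinates of overlapping windows along an alignment."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]

def assign_windows(window_scores, threshold=70.0):
    """Give each window its best-supported genotype if bootstrap support exceeds threshold."""
    calls = []
    for scores in window_scores:  # one {genotype: bootstrap %} dict per window
        best = max(scores, key=scores.get)
        calls.append(best if scores[best] > threshold else "unassigned")
    return calls

def looks_recombinant(calls):
    """More than one assigned genotype along the query suggests a recombinant."""
    return len({c for c in calls if c != "unassigned"}) > 1

windows = sliding_windows(1000, window=400, step=200)
scores = [{"B": 95, "C": 3}, {"B": 88, "C": 10}, {"B": 40, "C": 55}, {"B": 5, "C": 92}]
calls = assign_windows(scores)
print(windows)
print(calls, looks_recombinant(calls))  # ['B', 'B', 'unassigned', 'C'] True
```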

  14. Molecular identification and phylogenetic analysis of Wuchereria bancrofti from human blood samples in Egypt.

    PubMed

    Abdel-Shafi, Iman R; Shoieb, Eman Y; Attia, Samar S; Rubio, José M; Ta-Tang, Thuy-Huong; El-Badry, Ayman A

    2017-03-01

    Lymphatic filariasis (LF) is a serious vector-borne health problem; Wuchereria bancrofti (W.b) is the major cause of LF worldwide and is focally endemic in Egypt. Identification of filarial infection using traditional morphologic and immunological criteria can be difficult and can lead to misdiagnosis. The aims of the present study were the molecular detection of W.b in residents of endemic areas in Egypt, sequence variance analysis, and phylogenetic analysis of W.b DNA. Blood samples collected from residents of filariasis-endemic areas in five governorates were subjected to semi-nested PCR targeting a repeated DNA sequence for detection of W.b DNA. PCR products were sequenced, and a phylogenetic analysis of the obtained sequences was then performed. Out of 300 blood samples, W.b DNA was identified in 48 (16%). Sequencing analysis confirmed the PCR results, identifying only W.b species. Sequence alignment and phylogenetic analysis indicated genetically distinct clusters of W.b among the study population. The results demonstrated that semi-nested PCR is an effective diagnostic tool for accurate and rapid detection of W.b infections in nano-epidemics and is applicable to samples collected in the daytime as well as at night. Sequencing of PCR products and phylogenetic analysis revealed three different nucleotide sequence variants. Further genetic studies of W.b in Egypt and other endemic areas are needed to distinguish related strains and the various ecological and drug effects exerted on them, in support of W.b elimination.

  15. Maximizing the phylogenetic diversity of seed banks.

    PubMed

    Griffiths, Kate E; Balding, Sharon T; Dickie, John B; Lewis, Gwilym P; Pearce, Tim R; Grenyer, Richard

    2015-04-01

    Ex situ conservation efforts such as those of zoos, botanical gardens, and seed banks will form a vital complement to in situ conservation actions over the coming decades. It is therefore necessary to pay the same attention to the biological diversity represented in ex situ conservation facilities as is often paid to protected-area networks. Building the phylogenetic diversity of ex situ collections will strengthen our capacity to respond to biodiversity loss. Since 2000, the Millennium Seed Bank Partnership has banked seed from 14% of the world's plant species. We assessed the taxonomic, geographic, and phylogenetic diversity of the Millennium Seed Bank collection of legumes (Leguminosae). We compared the collection with all known legume genera, their known geographic range (at country and regional levels), and a genus-level phylogeny of the legume family constructed for this study. Over half the phylogenetic diversity of legumes at the genus level was represented in the Millennium Seed Bank. However, pragmatic prioritization of species of economic importance and endangerment has led to the banking of a less-than-optimal phylogenetic diversity and prioritization of range-restricted species risks an underdispersed collection. The current state of the phylogenetic diversity of legumes in the Millennium Seed Bank could be substantially improved through the strategic banking of relatively few additional taxa. Our method draws on tools that are widely applied to in situ conservation planning, and it can be used to evaluate and improve the phylogenetic diversity of ex situ collections. © 2014 Society for Conservation Biology.
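    Phylogenetic diversity of a collection is commonly measured with Faith's PD, the total branch length of the subtree connecting a set of taxa. A minimal sketch, assuming a rooted tree stored as a hypothetical child-to-parent map with one branch length per child node, and counting branches back to the root (the tree and its lengths are invented):

```python
def faith_pd(parent, branch_length, taxa):
    """Total branch length of the union of root-to-tip paths for the chosen taxa."""
    edges = set()
    for node in taxa:
        while node in parent:   # the root has no parent entry
            edges.add(node)     # each branch is keyed by its child node
            node = parent[node]
    return sum(branch_length[n] for n in edges)

# Hypothetical rooted tree: root -> X -> {A, B}; root -> Y -> C
parent = {"X": "root", "Y": "root", "A": "X", "B": "X", "C": "Y"}
branch_length = {"X": 2.0, "Y": 3.0, "A": 1.0, "B": 1.0, "C": 4.0}

print(faith_pd(parent, branch_length, {"A", "B"}))  # 4.0
print(faith_pd(parent, branch_length, {"A", "C"}))  # 10.0
```

    On this toy tree, banking A and B captures a PD of 4, while banking A and C captures 10: adding a taxon from an unrepresented clade raises PD far more than adding a close relative, which is the intuition behind strategically banking a few additional taxa.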

  16. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    PubMed Central

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid

  17. Phylogenetic analysis of fungal heterotrimeric G protein-encoding genes and their expression during dimorphism in Mucor circinelloides.

    PubMed

    Valle-Maldonado, Marco Iván; Jácome-Galarza, Irvin Eduardo; Díaz-Pérez, Alma Laura; Martínez-Cadena, Guadalupe; Campos-García, Jesús; Ramírez-Díaz, Martha Isela; Reyes-De la Cruz, Homero; Riveros-Rosas, Héctor; Díaz-Pérez, César; Meza-Carmen, Víctor

    2015-12-01

    In fungi, heterotrimeric G proteins are key regulators of biological processes such as mating, virulence and morphology, among others. Mucor circinelloides is a model organism for many biological processes, and its genome contains the largest known repertoire in the fungal kingdom of genes that encode putative heterotrimeric G protein subunits: twelve Gα (McGpa1-12), three Gβ (McGpb1-3), and three Gγ (McGpg1-3). Phylogenetic analysis of fungal Gα showed that they are divided into four distinct groups, as reported previously. Fungal Gβ and Gγ are also divided into four phylogenetic groups, and to our knowledge this is the first report of a phylogenetic classification for fungal Gβ and Gγ subunits. Almost all genes that encode putative heterotrimeric G subunits in M. circinelloides are differentially expressed during dimorphic growth, except for McGpg1 (Gγ), which showed very low mRNA levels at all developmental stages. Moreover, several of the subunits are expressed in a similar pattern and at the same level, suggesting that they constitute discrete complexes. For example, McGpb3 (Gβ) and McGpg2 (Gγ) are co-expressed during mycelium growth, and McGpa1, McGpb2, and McGpg2 are co-expressed during yeast development. These findings provide the conceptual framework to study the biological role of these genes during M. circinelloides morphogenesis. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  18. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution.

    PubMed

    Kendall, Michelle; Colijn, Caroline

    2016-10-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. phylogenetics, evolution, tree metrics, genetics, sequencing. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. Learning semantic histopathological representation for basal cell carcinoma classification

    NASA Astrophysics Data System (ADS)

    Gutiérrez, Ricardo; Rueda, Andrea; Romero, Eduardo

    2013-03-01

    Diagnosis of a histopathology glass slide is a complex process that involves accurate recognition of several structures, their function in the tissue and their relation with other structures. The way in which the pathologist represents the image content and the relations between those objects yields a better and more accurate diagnosis. Therefore, an appropriate semantic representation of the image content will be useful in several analysis tasks, such as cancer classification, tissue retrieval and histopathological image analysis, among others. Nevertheless, automatically recognizing those structures and extracting their inner semantic meaning are still very challenging tasks. In this paper we introduce a new semantic representation that describes histopathological concepts in a form suitable for classification. The approach identifies local concepts using dictionary learning, i.e., the algorithm learns the most representative atoms from a set of randomly sampled patches, and then models the spatial relations among them by counting the co-occurrence between atoms while penalizing the spatial distance. The proposed approach was compared with a bag-of-features representation in a tissue classification task. For this purpose, 240 histological microscopic fields of view, 24 per tissue class, were collected. These images were fed to one Support Vector Machine classifier per class, using 120 images as the training set and the remaining ones for testing, maintaining the same proportion of each concept in the training and test sets. The classification results, averaged over 100 random partitions of training and test sets, show that our approach is on average almost 6% more sensitive than the bag-of-features representation.

  20. Phylogenetic Copy-Number Factorization of Multiple Tumor Samples.

    PubMed

    Zaccaria, Simone; El-Kebir, Mohammed; Klau, Gunnar W; Raphael, Benjamin J

    2018-04-16

    Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest CNAs that explains the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher-resolution view of copy-number evolution in this cancer than published analyses.

  1. Molecular phylogenetics and species delimitation of leaf-toed geckos (Phyllodactylidae: Phyllodactylus) throughout the Mexican tropical dry forest.

    PubMed

    Blair, Christopher; Méndez de la Cruz, Fausto R; Law, Christopher; Murphy, Robert W

    2015-03-01

    Methods and approaches for accurate species delimitation continue to be a highly controversial subject in the systematics community. Inaccurate assessment of species' limits precludes accurate inference of historical evolutionary processes. Recent evidence suggests that multilocus coalescent methods show promise in delimiting species in cryptic clades. We combine multilocus sequence data with coalescence-based phylogenetics in a hypothesis-testing framework to assess species limits and elucidate the timing of diversification in leaf-toed geckos (Phyllodactylus) of Mexico's dry forests. Tropical deciduous forests (TDF) of the Neotropics are among the planet's most diverse ecosystems. However, in comparison to moist tropical forests, little is known about the mode and tempo of biotic evolution throughout this threatened biome. We find increased speciation and substantial, cryptic molecular diversity originating after the formation of Mexican TDF 30-20 million years ago due to orogenesis of the Sierra Madre Occidental and Mexican Volcanic Belt. Phylogenetic results suggest that the Mexican Volcanic Belt, the Rio Fuerte, and the Isthmus of Tehuantepec may be important biogeographic barriers. Single- and multilocus coalescent analyses suggest that nearly every sampling locality may be a distinct species. These results suggest unprecedented levels of diversity, a complex evolutionary history, and that the formation and expansion of TDF vegetation in the Miocene may have influenced subsequent cladogenesis of leaf-toed geckos throughout western Mexico. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. The phylogenetic roots of human lethal violence.

    PubMed

    Gómez, José María; Verdú, Miguel; González-Megías, Adela; Méndez, Marcos

    2016-10-13

    The psychological, sociological and evolutionary roots of conspecific violence in humans are still debated, despite attracting the attention of intellectuals for over two millennia. Here we propose a conceptual approach towards understanding these roots based on the assumption that aggression in mammals, including humans, has a significant phylogenetic component. By compiling sources of mortality from a comprehensive sample of mammals, we assessed the percentage of deaths due to conspecifics and, using phylogenetic comparative tools, predicted this value for humans. The proportion of human deaths phylogenetically predicted to be caused by interpersonal violence stood at 2%. This value was similar to the one phylogenetically inferred for the evolutionary ancestor of primates and apes, indicating that a certain level of lethal violence arises owing to our position within the phylogeny of mammals. It was also similar to the percentage seen in prehistoric bands and tribes, indicating that we were as lethally violent then as common mammalian evolutionary history would predict. However, the level of lethal violence has changed through human history and can be associated with changes in the socio-political organization of human populations. Our study provides a detailed phylogenetic and historical context against which to compare levels of lethal violence observed throughout our history.

  3. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulation studies that DISTIQUE has accuracy comparable to leading coalescent-based summary methods, with reduced running times.

  4. Refuting phylogenetic relationships

    PubMed Central

    Bucknam, James; Boucher, Yan; Bapteste, Eric

    2006-01-01

    Background Phylogenetic methods are philosophically grounded, and so can be philosophically biased in ways that limit explanatory power. This constitutes an important methodological dimension not often taken into account. Here we address this dimension in the context of concatenation approaches to phylogeny. Results We discuss some of the limits of a methodology restricted to verificationism, the philosophy on which gene concatenation practices generally rely. As an alternative, we describe software that identifies and focuses on impossible or refuted relationships, through a simple analysis of bootstrap bipartitions followed by multivariate statistical analyses. We show how refuting phylogenetic relationships could in principle facilitate systematics. We also apply our method to the study of two complex phylogenies: the phylogeny of the archaea and the phylogeny of the core of genes shared by all life forms. While many groups are rejected, our results leave open a possible proximity of N. equitans and the Methanopyrales, and of the Archaea and the Cyanobacteria, as well as the possible groupings of the Methanobacteriales/Methanococcales and Thermoplasmatales, of the Spirochaetes and the Actinobacteria, and of the Proteobacteria and Firmicutes. Conclusion It is sometimes easier (and preferable) to decide which species do not group together than which ones do. When possible topologies are limited, identifying local relationships that are rejected may be a useful alternative to classical concatenation approaches that aim to find a globally resolved tree on the basis of weak phylogenetic markers. Reviewers This article was reviewed by Mark Ragan, Eugene V Koonin and J Peter Gogarten. PMID:16956399

  5. Comparisons of neural networks to standard techniques for image classification and correlation

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1994-01-01

    Neural network techniques for multispectral image classification and spatial pattern detection are compared to the standard techniques of maximum-likelihood classification and spatial correlation. The neural network produced a more accurate classification than maximum-likelihood of a Landsat scene of Tucson, Arizona. Some of the errors in the maximum-likelihood classification are illustrated using decision region and class probability density plots. As expected, the main drawback to the neural network method is the long time required for the training stage. The network was trained using several different hidden layer sizes to optimize both the classification accuracy and training speed, and it was found that one node per class was optimal. The performance improved when 3x3 local windows of image data were entered into the net. This modification introduces texture into the classification without explicit calculation of a texture measure. Larger windows were successfully used for the detection of spatial features in Landsat and Magellan synthetic aperture radar imagery.

  6. Phylogenetic turnover along local environmental gradients in tropical forest communities.

    PubMed

    Baldeck, C A; Kembel, S W; Harms, K E; Yavitt, J B; John, R; Turner, B L; Madawala, S; Gunatilleke, N; Gunatilleke, S; Bunyavejchewin, S; Kiratiprayoon, S; Yaacob, A; Supardi, M N N; Valencia, R; Navarrete, H; Davies, S J; Chuyong, G B; Kenfack, D; Thomas, D W; Dalling, J W

    2016-10-01

    While the importance of local-scale habitat niches in shaping tree species turnover along environmental gradients in tropical forests is well appreciated, relatively little is known about the influence of phylogenetic signal in species' habitat niches on local community structure. We used detailed maps of soil resource and topographic variation within eight 24-50 ha tropical forest plots, combined with species phylogenies created from the APG III phylogeny, to examine how phylogenetic beta diversity (indicating the degree of phylogenetic similarity of two communities) was related to environmental gradients within tropical tree communities. Using distance-based redundancy analysis, we found that phylogenetic beta diversity, expressed as either nearest neighbor distance or mean pairwise distance, was significantly related to both soil and topographic variation in all study sites. In general, environmental variables explained more phylogenetic beta diversity within a forest plot when it was expressed as nearest neighbor distance than as mean pairwise distance (3.0-10.3% and 0.4-8.8% of variation explained among plots, respectively), and soil resource variables explained more variation than topographic variables under either phylogenetic beta diversity metric. We also found that patterns of phylogenetic beta diversity expressed as nearest neighbor distance were consistent with previously observed patterns of niche similarity among congeneric species pairs in these plots. These results indicate the importance of phylogenetic signal in local habitat niches in shaping the phylogenetic structure of tropical tree communities, especially at the level of close phylogenetic neighbors, where similarity in habitat niches is most strongly preserved.
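    The two phylogenetic beta diversity metrics named above can be illustrated with a presence/absence sketch: the between-community mean pairwise distance averages the phylogenetic distance over all cross-community species pairs, while the nearest-neighbor version averages each species' distance to its closest relative in the other community. The distance values and species names are hypothetical, and the abundance weighting that such analyses often use is omitted.

```python
from itertools import product
from statistics import mean

def dist(d, a, b):
    """Phylogenetic distance between two species (0 for a species and itself)."""
    return 0.0 if a == b else d[frozenset((a, b))]

def beta_mpd(d, comm1, comm2):
    """Mean pairwise distance over all cross-community species pairs."""
    return mean(dist(d, a, b) for a, b in product(comm1, comm2))

def beta_mntd(d, comm1, comm2):
    """Mean distance from each species to its nearest relative in the other community."""
    nn1 = [min(dist(d, a, b) for b in comm2) for a in comm1]
    nn2 = [min(dist(d, b, a) for a in comm1) for b in comm2]
    return mean(nn1 + nn2)

# Hypothetical pairwise distances (e.g., branch-length distances on a tree)
d = {frozenset(("s1", "s2")): 2.0, frozenset(("s1", "s3")): 6.0, frozenset(("s2", "s3")): 6.0}
plot_a, plot_b = ["s1"], ["s2", "s3"]
print(beta_mpd(d, plot_a, plot_b))   # 4.0
print(beta_mntd(d, plot_a, plot_b))  # 10/3, about 3.33
```

    Because the nearest-neighbor metric looks only at the closest relatives, it responds to turnover among close phylogenetic neighbors, whereas the pairwise mean is dominated by deeper branches.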

  7. Spatial phylogenetics of the native California flora.

    PubMed

    Thornhill, Andrew H; Baldwin, Bruce G; Freyman, William A; Nosratinia, Sonia; Kling, Matthew M; Morueta-Holme, Naia; Madsen, Thomas P; Ackerly, David D; Mishler, Brent D

    2017-10-26

    California is a world floristic biodiversity hotspot where the terms neo- and paleo-endemism were first applied. Using spatial phylogenetics, it is now possible to evaluate biodiversity from an evolutionary standpoint, including discovering significant areas of neo- and paleo-endemism, by combining spatial information from museum collections and DNA-based phylogenies. Here we used a distributional dataset of 1.39 million herbarium specimens, a phylogeny of 1083 operational taxonomic units (OTUs) and 9 genes, and a spatial randomization test to identify regions of significant phylogenetic diversity, relative phylogenetic diversity, and phylogenetic endemism (PE), as well as to conduct a categorical analysis of neo- and paleo-endemism (CANAPE). We found (1) extensive phylogenetic clustering in the South Coast Ranges, southern Great Valley, and deserts of California; (2) significant concentrations of short branches in the Mojave and Great Basin Deserts and the South Coast Ranges and long branches in the northern Great Valley, Sierra Nevada foothills, and the northwestern and southwestern parts of the state; (3) significant concentrations of paleo-endemism in Northwestern California, the northern Great Valley, and western Sonoran Desert, and neo-endemism in the White-Inyo Range, northern Mojave Desert, and southern Channel Islands. Multiple analyses were run to observe the effects on significance patterns of using different phylogenetic tree topologies (uncalibrated trees versus time-calibrated ultrametric trees) and using different representations of OTU ranges (herbarium specimen locations versus species distribution models). These analyses showed that examining the geographic distributions of branch lengths in a statistical framework adds a new dimension to California floristics that, in comparison with climatic data, helps to illuminate causes of endemism. 
In particular, the concentration of significant PE in more arid regions of California extends previous ideas
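
    The abstract names phylogenetic diversity (PD) and phylogenetic endemism (PE) without defining them. Below is a minimal sketch of how both can be scored per grid cell. It assumes a hypothetical input format: the tree is a flat list of branches, each with its length and the set of terminal taxa descending from it, and each cell is a set of taxa. PE follows the commonly used definition in which each branch's length is divided by its range, meaning the number of cells its descendants occupy. The randomization test, relative PD and CANAPE are not reproduced here.

```python
# Each branch is (length, set of terminal taxa descending from it).
Branch = tuple[float, frozenset[str]]


def phylogenetic_diversity(cell_taxa: set[str], branches: list[Branch]) -> float:
    """Faith's PD for one cell: summed length of branches with a descendant present."""
    return sum(length for length, desc in branches if desc & cell_taxa)


def phylogenetic_endemism(cells: dict[str, set[str]],
                          branches: list[Branch]) -> dict[str, float]:
    """PE per cell: each present branch contributes length / (number of cells it occupies)."""
    # Range of a branch = number of cells containing at least one of its descendants.
    ranges = [sum(1 for taxa in cells.values() if desc & taxa) for _, desc in branches]
    return {
        cell: sum(length / r
                  for (length, desc), r in zip(branches, ranges)
                  if desc & taxa)  # r >= 1 whenever the branch is present here
        for cell, taxa in cells.items()
    }
```

    In this scheme a short branch confined to one cell and a long branch spread across many cells can contribute similar PE. This is why PE is usually paired with a randomization test before neo- and paleo-endemism are distinguished.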

  8. Phylogenetic relationships in three species of canine Demodex mite based on partial sequences of mitochondrial 16S rDNA.

    PubMed

    Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís

    2012-12-01

    The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker for establishing phylogenetic relationships. The objectives of this study were to amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite unofficially called D. cornei, and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available in GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. The short-bodied mite D. cornei appears to be a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.
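
    The identity percentages above compare aligned 16S fragments position by position. Below is a minimal sketch of that arithmetic. It assumes the sequences are already aligned to equal length and that gapped columns are skipped. The abstract does not state the study's alignment or gap settings, so this is an illustration rather than a way to reproduce its figures. An uncorrected p-distance is 100 minus the percent identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of ungapped aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected genetic distance in percent (100 - percent identity)."""
    return 100.0 - percent_identity(seq_a, seq_b)
```

    For example, percent_identity("ACGTACGTAC", "ACGTTCGTAC") gives 90.0, because one of ten positions differs.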

  9. Link prediction boosted psychiatry disorder classification for functional connectivity network

    NASA Astrophysics Data System (ADS)

    Li, Weiwei; Mei, Xue; Wang, Hao; Zhou, Yu; Huang, Jiashuang

    2017-02-01

    Functional connectivity networks (FCNs) are an effective tool for classifying psychiatric disorders; an FCN represents the cross-correlation of regional blood-oxygenation-level-dependent (BOLD) signals. However, FCNs are often incomplete, suffering from missing and spurious edges. To accurately classify psychiatric disorders and healthy controls with incomplete FCNs, we first 'repair' each FCN with link prediction and then extract clustering coefficients as features to build a weak classifier for every FCN. Finally, we apply a boosting algorithm to combine these weak classifiers and improve classification accuracy. We tested our method on three psychiatric-disorder datasets: Alzheimer's disease, schizophrenia and attention deficit hyperactivity disorder. The experimental results show that our method not only significantly improves classification accuracy but also efficiently reconstructs incomplete FCNs.
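
    The repair-then-featurize steps can be sketched on a binarized network. The abstract does not say which link-prediction score the authors used. As an assumption, the sketch below adds an edge between any non-adjacent pair of regions that share at least `min_common` neighbours, a simple common-neighbours rule. It then computes each node's local clustering coefficient as the feature. The boosting stage is omitted.

```python
from itertools import combinations


def clustering_coefficients(adj: dict[int, set[int]]) -> dict[int, float]:
    """Local clustering coefficient: fraction of a node's neighbour pairs that are linked."""
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


def repair_with_common_neighbors(adj: dict[int, set[int]],
                                 min_common: int = 2) -> dict[int, set[int]]:
    """Hypothetical repair: link non-adjacent pairs sharing >= min_common neighbours."""
    repaired = {n: set(nbrs) for n, nbrs in adj.items()}
    for u, v in combinations(sorted(adj), 2):
        # Scores are computed on the original network, so the result is order-independent.
        if v not in adj[u] and len(adj[u] & adj[v]) >= min_common:
            repaired[u].add(v)
            repaired[v].add(u)
    return repaired
```

    On a four-node cycle every clustering coefficient is 0. After the common-neighbours repair, the graph becomes complete and every coefficient is 1, which shows how strongly the features depend on the repair step.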

  10. Towards an eco-phylogenetic framework for infectious disease ecology.

    PubMed

    Fountain-Jones, Nicholas M; Pearse, William D; Escobar, Luis E; Alba-Casals, Ana; Carver, Scott; Davies, T Jonathan; Kraberger, Simona; Papeş, Monica; Vandegrift, Kurt; Worsley-Tonks, Katherine; Craft, Meggan E

    2018-05-01

    Identifying patterns and drivers of infectious disease dynamics across multiple scales is a fundamental challenge for modern science. There is growing awareness that multi-host and/or multi-parasite interactions must be incorporated to better understand and predict current and future disease threats, and new tools are needed to help address this task. Eco-phylogenetics (phylogenetic community ecology) provides one avenue for exploring multi-host multi-parasite systems, yet the incorporation of eco-phylogenetic concepts and methods into studies of host-pathogen dynamics has lagged behind. Eco-phylogenetics is a transformative approach that uses evolutionary history to infer present-day dynamics. Here, we present an eco-phylogenetic framework to reveal insights into parasite communities and infectious disease dynamics across spatial and temporal scales. We illustrate how eco-phylogenetic methods can help untangle the mechanisms of host-parasite dynamics from individual (e.g. co-infection) to landscape scales (e.g. parasite/host community structure). An improved ecological understanding of multi-host and multi-pathogen dynamics across scales will increase our ability to predict disease threats. © 2017 Cambridge Philosophical Society.

  11. Toward a Novel Multilocus Phylogenetic Taxonomy for the Dermatophytes.

    PubMed

    de Hoog, G Sybren; Dukik, Karolina; Monod, Michel; Packeu, Ann; Stubbe, Dirk; Hendrickx, Marijke; Kupsch, Christiane; Stielow, J Benjamin; Freeke, Joanna; Göker, Markus; Rezaei-Matehkolaei, Ali; Mirhendi, Hossein; Gräser, Yvonne

    2017-02-01

    Type and reference strains of members of the onygenalean family Arthrodermataceae have been sequenced for rDNA ITS and partial LSU, the ribosomal 60S protein, and fragments of β-tubulin and translation elongation factor 3. The resulting phylogenetic trees showed a large degree of correspondence, and their topologies matched those of earlier published phylogenies, demonstrating that the phylogenetic representation of dermatophytes and dermatophyte-like fungi has reached an acceptable level of stability. All trees showed Trichophyton to be polyphyletic. In the present paper, Trichophyton is restricted mainly to the derived clade, resulting in the classification of nearly all anthropophilic dermatophytes in Trichophyton and Epidermophyton, along with some zoophilic species that regularly infect humans. Microsporum is restricted to some species around M. canis, while the geophilic species and the zoophilic species that are more remote from the human sphere are divided among Arthroderma, Lophophyton and Nannizzia. A new genus, Guarromyces, is proposed for Keratinomyces ceretanicus. Thirteen new combinations are proposed; an overview of all described species shows that the largest number of novelties was introduced during 1920-1940, when morphological characters were used in addition to clinical features. Species are neo- or epi-typified where necessary, which was the case for Arthroderma curreyi, Epidermophyton floccosum, Lophophyton gallinae, Trichophyton equinum, T. mentagrophytes, T. quinckeanum, T. schoenleinii, T. soudanense, and T. verrucosum. In the newly proposed taxonomy, Trichophyton contains 16 species, Epidermophyton one species, Nannizzia 9 species, Microsporum 3 species, Lophophyton 1 species, Arthroderma 21 species and Ctenomyces 1 species, but more detailed studies are still needed to establish species boundaries. Each species now has a single valid name. Two new genera are introduced: Guarromyces and Paraphyton. 
The number of genera has increased, but

  12. Classification of infectious bursal disease virus into genogroups.

    PubMed

    Michel, Linda O; Jackwood, Daral J

    2017-12-01

    Infectious bursal disease virus (IBDV) causes infectious bursal disease (IBD), an immunosuppressive disease of poultry. The current classification scheme for IBDV is confusing because it is based on antigenic types (variant and classical) as well as pathotypes. Many of the amino acid changes that differentiate these classifications are found in a hypervariable region of the capsid protein VP2 (hvVP2), the major host-protective antigen. Data from this study were used to propose a new classification scheme for IBDV based solely on genogroups identified by phylogenetic analysis of the hvVP2 of strains worldwide. Seven major genogroups were identified, some of which are geographically restricted, while others, such as genogroup 1, are dispersed globally. Genogroup 2 viruses are predominantly distributed in North America, while genogroup 3 viruses are most often identified on other continents. Additionally, we identified a population of genogroup 3 vvIBDV isolates that have an amino acid change from alanine to threonine at position 222 while retaining other residues conserved in this genogroup (I242, I256 and I294). A222T is an important mutation because amino acid 222 is located in the first of four surface loops of hvVP2. A similar shift from proline to threonine at position 222 is believed to play a role in the significant antigenic change of the genogroup 2 IBDV strains, suggesting that antigenic drift may be occurring in genogroup 3, possibly in response to antigenic pressure from vaccination.
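
    Substitutions such as A222T use the notation reference residue, 1-based position, new residue. Below is a minimal sketch of screening an hvVP2 sequence for the marker described above. It assumes that the reference and query are already aligned in VP2 numbering and that the function names are hypothetical. The sequences in the usage test are synthetic placeholders, not real IBDV data.

```python
def substitutions(reference: str, query: str) -> list[str]:
    """List differences between aligned protein sequences as e.g. 'A222T' (1-based)."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{i}{q}"
            for i, (r, q) in enumerate(zip(reference, query), start=1)
            if r != q]


def is_a222t_variant(reference: str, query: str,
                     conserved=(("I", 242), ("I", 256), ("I", 294))) -> bool:
    """Hypothetical screen: A222T present while the listed residues stay conserved."""
    changes = substitutions(reference, query)
    conserved_ok = all(query[pos - 1] == aa for aa, pos in conserved)
    return "A222T" in changes and conserved_ok
```

    Because positions are counted from 1 in the notation but indexed from 0 in Python, the helper converts with `pos - 1`. Off-by-one errors are the most common bug in this kind of screen.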

  13. Diverse Region-Based CNN for Hyperspectral Image Classification.

    PubMed

    Zhang, Mengmeng; Li, Wei; Du, Qian

    2018-06-01

    Convolutional neural networks (CNNs) are of great interest in machine learning and have demonstrated excellent performance in hyperspectral image classification. In this paper, we propose a classification framework, called diverse region-based CNN, which can encode a semantic, context-aware representation to obtain promising features. By merging a diverse set of discriminative appearance factors, the resulting CNN-based representation exhibits the spatial-spectral context sensitivity that is essential for accurate pixel classification. Because the proposed method exploits diverse region-based inputs to learn contextual interaction features, it is expected to have more discriminative power. The joint representation, containing rich spectral and spatial information, is then fed to a fully connected network, and the label of each pixel vector is predicted by a softmax layer. Experimental results on widely used hyperspectral image data sets demonstrate that the proposed method can surpass other conventional deep learning-based classifiers and other state-of-the-art classifiers.
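
    A region-based classifier needs spatial windows around each pixel rather than the pixel spectrum alone. It also needs a softmax to turn final scores into label probabilities. Below is a minimal NumPy sketch of both pieces. It assumes a (height, width, bands) cube and a hypothetical set of square window sizes. The paper's actual region definitions and network layers are not reproduced.

```python
import numpy as np


def extract_patch(cube: np.ndarray, row: int, col: int, size: int) -> np.ndarray:
    """Return a (size, size, bands) window centred on (row, col), reflect-padded at edges."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the pixel sits at the centre")
    half = size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half).
    return padded[row:row + size, col:col + size, :]


def multi_region_inputs(cube: np.ndarray, row: int, col: int, sizes=(3, 5, 7)):
    """Hypothetical 'diverse regions': several window sizes around the same pixel."""
    return [extract_patch(cube, row, col, s) for s in sizes]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```

    Reflect padding lets border pixels get full-size windows without inventing zero-valued spectra, although other padding choices are equally defensible.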

  14. Hyperspectral Image Classification via Multitask Joint Sparse Representation and Stepwise MRF Optimization.

    PubMed

    Yuan, Yuan; Lin, Jianzhe; Wang, Qi

    2016-12-01

    Hyperspectral image (HSI) classification is a crucial issue in remote sensing. Accurate classification benefits a large number of applications, such as land-use analysis and marine resource utilization. However, high data correlation makes reliable classification difficult, especially for HSIs with abundant spectral information. Furthermore, traditional methods often fail to adequately consider the spatial coherency of HSIs, which also limits classification performance. To address these inherent obstacles, a novel spectral-spatial classification scheme is proposed in this paper. The proposed method focuses on multitask joint sparse representation (MJSR) and a stepwise Markov random field (MRF) framework, which are its two main contributions. First, MJSR not only reduces spectral redundancy but also retains the necessary correlation in the spectral domain during classification. Second, the stepwise optimization further exploits spatial correlation, which significantly enhances classification accuracy and robustness. On several standard quality evaluation indices, experimental results on the Indian Pines and Pavia University data sets demonstrate the superiority of our method over state-of-the-art competitors.
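
    The spatial step can be pictured as MRF smoothing of a per-pixel label map. Below is a minimal sketch using iterated conditional modes (ICM) with a Potts prior. This is a standard, generic MRF optimizer swapped in for illustration, not the paper's stepwise scheme. It assumes class probabilities already come from some pixel-wise classifier, with MJSR omitted. `beta` is a hypothetical smoothing weight.

```python
import numpy as np


def icm_potts(prob: np.ndarray, beta: float = 1.0, n_iter: int = 5) -> np.ndarray:
    """Smooth a (H, W, K) class-probability map into a (H, W) label map with ICM."""
    unary = -np.log(prob + 1e-12)       # data cost of each label at each pixel
    labels = unary.argmin(axis=-1)      # start from the pixel-wise decision
    height, width, k = prob.shape
    for _ in range(n_iter):
        changed = False
        for i in range(height):
            for j in range(width):
                cost = unary[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < height and 0 <= nj < width:
                        # Potts penalty: beta for each 4-neighbour with a different label.
                        cost += beta * (np.arange(k) != labels[ni, nj])
                best = int(cost.argmin())
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels
```

    With `beta = 0` the result is the plain pixel-wise classification. Larger values let an isolated, weakly supported pixel be relabelled to agree with its neighbours.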

  15. Accurate positioning based on acoustic and optical sensors

    NASA Astrophysics Data System (ADS)

    Cai, Kerong; Deng, Jiahao; Guo, Hualing

    2009-11-01

    An unattended laser target designator (ULTD) was designed to partially replace conventional LTDs for accurate positioning and laser marking. After analyzing the precision, accuracy and errors of the acoustic sensor array, the requirements for the laser generator, and the image analysis and tracking technology, the major system modules were determined. The sensors measure the target's class, velocity and position, and a coded laser beam is then emitted intelligently to mark the target at the optimal position and time. The results show that the ULTD not only avoids security threats, can be deployed in large numbers, and can accomplish battle damage assessment (BDA), but is also suited to information-based warfare.

  16. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems.

    PubMed

    Oh, Sang-Il; Kang, Hang-Bong

    2017-01-22

    To understand driving environments effectively, sensor-based intelligent vehicle systems must accurately detect and classify objects; both are critical tasks. Object detection localizes objects, whereas object classification recognizes object classes from detected object regions. For accurate object detection and classification, fusing information from multiple sensors into the representation and perception processes is necessary. In this paper, we propose a new object-detection and classification method using decision-level fusion. We fuse the classification outputs of independent unary classifiers for 3D point cloud and image data, each based on a convolutional neural network (CNN). The unary classifier for each of the two sensors is a five-layer CNN that uses more than two pre-trained convolutional layers to capture local-to-global features as the data representation. To represent data using convolutional layers, we apply region of interest (ROI) pooling to the outputs of each layer on the object candidate regions generated by object proposal generation, realizing color flattening and semantic grouping for the charge-coupled device (CCD) and Light Detection And Ranging (LiDAR) sensors. We evaluate the proposed method on the KITTI benchmark dataset by detecting and classifying three object classes: cars, pedestrians and cyclists. The evaluation results show that the proposed method achieves better performance than previous methods. Our method extracted approximately 500 proposals on a 1226 × 370 image, whereas the original selective search method extracted approximately 10^6 × n proposals. We obtained a classification performance of 77.72% mean average precision over all classes at the moderate detection level of the KITTI benchmark dataset.

  17. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems

    PubMed Central

    Oh, Sang-Il; Kang, Hang-Bong

    2017-01-01

    To understand driving environments effectively, sensor-based intelligent vehicle systems must accurately detect and classify objects; both are critical tasks. Object detection localizes objects, whereas object classification recognizes object classes from detected object regions. For accurate object detection and classification, fusing information from multiple sensors into the representation and perception processes is necessary. In this paper, we propose a new object-detection and classification method using decision-level fusion. We fuse the classification outputs of independent unary classifiers for 3D point cloud and image data, each based on a convolutional neural network (CNN). The unary classifier for each of the two sensors is a five-layer CNN that uses more than two pre-trained convolutional layers to capture local-to-global features as the data representation. To represent data using convolutional layers, we apply region of interest (ROI) pooling to the outputs of each layer on the object candidate regions generated by object proposal generation, realizing color flattening and semantic grouping for the charge-coupled device (CCD) and Light Detection And Ranging (LiDAR) sensors. We evaluate the proposed method on the KITTI benchmark dataset by detecting and classifying three object classes: cars, pedestrians and cyclists. The evaluation results show that the proposed method achieves better performance than previous methods. Our method extracted approximately 500 proposals on a 1226 × 370 image, whereas the original selective search method extracted approximately 10^6 × n proposals. We obtained a classification performance of 77.72% mean average precision over all classes at the moderate detection level of the KITTI benchmark dataset. PMID:28117742
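
    Decision-level fusion combines the class outputs of independently trained classifiers rather than their features. The abstract does not specify the fusion rule. As an assumption, the sketch below uses a weighted average of the two per-class probability vectors (LiDAR and image), with a hypothetical weight `weight_lidar`, followed by an argmax over the three KITTI classes named above.

```python
import numpy as np

CLASSES = ("car", "pedestrian", "cyclist")


def fuse_decisions(prob_lidar, prob_image, weight_lidar: float = 0.5) -> np.ndarray:
    """Weighted average of per-class probabilities from two unary classifiers."""
    p_l = np.asarray(prob_lidar, dtype=float)
    p_i = np.asarray(prob_image, dtype=float)
    if p_l.shape != p_i.shape:
        raise ValueError("both classifiers must score the same candidates and classes")
    fused = weight_lidar * p_l + (1.0 - weight_lidar) * p_i
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalise each row


def predict(prob_lidar, prob_image, weight_lidar: float = 0.5) -> list[str]:
    """Label each candidate region with the class of highest fused probability."""
    fused = fuse_decisions(prob_lidar, prob_image, weight_lidar)
    return [CLASSES[k] for k in fused.argmax(axis=-1)]
```

    The weight controls which sensor wins when they disagree. Here, a candidate scored [0.6, 0.3, 0.1] by LiDAR and [0.2, 0.7, 0.1] by the camera becomes "pedestrian" at equal weights but "car" at `weight_lidar=0.9`.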

  18. ["Long-branch Attraction" artifact in phylogenetic reconstruction].

    PubMed

    Li, Yi-Wei; Yu, Li; Zhang, Ya-Ping

    2007-06-01

    Phylogenetic reconstruction among various organisms not only helps us understand their evolutionary history but also helps address several fundamental evolutionary questions. Understanding the evolutionary relationships among organisms lays the foundation for investigations in other biological disciplines. However, almost all widely used phylogenetic methods have limitations that prevent them from eliminating systematic errors effectively, which hinders reconstruction of the true relationships among organisms. The "long-branch attraction" (LBA) artifact is one of the most disruptive factors in phylogenetic reconstruction. In this review, the concept of LBA, methods for analyzing it, and strategies for avoiding it are summarized, and several typical examples are provided. Approaches to avoiding and resolving the LBA artifact are also discussed.
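
    The LBA artifact can be reproduced with a classic four-taxon simulation. The true tree is ((A,B),(C,D)), but A and C, which are not sisters, get long terminal branches. Below is a minimal sketch that simulates sites under the Jukes-Cantor model and counts the parsimony-informative site patterns supporting each of the three possible unrooted quartets. The branch lengths and seed are arbitrary choices for illustration, not values from the review.

```python
import math
import random

BASES = "ACGT"


def evolve(state: str, t: float, rng: random.Random) -> str:
    """Evolve one nucleotide along a branch of length t under Jukes-Cantor (JC69)."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return state
    return rng.choice([b for b in BASES if b != state])  # uniform among other bases


def simulate_quartet(n_sites: int, long_branch: float, short_branch: float, seed: int = 0):
    """Simulate sites on ((A,B),(C,D)) with long terminal branches leading to A and C."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        x = rng.choice(BASES)                # node joining A and B
        y = evolve(x, short_branch, rng)     # node joining C and D (internal branch)
        sites.append((evolve(x, long_branch, rng), evolve(x, short_branch, rng),
                      evolve(y, long_branch, rng), evolve(y, short_branch, rng)))
    return sites


def parsimony_support(sites):
    """Count informative xxyy-type patterns; the quartet with most support is the MP tree."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support
```

    With long_branch=1.0 and short_branch=0.05, chance convergence between the two long branches produces far more AC|BD patterns than the true AB|CD split generates. Adding sites therefore only strengthens the wrong answer, which is the inconsistency that defines LBA.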

  19. Understanding phylogenetic incongruence: lessons from phyllostomid bats

    PubMed Central

    Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B

    2012-01-01

    All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive
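
    One common way to quantify conflict between, for example, a morphological tree and a molecular tree is the Robinson-Foulds (RF) distance. RF counts the clades found in one tree but not the other. Below is a minimal sketch for rooted trees written as nested tuples of leaf names, a hypothetical input format. It illustrates the idea of measuring incongruence and is not the methodology of the study above.

```python
def _walk(node, found: set) -> frozenset:
    """Collect the leaf set of every internal node; return this node's leaf set."""
    if isinstance(node, str):
        return frozenset([node])
    leaf_set = frozenset().union(*(_walk(child, found) for child in node))
    found.add(leaf_set)
    return leaf_set


def clusters(tree):
    """Return (all leaves, clusters of non-root internal nodes) for a rooted tree."""
    found: set = set()
    all_leaves = _walk(tree, found)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return all_leaves, found


def robinson_foulds(tree1, tree2) -> int:
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clusters."""
    leaves1, c1 = clusters(tree1)
    leaves2, c2 = clusters(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return len(c1 ^ c2)
```

    Trees that differ only in the order of children score 0, while ((A,B),(C,D)) and ((A,C),(B,D)) share no clades and score 4.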

  20. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With advances in mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the growing number of sequenced microbes makes correct identification more challenging because of the large number of candidates. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. Using MS/MS data from 81 samples, each composed of a single known microorganism, we have demonstrated that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
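
    An E-value is the expected number of chance hits at least as good as the observed one. This is why it grows with the number of candidates searched, as described above. Below is a minimal sketch of the generic relation E = p × N and the Poisson-based probability of at least one chance hit. It is an assumption-level illustration only. MiCId's own peptide E-values and unified microorganism E-values are computed by its own statistical model, which is not reproduced here.

```python
import math


def evalue(p_value: float, n_candidates: int) -> float:
    """Expected number of chance hits this good among n_candidates independent tests."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return p_value * n_candidates


def prob_at_least_one(e: float) -> float:
    """Poisson approximation: probability of one or more chance hits, 1 - exp(-E)."""
    return 1.0 - math.exp(-e)
```

    For instance, a p-value of 1e-6 against 5,000 candidates gives E = 0.005. The same p-value against a database a thousand times larger gives E = 5, which is no longer convincing.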

  1. Phylogenetic relationships of some species of the family Echinostomatidae Odner, 1910 (Trematoda), inferred from nuclear rDNA sequences and karyological analysis

    PubMed Central

    Stanevičiūtė, Gražina; Stunžėnas, Virmantas; Petkevičiūtė, Romualda

    2015-01-01

    The family Echinostomatidae Looss, 1899 exhibits substantial taxonomic diversity, and the morphological criteria adopted by different authors have resulted in its subdivision into an impressive number of subfamilies. The status of the subfamily Echinochasminae Odhner, 1910 has changed across classifications. Genetic characteristics and a phylogenetic analysis of four Echinostomatidae species (Echinochasmus sp., Echinochasmus coaxatus Dietz, 1909, Stephanoprora pseudoechinata (Olsson, 1876) and Echinoparyphium mordwilkoi Skrjabin, 1915) were obtained to better understand the homogeneity of the Echinochasminae and the phylogenetic relationships within the Echinostomatidae. The chromosome set and nuclear rDNA (ITS2 and 28S) sequences of parthenites of Echinochasmus sp. were studied. The karyotype of this species (2n=20: one pair of large bi-armed chromosomes, with the others smaller and mainly one-armed) differed from those previously described for two other representatives of the Echinochasminae, Echinochasmus beleocephalus (von Linstow, 1893), 2n=14, and Episthmium bursicola (Creplin, 1937), 2n=18. In phylogenetic trees based on the ITS2 and 28S datasets, a well-supported subclade containing Echinochasmus sp. and Stephanoprora pseudoechinata clustered in one well-supported clade together with Echinochasmus japonicus Tanabe, 1926 (28S data only) and Echinochasmus coaxatus. These results support a close phylogenetic relationship between Echinochasmus Dietz, 1909 and Stephanoprora Odhner, 1902. Phylogenetic analysis revealed a clear separation between related species of Echinostomatoidea restricted to prosobranch snails as first intermediate hosts and other species of Echinostomatidae and Psilostomidae that develop in Lymnaeoidea snails as first intermediate hosts. Based on the rDNA phylogeny, it was suggested that the evolution of these parasitic flukes is linked with their first intermediate hosts. Digeneans parasitizing prosobranch snails showed higher

  2. Phylogenetic relationships of some species of the family Echinostomatidae Odner, 1910 (Trematoda), inferred from nuclear rDNA sequences and karyological analysis.

    PubMed

    Stanevičiūtė, Gražina; Stunžėnas, Virmantas; Petkevičiūtė, Romualda

    2015-01-01

    The family Echinostomatidae Looss, 1899 exhibits substantial taxonomic diversity, and the morphological criteria adopted by different authors have resulted in its subdivision into an impressive number of subfamilies. The status of the subfamily Echinochasminae Odhner, 1910 has changed across classifications. Genetic characteristics and a phylogenetic analysis of four Echinostomatidae species (Echinochasmus sp., Echinochasmus coaxatus Dietz, 1909, Stephanoprora pseudoechinata (Olsson, 1876) and Echinoparyphium mordwilkoi Skrjabin, 1915) were obtained to better understand the homogeneity of the Echinochasminae and the phylogenetic relationships within the Echinostomatidae. The chromosome set and nuclear rDNA (ITS2 and 28S) sequences of parthenites of Echinochasmus sp. were studied. The karyotype of this species (2n=20: one pair of large bi-armed chromosomes, with the others smaller and mainly one-armed) differed from those previously described for two other representatives of the Echinochasminae, Echinochasmus beleocephalus (von Linstow, 1893), 2n=14, and Episthmium bursicola (Creplin, 1937), 2n=18. In phylogenetic trees based on the ITS2 and 28S datasets, a well-supported subclade containing Echinochasmus sp. and Stephanoprora pseudoechinata clustered in one well-supported clade together with Echinochasmus japonicus Tanabe, 1926 (28S data only) and Echinochasmus coaxatus. These results support a close phylogenetic relationship between Echinochasmus Dietz, 1909 and Stephanoprora Odhner, 1902. Phylogenetic analysis revealed a clear separation between related species of Echinostomatoidea restricted to prosobranch snails as first intermediate hosts and other species of Echinostomatidae and Psilostomidae that develop in Lymnaeoidea snails as first intermediate hosts. Based on the rDNA phylogeny, it was suggested that the evolution of these parasitic flukes is linked with their first intermediate hosts. Digeneans parasitizing prosobranch snails showed higher dynamic of karyotype

  3. Two Influential Primate Classifications Logically Aligned

    PubMed Central

    Franz, Nico M.; Pier, Naomi M.; Reeder, Deeann M.; Chen, Mingmin; Yu, Shizhuo; Kianmajd, Parisa; Bowers, Shawn; Ludäscher, Bertram

    2016-01-01

    Classifications and phylogenies of perceived natural entities change in the light of new evidence. Taxonomic changes, translated into Code-compliant names, frequently lead to name:meaning dissociations across succeeding treatments. Classification standards such as the Mammal Species of the World (MSW) may experience significant levels of taxonomic change from one edition to the next, with potential costs to long-term, large-scale information integration. This circumstance challenges the biodiversity and phylogenetic data communities to express taxonomic congruence and incongruence in ways that both humans and machines can process, that is, to logically represent taxonomic alignments across multiple classifications. We demonstrate that such alignments are feasible for two classifications of primates corresponding to the second and third MSW editions. Our approach has three main components: (i) use of taxonomic concept labels, that is, name sec. author (where sec. means according to), to assemble each concept hierarchy separately via parent/child relationships; (ii) articulation of select concepts across the two hierarchies with user-provided Region Connection Calculus (RCC-5) relationships; and (iii) use of an Answer Set Programming toolkit to infer and visualize logically consistent alignments of these input constraints. Our use case entails the Primates sec. Groves (1993; MSW2: 317 taxonomic concepts, 233 at the species level) and Primates sec. Groves (2005; MSW3: 483 taxonomic concepts, 376 at the species level). Using 402 RCC-5 input articulations, the reasoning process yields a single, consistent alignment and 153,111 Maximally Informative Relations that constitute a comprehensive meaning resolution map for every concept pair in the Primates sec. MSW2/MSW3. The complete alignment, and various partitions thereof, facilitate quantitative analyses of name:meaning dissociation, revealing that nearly one in three taxonomic names are not reliable across
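
    RCC-5 distinguishes five relations between two taxonomic concepts: congruent, includes, is included in, overlaps and disjoint. Below is a minimal sketch that assigns the relation when each concept's extension is known as a set of member taxa. This extensional shortcut is an assumption for illustration. The approach described above instead reasons over user-provided articulations with an Answer Set Programming toolkit. The relation symbols and the placeholder member names are also hypothetical.

```python
def rcc5(a: set, b: set) -> str:
    """RCC-5 relation between two concepts, judged by their member sets."""
    if a == b:
        return "=="   # congruent
    if a > b:
        return ">"    # a properly includes b
    if a < b:
        return "<"    # a is properly included in b
    if a & b:
        return "><"   # overlap: shared members, but each has members the other lacks
    return "!"        # disjoint
```

    For example, a concept with members {sp1, sp2, sp3} in one edition properly includes a same-named concept with members {sp1, sp2} in the next. This is the kind of name:meaning dissociation the alignment is meant to expose.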

  4. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

    PubMed Central

    Kolaczkowski, Bryan; Thornton, Joseph W.

    2009-01-01

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis. PMID:20011052

  5. Long-branch attraction bias and inconsistency in Bayesian phylogenetics.

    PubMed

    Kolaczkowski, Bryan; Thornton, Joseph W

    2009-12-09

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias, which is apparent under both controlled simulation conditions and in analyses of empirical sequence data, also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages: it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.

  6. Prioritizing Populations for Conservation Using Phylogenetic Networks

    PubMed Central

    Volkmann, Logan; Martyn, Iain; Moulton, Vincent; Spillner, Andreas; Mooers, Arne O.

    2014-01-01

    In the face of inevitable future losses to biodiversity, ranking species by conservation priority seems more than prudent. Setting conservation priorities within species (i.e., at the population level) may be critical as species ranges become fragmented and connectivity declines. However, existing approaches to prioritization (e.g., scoring organisms by their expected genetic contribution) are based on phylogenetic trees, which may be poor representations of differentiation below the species level. In this paper we extend evolutionary isolation indices used in conservation planning from phylogenetic trees to phylogenetic networks. Such networks better represent population differentiation, and our extension allows populations to be ranked in order of their expected contribution to the set. We illustrate the approach using data from two imperiled species: the spotted owl Strix occidentalis in North America and the mountain pygmy-possum Burramys parvus in Australia. Using previously published mitochondrial and microsatellite data, we construct phylogenetic networks and score each population by its relative genetic distinctiveness. In both cases, our phylogenetic networks capture the geographic structure of each species: geographically peripheral populations harbor less-redundant genetic information, increasing their conservation rankings. We note that our approach can be used with all conservation-relevant distances (e.g., those based on whole-genome, ecological, or adaptive variation) and suggest it be added to the assortment of tools available to wildlife managers for allocating effort among threatened populations. PMID:24586451
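    The abstract extends tree-based evolutionary isolation indices to networks. The tree version of one widely used index, "fair proportion" (FP), is easy to state. Each edge's length is shared equally among the leaves below it, and a leaf's score is the sum of its shares along the path to the root. The minimal sketch below covers the tree case only, not the authors' network extension. The parent/length dictionary encoding is a hypothetical input format chosen for illustration.

```python
from collections import defaultdict


def fair_proportion(parent, length):
    """Fair-proportion isolation score for every leaf of a rooted tree.

    parent: dict mapping each non-root node to its parent node.
    length: dict mapping each non-root node to the length of the edge above it.
    (Hypothetical encoding used only for this sketch.)
    """
    children = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)
    leaves = [n for n in parent if n not in children]

    below = {}  # memoized number of leaves below each node

    def n_leaves(node):
        if node not in below:
            kids = children.get(node, [])
            below[node] = 1 if not kids else sum(n_leaves(k) for k in kids)
        return below[node]

    scores = {}
    for leaf in leaves:
        score, node = 0.0, leaf
        while node in parent:  # walk up, adding this leaf's share of each edge
            score += length[node] / n_leaves(node)
            node = parent[node]
        scores[leaf] = score
    return scores


# Tree ((A:1,B:1)X:2,C:3)R: A and B split the edge above X, so C is most isolated.
print(fair_proportion({"A": "X", "B": "X", "X": "R", "C": "R"},
                      {"A": 1, "B": 1, "X": 2, "C": 3}))
```

    The FP scores always sum to the total tree length, which is one reason the index is convenient for ranking: it partitions the tree's total evolutionary history among the taxa.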

  7. Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling.

    PubMed

    Gerard, David; Gibbs, H Lisle; Kubatko, Laura

    2011-10-06

    A well-known characteristic of multi-locus data is that each locus has its own phylogenetic history, which may differ substantially from the overall phylogenetic history of the species. Although the possibility that this arises through incomplete lineage sorting is often incorporated in models for the species-level phylogeny, it is much less common for hybridization to also be formally included in such models. We have modified the evolutionary model of Meng and Kubatko (2009) to incorporate intraspecific sampling of multiple individuals. This allows estimation of speciation times and times of hybridization events, and thereby testing for hybridization in the presence of incomplete lineage sorting. We have also used a more efficient algorithm for obtaining our estimates. Using simulations, we demonstrate that our approach performs well under conditions motivated by an empirical data set for Sistrurus rattlesnakes where putative hybridization has occurred. We further demonstrate that the method is able to accurately detect the signature of hybridization in the data, whereas this signal may be obscured when other species-tree inference methods that ignore hybridization are used. Our approach is shown to be powerful in detecting hybridization when it is present. When applied to the Sistrurus data, we find no evidence of hybridization; instead, the putative hybrid snakes in Missouri appear most likely to be pure S. catenatus tergeminus in origin, which has significant conservation implications.

  8. Phylogenetic structure of soil bacterial communities predicts ecosystem functioning.

    PubMed

    Pérez-Valera, Eduardo; Goberna, Marta; Verdú, Miguel

    2015-05-01

    Quantifying diversity with phylogeny-informed metrics helps us understand the effects of diversity on ecosystem functioning (EF). The sign of these effects remains controversial because phylogenetic diversity and taxonomic identity may interactively influence EF. Positive relationships, traditionally attributed to complementarity effects, seem unimportant in natural soil bacterial communities. Negative relationships could be attributed to fitness differences leading to the overrepresentation of a few productive clades, a mechanism recently invoked to explain the assembly of soil bacterial communities. We tested, in two ecosystems contrasting in environmental heterogeneity, whether two metrics of phylogenetic community structure correctly predict microbially mediated EF: a simpler measure of phylogenetic diversity (NRI) and a more complex metric incorporating taxonomic identity (PCPS). We show that the relationship between phylogenetic diversity and EF depends on the taxonomic identity of the main coexisting lineages. Phylogenetic diversity was negatively related to EF in soils where a marked fertility gradient exists and a single productive clade (Proteobacteria) outcompetes other clades in the most fertile plots. However, phylogenetic diversity was unrelated to EF in soils where the fertility gradient is less marked and Proteobacteria coexist with other abundant lineages. Including the taxonomic identity of bacterial lineages in metrics of phylogenetic community structure allows the prediction of EF in both ecosystems. © FEMS 2015. All rights reserved.
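    NRI (net relatedness index) compares a community's observed mean pairwise phylogenetic distance (MPD) with the MPD of random communities of the same richness drawn from a species pool. In the usual sign convention, positive values indicate phylogenetic clustering and negative values indicate overdispersion. The minimal sketch below uses a simple random-draw null model. That null model and the frozenset-keyed distance dictionary are assumptions for illustration; they are not necessarily the authors' choices.

```python
import itertools
import random
import statistics


def mpd(species, dist):
    """Mean pairwise phylogenetic distance among the given species.

    dist maps frozenset({a, b}) to the patristic distance between a and b.
    """
    pairs = list(itertools.combinations(species, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def nri(community, pool, dist, n_null=999, seed=0):
    """Net relatedness index: -(MPD_obs - mean(MPD_null)) / sd(MPD_null).

    The null communities are random draws of the same size from `pool`
    (an assumed, simple null model).
    """
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, len(community)), dist) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


pool = ["A", "B", "C", "D"]
dist = {frozenset("AB"): 1.0, frozenset("CD"): 1.0, frozenset("AC"): 10.0,
        frozenset("AD"): 10.0, frozenset("BC"): 10.0, frozenset("BD"): 10.0}
print(nri(["A", "B"], pool, dist))  # close relatives -> positive (clustered)
```

    The species pool passed as `pool` directly shapes the null distribution. Changing the pool can therefore change the sign of NRI, so the choice of pool deserves explicit justification.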

  9. Two fast and accurate heuristic RBF learning rules for data classification.

    PubMed

    Rouhani, Modjtaba; Javan, Dawood S

    2016-03-01

    This paper presents new Radial Basis Function (RBF) learning methods for classification problems. The proposed methods use heuristics to determine the spreads, the centers and the number of hidden neurons of the network so that higher efficiency is achieved with fewer neurons, while the learning algorithm remains fast and simple. To keep the network size limited, neurons are added to the network iteratively until a termination condition is met. Each neuron covers some of the training data. The termination condition is to cover all training data or to reach the maximum number of neurons. In each step, the center and spread of the new neuron are selected to maximize its coverage. Maximizing the coverage of the neurons leads to a network with fewer neurons, and hence a lower VC dimension and better generalization. Using the power exponential distribution function as the activation function of the hidden neurons, and in light of the new learning approaches, it is proved that all data become linearly separable in the space of hidden-layer outputs. This implies that there exist linear output-layer weights with zero training error. The proposed methods are applied to some well-known datasets, and the simulation results, compared with SVM and some other leading RBF learning methods, show satisfactory and comparable performance. Copyright © 2015 Elsevier Ltd. All rights reserved.
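    The abstract describes greedy, coverage-driven addition of neurons with a power exponential activation exp(-(||x - c|| / s)^p). The sketch below is a hypothetical heuristic in that spirit, not the authors' rules. Each candidate center is a training point, its radius is half the distance to the nearest point of another class, and the candidate covering the most uncovered same-class points is added. For simplicity it predicts with the most active neuron's label instead of fitting the linear output layer the paper uses. The exponent p=4 is also an assumption.

```python
import math


def power_exp(x, center, spread, p=4.0):
    """Power exponential activation exp(-(||x - c|| / s)^p)."""
    return math.exp(-(math.dist(x, center) / spread) ** p)


def greedy_centers(X, y, max_neurons=50):
    """Hypothetical coverage heuristic: add neurons until all points are covered."""
    uncovered = set(range(len(X)))
    neurons = []
    while uncovered and len(neurons) < max_neurons:
        best = None
        for i in uncovered:
            # Radius: half the distance to the nearest point of another class.
            enemies = [math.dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]]
            radius = 0.5 * min(enemies) if enemies else float("inf")
            covered = {j for j in uncovered
                       if y[j] == y[i] and math.dist(X[i], X[j]) <= radius}
            if best is None or len(covered) > len(best[2]):
                best = (i, radius, covered)
        i, radius, covered = best  # `covered` always contains i, so we progress
        neurons.append((X[i], max(radius, 1e-12), y[i]))
        uncovered -= covered
    return neurons


def predict(neurons, x, p=4.0):
    """Simplification: label of the most strongly activated hidden neuron."""
    return max((power_exp(x, c, s, p), label) for c, s, label in neurons)[1]


X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = [0, 0, 1, 1]
net = greedy_centers(X, y)
print(len(net), predict(net, (0.2, 0.3)), predict(net, (5, 5.5)))
```

    A larger p makes each activation closer to an indicator of "inside the radius". That is the intuition behind the separability argument in the abstract, though this sketch does not reproduce the paper's proof.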

  10. The neuron classification problem

    PubMed Central

    Bota, Mihail; Swanson, Larry W.

    2007-01-01

    A systematic account of neuron cell types is a basic prerequisite for determining the vertebrate nervous system global wiring diagram. With comprehensive lineage and phylogenetic information unavailable, a general ontology based on structure-function taxonomy is proposed and implemented in a knowledge management system, and a prototype analysis of select regions (including retina, cerebellum, and hypothalamus) presented. The supporting Brain Architecture Knowledge Management System (BAMS) Neuron ontology is online and its user interface allows queries about terms and their definitions, classification criteria based on the original literature and “Petilla Convention” guidelines, hierarchies, and relations—with annotations documenting each ontology entry. Combined with three BAMS modules for neural regions, connections between regions and neuron types, and molecules, the Neuron ontology provides a general framework for physical descriptions and computational modeling of neural systems. The knowledge management system interacts with other web resources, is accessible in both XML and RDF/OWL, is extendible to the whole body, and awaits large-scale data population requiring community participation for timely implementation. PMID:17582506

  11. A program to compute the soft Robinson-Foulds distance between phylogenetic networks.

    PubMed

    Lu, Bingxin; Zhang, Louxin; Leong, Hon Wai

    2017-03-14

    Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised. The tree containment problem asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem asks whether a cluster is represented at a node in a phylogenetic network. Both problems are NP-complete. A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. The two programs are developed to facilitate reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, our simulation data suggest that the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is unlikely to be normal.
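    The network problems above are hard, but the tree case shows what "cluster" and "Robinson-Foulds distance" mean. In a rooted tree, the cluster at a node is the set of leaves below it. Cluster containment is then a set lookup, and the (cluster-based) Robinson-Foulds distance counts clusters present in one tree but not the other. The minimal sketch below handles trees only. It does not implement the authors' network algorithm or the soft variant, and the nested-tuple tree encoding is an assumption.

```python
def clusters(tree):
    """All clusters (leaf sets below each node) of a rooted tree given as
    nested tuples, e.g. (("A", "B"), "C"). Hypothetical encoding."""
    out = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        out.add(leaves)
        return leaves

    walk(tree)
    return out


def contains_cluster(tree, cluster):
    """Tree version of cluster containment: a simple membership test."""
    return frozenset(cluster) in clusters(tree)


def robinson_foulds(t1, t2):
    """Size of the symmetric difference of the two cluster sets.
    (Some conventions divide this by 2.)"""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(robinson_foulds(t1, t2))  # {A,B} and {A,C} differ
```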

  12. Classification of Children Intelligence with Fuzzy Logic Method

    NASA Astrophysics Data System (ADS)

    Syahminan; ika Hidayati, Permata

    2018-04-01

    Knowing a child's type of intelligence early on is important for parents. Typing can be done by grouping the dominant characteristics of each type of intelligence. To make it easier for parents to determine their child's type of intelligence and how to respond to it, we created a classification system that groups children's intelligence using the fuzzy logic method to determine the degree of each intelligence type. From the analysis, we conclude that the fuzzy logic classification system makes determining a child's type of intelligence easier and gives more accurate results than manual tests.
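    A fuzzy classifier of this kind maps each crisp score to degrees of membership in linguistic sets such as "low", "medium" and "high". It then reports the type with the highest degree. The minimal sketch below uses triangular membership functions on a 0 to 10 score scale. The scale, the set boundaries and the intelligence-type names are all hypothetical; the abstract does not specify them.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on an assumed 0-10 score scale.
LEVELS = {"low": (-5, 0, 5), "medium": (0, 5, 10), "high": (5, 10, 15)}


def fuzzify(score):
    """Degree of membership of a score in each fuzzy level."""
    return {name: triangular(score, *abc) for name, abc in LEVELS.items()}


def dominant_type(scores):
    """Return the type with the largest 'high' membership, plus all degrees."""
    degrees = {t: fuzzify(s)["high"] for t, s in scores.items()}
    return max(degrees, key=degrees.get), degrees


print(dominant_type({"linguistic": 8, "logical": 6, "musical": 3}))
```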

  13. The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses

    PubMed Central

    Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying

    2016-01-01

    Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly. Together with the previously reported cp genome of E. koreanum, this is also the first comprehensive cp genome analysis of Epimedium. The five Epimedium cp genomes exhibited a typical quadripartite and circular structure that was largely conserved in genomic structure and gene order synteny. However, these cp genomes showed clear variation at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. A trnQ-UUG duplication occurred in all five Epimedium cp genomes, which was not found in the other basal eudicotyledons. Rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSRs) and other repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were broadly consistent with the updated classification system of the genus. However, they indicated that the evolutionary relationships and the divisions of the genus need further investigation using more evidence. These cp genomes provide valuable genetic information for accurate species identification, taxonomy, phylogenetic resolution and evolutionary studies of Epimedium, and will assist in the exploration and utilization of Epimedium plants. PMID:27014326
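    Simple sequence repeats (SSRs) are short motifs of 1 to 6 bp repeated in tandem. A common way to find them is to scan for each motif length with a minimum repeat count and discard motifs that are themselves repeats of a shorter unit (for example "ATAT"). The minimal sketch below does this with regular expressions. The threshold table is an assumed, MISA-style setting, not the one used in the study. Because matches are non-overlapping per motif length, some edge cases can be missed.

```python
import re

# Assumed minimum repeat counts per motif length (MISA-style, illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 6 + "C" + "A" * 10 + "G"))
```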

  14. YBYRÁ facilitates comparison of large phylogenetic trees.

    PubMed

    Machado, Denis Jacob

    2015-07-01

    The number and size of tree topologies being compared by phylogenetic systematists is increasing due to technological advances in high-throughput DNA sequencing. However, we still lack tools to facilitate comparison among phylogenetic trees with a large number of terminals. The "YBYRÁ" project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots and (3) clade diagnoses based on different categories of synapomorphies. YBYRÁ also provides (4) an original framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist). YBYRÁ facilitates comparison of large phylogenetic trees and outperforms competing software in terms of usability and time efficiency, especially for large data sets. The programs that make up this toolkit are written in Python, hence they do not require installation and have minimal dependencies. The entire project is available under an open-source licence at http://www.ib.usp.br/grant/anfibios/researchSoftware.html .

  15. Disentangling the phylogenetic and ecological components of spider phenotypic variation.

    PubMed

    Gonçalves-Souza, Thiago; Diniz-Filho, José Alexandre Felizola; Romero, Gustavo Quevedo

    2014-01-01

    An understanding of how the degree of phylogenetic relatedness influences the ecological similarity among species is crucial to inferring the mechanisms governing the assembly of communities. We evaluated the relative importance of spider phylogenetic relationships and ecological niche (plant morphological variables) to the variation in spider body size and shape by comparing spiders at different scales: (i) between bromeliads and dicot plants (i.e., habitat scale) and (ii) among bromeliads with distinct architectural features (i.e., microhabitat scale). We partitioned the interspecific variation in body size and shape into phylogenetic components (which express trait values as expected from phylogenetic relationships among species) and ecological components (which express trait values independent of phylogenetic relationships). At the habitat scale, bromeliad spiders were larger and flatter than spiders associated with the surrounding dicots. At this scale, plant morphology sorted out closely related spiders. Our results showed that spider flatness is phylogenetically clustered at the habitat scale, whereas it is phylogenetically overdispersed at the microhabitat scale, although phylogenetic signal is present at both scales. Taken together, these results suggest that whereas at the habitat scale selective colonization affects spider body size and shape, at fine scales both selective colonization and adaptive evolution determine spider body shape. By partitioning the phylogenetic and ecological components of phenotypic variation, we were able to disentangle the evolutionary history of distinct spider traits and show that plant architecture plays a role in the evolution of spider body size and shape. We also discuss the relevance of considering multiple scales when studying phylogenetic community structure.

  16. Disentangling the Phylogenetic and Ecological Components of Spider Phenotypic Variation

    PubMed Central

    Gonçalves-Souza, Thiago; Diniz-Filho, José Alexandre Felizola; Romero, Gustavo Quevedo

    2014-01-01

    An understanding of how the degree of phylogenetic relatedness influences the ecological similarity among species is crucial to inferring the mechanisms governing the assembly of communities. We evaluated the relative importance of spider phylogenetic relationships and ecological niche (plant morphological variables) to the variation in spider body size and shape by comparing spiders at different scales: (i) between bromeliads and dicot plants (i.e., habitat scale) and (ii) among bromeliads with distinct architectural features (i.e., microhabitat scale). We partitioned the interspecific variation in body size and shape into phylogenetic components (which express trait values as expected from phylogenetic relationships among species) and ecological components (which express trait values independent of phylogenetic relationships). At the habitat scale, bromeliad spiders were larger and flatter than spiders associated with the surrounding dicots. At this scale, plant morphology sorted out closely related spiders. Our results showed that spider flatness is phylogenetically clustered at the habitat scale, whereas it is phylogenetically overdispersed at the microhabitat scale, although phylogenetic signal is present at both scales. Taken together, these results suggest that whereas at the habitat scale selective colonization affects spider body size and shape, at fine scales both selective colonization and adaptive evolution determine spider body shape. By partitioning the phylogenetic and ecological components of phenotypic variation, we were able to disentangle the evolutionary history of distinct spider traits and show that plant architecture plays a role in the evolution of spider body size and shape. We also discuss the relevance of considering multiple scales when studying phylogenetic community structure. PMID:24651264

  17. Texture as a basis for acoustic classification of substrate in the nearshore region

    NASA Astrophysics Data System (ADS)

    Dennison, A.; Wattrus, N. J.

    2016-12-01

    Substrate type at two locations in Lake Superior is segmented and classified using multivariate statistical processing of textural measures derived from shallow-water, high-resolution multibeam bathymetric data. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristics of a sonar backscatter mosaic depend on substrate type. Classifying the bottom type on the basis of backscatter alone can accurately predict and map bottom type. However, it cannot resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing can capture the pertinent, texture-rich details of the bottom type. Further multivariate statistical processing can then isolate characteristic features and provide the basis for an accurate classification scheme. Preliminary results from an analysis of bathymetric data and ground-truth samples collected from the Amnicon River, Superior, Wisconsin, and the Lester River, Duluth, Minnesota, demonstrate the ability to process the data and develop a novel classification scheme for the bottom type in two geomorphologically distinct areas.
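    The abstract does not name its textural measures. A standard family for gridded data is the grey-level co-occurrence matrix (GLCM), which counts how often pairs of quantized values occur at a fixed offset. Contrast and homogeneity are then derived from those counts. The minimal sketch below assumes a small grid of already-quantized depth values and a single horizontal offset. It illustrates one plausible kind of texture feature, not the study's pipeline.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized co-occurrence probabilities of value pairs at offset (dx, dy).

    image: list of rows of quantized values (e.g. binned depths), assumed input.
    """
    pairs = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(image[r][c], image[r2][c2])] += 1
    total = sum(pairs.values())
    return {k: v / total for k, v in pairs.items()}


def contrast(p):
    """High when neighbouring values differ strongly (rough texture)."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def homogeneity(p):
    """High when neighbouring values are similar (smooth texture)."""
    return sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())


smooth = [[1, 1, 1], [1, 1, 1]]
ridged = [[0, 3, 0], [0, 3, 0]]
print(contrast(glcm(smooth)), contrast(glcm(ridged)))
```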

  18. Gastric precancerous diseases classification using CNN with a concise model.

    PubMed

    Zhang, Xu; Hu, Weiling; Chen, Fei; Liu, Jiquan; Yang, Yuanhang; Wang, Liangjing; Duan, Huilong; Si, Jianmin

    2017-01-01

    Gastric precancerous diseases (GPD) may deteriorate into early gastric cancer if misdiagnosed, so it is important to help doctors recognize GPD accurately and quickly. In this paper, we classify three classes of GPD (polyp, erosion, and ulcer) using convolutional neural networks (CNN) with a concise model called the Gastric Precancerous Disease Network (GPDNet). GPDNet introduces fire modules from SqueezeNet to reduce the model size and number of parameters by about a factor of 10 while improving speed for quick classification. To maintain classification accuracy with fewer parameters, we propose a method called iterative reinforced learning (IRL). After training GPDNet from scratch, we apply IRL to fine-tune the parameters whose values are close to 0. We then take the modified model as a pretrained model for the next round of training. The results show that IRL improves accuracy by about 9% after 6 iterations. The final classification accuracy of GPDNet was 88.90%, which is promising for clinical GPD recognition.
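    A SqueezeNet fire module first "squeezes" its input with s 1x1 filters. It then "expands" with e1 1x1 filters and e3 3x3 filters, whose outputs are concatenated. A little arithmetic shows where a roughly tenfold parameter saving can come from. The abstract does not give GPDNet's layer sizes, so the channel numbers below (96 inputs, squeeze 16, expand 64 + 64) are illustrative values matching those commonly quoted for SqueezeNet's first fire module. The comparison is against a plain 3x3 convolution with the same input and output channels, biases included.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + c_out


def fire_params(c_in, s, e1, e3):
    """Parameters of a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    squeeze = conv_params(c_in, s, 1)
    expand = conv_params(s, e1, 1) + conv_params(s, e3, 3)
    return squeeze + expand


fire = fire_params(96, 16, 64, 64)  # 128 output channels in total
plain = conv_params(96, 128, 3)     # plain 3x3 conv, same in/out channels
print(fire, plain, round(plain / fire, 1))  # 11920 110720 9.3
```

    Most of the saving comes from the squeeze layer. Feeding the 3x3 filters only 16 channels instead of 96 shrinks the dominant 9 * c_in * c_out term by a factor of six.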

  19. Headache classification: criticism and suggestions.

    PubMed

    Manzoni, G C; Torelli, P

    2004-10-01

    The International Classification of Headache Disorders 2nd Edition (ICHD-II), published in 2004, marks unquestionable progress from the preceding 1988 edition, but the in-depth analysis it offers is not free of drawbacks and shortcomings. First of all, it is still basically a classification of attacks and not of syndromes. For the migraine group, the revised classification characterises migraine with aura more accurately. However, it fails to provide a sufficiently structured description of those forms of migraine without aura that over the years evolve into so-called chronic daily forms. These forms are not adequately recognised as chronic migraine, which ICHD-II includes among the complications of migraine. The inclusion of short-lasting unilateral neuralgiform headache attacks with conjunctival injection and tearing (SUNCT) in the cluster headache group is bound to generate some perplexity. Meanwhile, the recognition of new daily persistent headache (NDPH), included in the group of other primary headaches, as a separate clinical entity appears somewhat premature. Doubts are also raised about the actual existence of triptan-overuse headache, which ICHD-II includes in Group 8 among medication-overuse headaches. Finally, the addition of headache attributed to psychiatric disorder, which is certainly a good option for the future, is not yet supported by adequate systematisation.

  20. Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf.

    PubMed

    Cardona, Gabriel; Mir, Arnau; Rosselló, Francesc; Rotger, Lucía; Sánchez, David

    2013-01-16

    Phylogenetic tree comparison metrics are an important tool in the study of evolution, and hence the definition of such metrics is an interesting problem in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed to measure quantitatively the difference between a pair of phylogenetic trees by first encoding them by means of their half-matrices of cophenetic values, and then comparing these matrices. This idea has been used several times since then to define dissimilarity measures between phylogenetic trees. However, to our knowledge, no proper metric on weighted phylogenetic trees with nested taxa based on this idea has been formally defined and studied yet. Actually, the cophenetic values of pairs of different taxa alone are not enough to single out phylogenetic trees with weighted arcs or nested taxa. For every (rooted) phylogenetic tree T, let its cophenetic vector φ(T) consist of all cophenetic values of pairs of taxa in T and all depths of taxa in T. It turns out that these cophenetic vectors single out weighted phylogenetic trees with nested taxa. We then define a family of cophenetic metrics d_{φ,p} by comparing these cophenetic vectors by means of L_p norms. We study, either analytically or numerically, some of their basic properties: neighbors, diameter, distribution, and their rank correlation with each other and with other metrics. The cophenetic metrics can be safely used on weighted phylogenetic trees with nested taxa and no restriction on degrees, and they can be computed in O(n^2) time, where n stands for the number of taxa. The metrics d_{φ,1} and d_{φ,2} have positively skewed distributions. They show a low rank correlation with the Robinson-Foulds metric and the nodal metrics, and a very high correlation with each other and with the splitted nodal metrics. The diameter of d_{φ,p}, for p ≥ 1, is in O(n^{(p+2)/p}), and thus for low p they are more discriminative, having a wider range of values.
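    The definitions above translate directly into code. The cophenetic value of a pair of taxa is the depth (distance from the root) of their lowest common ancestor, and the cophenetic value of a taxon with itself is its own depth. d_{φ,p} is the L_p distance between the two vectors, indexed by the same ordered pairs. The minimal sketch below follows those definitions. It uses an assumed parent/length dictionary encoding in which taxa may label internal nodes (nested taxa). It is a straightforward version, not the O(n^2) algorithm the authors describe.

```python
import itertools


def cophenetic_vector(taxa, parent, length):
    """Cophenetic values for all pairs (a, b) with a <= b, including (a, a).

    parent: dict node -> parent node; length: dict node -> length of edge above it.
    Taxa may be leaves or internal nodes. Hypothetical encoding for this sketch.
    """
    def depth(n):
        return 0.0 if n not in parent else depth(parent[n]) + length[n]

    def ancestors(n):  # n itself first, then up to the root
        out = [n]
        while out[-1] in parent:
            out.append(parent[out[-1]])
        return out

    def lca(a, b):
        anc_b = set(ancestors(b))
        return next(x for x in ancestors(a) if x in anc_b)

    taxa = sorted(taxa)
    return [depth(lca(a, b))
            for a, b in itertools.combinations_with_replacement(taxa, 2)]


def cophenetic_distance(v1, v2, p=2):
    """d_{phi,p}: L_p norm of the difference of two cophenetic vectors."""
    return sum(abs(x - y) ** p for x, y in zip(v1, v2)) ** (1 / p)


# ((A:1,B:1)X:1,C:2)R versus ((B:1,C:1)Y:1,A:2)R
v1 = cophenetic_vector(["A", "B", "C"], {"A": "X", "B": "X", "X": "R", "C": "R"},
                       {"A": 1, "B": 1, "X": 1, "C": 2})
v2 = cophenetic_vector(["A", "B", "C"], {"B": "Y", "C": "Y", "Y": "R", "A": "R"},
                       {"B": 1, "C": 1, "Y": 1, "A": 2})
print(v1, v2, cophenetic_distance(v1, v2, p=1))
```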

  1. Comparison of two Classification methods (MLC and SVM) to extract land use and land cover in Johor Malaysia

    NASA Astrophysics Data System (ADS)

    Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.

    2014-06-01

    Mapping is essential for the analysis of land use and land cover, which influence many environmental processes and properties. When creating land cover maps, it is important to minimize error, because errors propagate into later analyses based on these maps. The reliability of land cover maps derived from remotely sensed data depends on accurate classification. In this study, we analyzed multispectral data using two different classifiers: the Maximum Likelihood Classifier (MLC) and the Support Vector Machine (SVM). To this end, Landsat Thematic Mapper data and identical field-based training sample datasets from Johor, Malaysia, were used for each classification method. The classifications distinguished five land cover classes: forest, oil palm, urban area, water, and rubber. Classification results indicate that SVM was more accurate than MLC. Given its demonstrated capability to produce reliable land cover results, the SVM method should be especially useful for land cover classification.

  2. The problem and promise of scale dependency in community phylogenetics.

    PubMed

    Swenson, Nathan G; Enquist, Brian J; Pither, Jason; Thompson, Jill; Zimmerman, Jess K

    2006-10-01

    The problem of scale dependency is widespread in investigations of ecological communities. Null model investigations of community assembly exemplify the challenges involved because they typically include subjectively defined "regional species pools." The burgeoning field of community phylogenetics appears poised to face similar challenges. Our objective is to quantify the scope of the problem of scale dependency by comparing the phylogenetic structure of assemblages across contrasting geographic and taxonomic scales. We conduct phylogenetic analyses on communities within three tropical forests, and perform a sensitivity analysis with respect to two scaleable inputs: taxonomy and species pool size. We show that (1) estimates of phylogenetic overdispersion within local assemblages depend strongly on the taxonomic makeup of the local assemblage and (2) comparing the phylogenetic structure of a local assemblage to a species pool drawn from increasingly larger geographic scales results in an increased signal of phylogenetic clustering. We argue that, rather than posing a problem, "scale sensitivities" are likely to reveal general patterns of diversity that could help identify critical scales at which local or regional influences gain primacy for the structuring of communities. In this way, community phylogenetics promises to fill an important gap in community ecology and biogeography research.

  3. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    PubMed Central

    Dees, Jonathan; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa relatedness on phylogenetic trees, to measure the prevalence of correct taxa-relatedness interpretations, and to determine how student reasoning and correctness change in response to instruction and over time. Counting synapomorphies and nodes between taxa were the most common forms of incorrect reasoning, which presents a pedagogical dilemma concerning labeled synapomorphies on phylogenetic trees. Students also independently generated an alternative form of correct reasoning using monophyletic groups, the use of which decreased in popularity over time. Approximately half of all students were able to correctly interpret taxa relatedness on phylogenetic trees, and many memorized correct reasoning without understanding its application. Broad initial instruction that allowed students to generate inferences on their own contributed very little to phylogenetic tree understanding, while targeted instruction on evolutionary relationships improved understanding to some extent. Phylogenetic trees, which can directly affect student understanding of evolution, appear to offer introductory biology instructors a formidable pedagogical challenge. PMID:25452489

  4. A framework for classification of prokaryotic protein kinases.

    PubMed

    Tyagi, Nidhi; Anamika, Krishanpal; Srinivasan, Narayanaswamy

    2010-05-26

    The overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. A large dataset of Serine/Threonine protein kinases recognized from prokaryotic genomes has been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. Using traditional sequence alignment and phylogenetic approaches, we clustered the prokaryotic kinases into 72 subfamilies with at least 4 members each. Such a clustering enables classification of prokaryotic Ser/Thr kinases, and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After a series of searches in a comprehensive sequence database, we recognized that 38 subfamilies of prokaryotic protein kinases are associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies specific to an order, sub-order, class, family and genus have also been identified. In addition, we identify organism-diverse subfamilies, whose members come from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Interestingly, the occurrence of several taxonomic-level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization which indicates a degree of complexity and protein-protein interactions in the

  5. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    PubMed

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, defined typically on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable learning more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
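    The set-level conversion described above replaces one feature per gene with one feature per gene set. The minimal sketch below aggregates by the mean expression of each set's measured member genes. Mean aggregation is an assumption here, since the abstract does not say how set-level values are computed. The gene and set names are hypothetical.

```python
from statistics import mean


def set_level_features(sample, gene_sets):
    """Map one sample's gene-level expression (gene -> value) to set-level
    features (set name -> mean expression of the set's measured genes)."""
    features = {}
    for name, genes in gene_sets.items():
        values = [sample[g] for g in genes if g in sample]
        if values:  # skip sets with no measured member genes
            features[name] = mean(values)
    return features


sample = {"g1": 1.0, "g2": 3.0, "g3": 10.0}
gene_sets = {"regulon_A": ["g1", "g2"], "regulon_B": ["g3", "g4"], "unmeasured": ["g9"]}
print(set_level_features(sample, gene_sets))
```

    Averaging is most defensible when member genes are strongly correlated. This matches the abstract's motivation for building sets from regulatory interactions: if members move together, the mean loses little information.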

  6. A methodological investigation of hominoid craniodental morphology and phylogenetics.

    PubMed

    Bjarnason, Alexander; Chamberlain, Andrew T; Lockwood, Charles A

    2011-01-01

    The evolutionary relationships of extant great apes and humans have been largely resolved by molecular studies, yet morphology-based phylogenetic analyses continue to provide conflicting results. In order to further investigate this discrepancy we present bootstrap clade support of morphological data based on two quantitative datasets, one dataset consisting of linear measurements of the whole skull from 5 hominoid genera and the second dataset consisting of 3D landmark data from the temporal bone of 5 hominoid genera, including 11 sub-species. Using similar protocols for both datasets, we were able to 1) compare distance-based phylogenetic methods to cladistic parsimony of quantitative data converted into discrete character states, 2) vary outgroup choice to observe its effect on phylogenetic inference, and 3) analyse male and female data separately to observe the effect of sexual dimorphism on phylogenies. Phylogenetic analysis was sensitive to methodological decisions, particularly outgroup selection, where designation of Pongo as an outgroup and removal of Hylobates resulted in greater congruence with the proposed molecular phylogeny. The performance of distance-based methods also justifies their use in phylogenetic analysis of morphological data. It is clear from our analyses that hominoid phylogenetics ought not to be used as an example of conflict between the morphological and molecular, but as an example of how outgroup and methodological choices can affect the outcome of phylogenetic analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.

  7. Spatial Mutual Information Based Hyperspectral Band Selection for Classification

    PubMed Central

    2015-01-01

    The amount of information involved in hyperspectral imaging is large. Hyperspectral band selection is a popular method for reducing dimensionality. Several information based measures such as mutual information have been proposed to reduce information redundancy among spectral bands. Unfortunately, mutual information does not take into account the spatial dependency between adjacent pixels in images thus reducing its robustness as a similarity measure. In this paper, we propose a new band selection method based on spatial mutual information. As validation criteria, a supervised classification method using support vector machine (SVM) is used. Experimental results of the classification of hyperspectral datasets show that the proposed method can achieve more accurate results. PMID:25918742
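
    The abstract contrasts its spatial criterion with ordinary mutual information between bands. As a baseline only, the following minimal Python sketch computes the histogram (plug-in) estimate of I(A;B) in bits between two bands, each flattened to a list of pixel values. The equal-width binning and the `levels` parameter are assumptions; the spatial extension proposed in the paper is not reproduced, because the abstract does not define it.

```python
from collections import Counter
from math import log2

def quantize(band, levels):
    """Map pixel values to integer bins 0..levels-1 using equal-width bins."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return [0] * len(band)
    return [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in band]

def mutual_information(band_a, band_b, levels=16):
    """Histogram estimate of I(A;B) in bits between two equally sized bands."""
    if len(band_a) != len(band_b):
        raise ValueError("bands must have the same number of pixels")
    qa, qb = quantize(band_a, levels), quantize(band_b, levels)
    n = len(qa)
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

band = [0, 1, 2, 3] * 4
print(mutual_information(band, band, levels=4))                 # identical bands: 2.0 bits
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], levels=2))  # independent bands: 0.0
```

    Because every pixel is treated as an independent draw, shuffling the pixel order of both bands in the same way leaves this value unchanged; that insensitivity to neighboring pixels is exactly the limitation the abstract points out.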

  8. Phylogenetic Analyses of Armillaria Reveal at Least 15 Phylogenetic Lineages in China, Seven of Which Are Associated with Cultivated Gastrodia elata

    PubMed Central

    Guo, Ting; Wang, Han Chen; Xue, Wan Qiu; Zhao, Jun; Yang, Zhu L.

    2016-01-01

    Fungal species of Armillaria, which can act as plant pathogens and/or symbionts of the Chinese traditional medicinal herb Gastrodia elata (“Tianma”), are ecologically and economically important and have consequently attracted the attention of mycologists. However, their taxonomy has been highly dependent on morphological characterization and mating tests. In this study, we phylogenetically analyzed Chinese Armillaria samples using the sequences of the internal transcribed spacer region, translation elongation factor-1 alpha gene and beta-tubulin gene. Our data revealed at least 15 phylogenetic lineages of Armillaria from China, of which seven were newly discovered and two were recorded from China for the first time. Fourteen Chinese biological species of Armillaria, which were previously defined based on mating tests, could be assigned to the 15 phylogenetic lineages identified herein. Seven of the 15 phylogenetic lineages were found to be disjunctively distributed in different continents of the Northern Hemisphere, while eight were revealed to be endemic to certain continents. In addition, we found that seven phylogenetic lineages of Armillaria were used for the cultivation of Tianma, only two of which had been recorded to be associated with Tianma previously. We also illustrated that G. elata f. glauca (“Brown Tianma”) and G. elata f. elata (“Red Tianma”), two cultivars of Tianma grown in different regions of China, form symbiotic relationships with different phylogenetic lineages of Armillaria. These findings should aid the development of Tianma cultivation in China. PMID:27138686

  9. Applying species-tree analyses to deep phylogenetic histories: challenges and potential suggested from a survey of empirical phylogenetic studies.

    PubMed

    Lanier, Hayley C; Knowles, L Lacey

    2015-02-01

    Coalescent-based methods for species-tree estimation are becoming a dominant approach for reconstructing species histories from multi-locus data, with most of the studies examining these methodologies focused on recently diverged species. However, deeper phylogenies, such as the datasets that comprise many Tree of Life (ToL) studies, also exhibit gene-tree discordance. This discord may also arise from the stochastic sorting of gene lineages during the speciation process (i.e., reflecting the random coalescence of gene lineages in ancestral populations). It remains unknown whether guidelines regarding methodologies and numbers of loci established by simulation studies at shallow tree depths translate into accurate species relationships for deeper phylogenetic histories. We address this knowledge gap and specifically identify the challenges and limitations of species-tree methods that account for coalescent variance for deeper phylogenies. Using simulated data with characteristics informed by empirical studies, we evaluate both the accuracy of estimated species trees and the characteristics associated with recalcitrant nodes, with a specific focus on whether coalescent variance is generally responsible for the lack of resolution. By determining the proportion of coalescent genealogies that support a particular node, we demonstrate that (1) species-tree methods account for coalescent variance at deep nodes and (2) mutational variance, not gene-tree discord arising from the coalescent, posed the primary challenge for accurate reconstruction across the tree. For example, many nodes were accurately resolved despite predicted discord from the random coalescence of gene lineages, and nodes with poor support were distributed across a range of depths (i.e., they were not restricted to particular recent divergences). Given their broad taxonomic scope and large sampling of taxa, deep level phylogenies pose several potential methodological complications including
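
    How much coalescent discord to expect at a node depends on the length of the internal branch measured in coalescent units, not on how deep the node sits in the tree. For three taxa this is the standard multispecies-coalescent result: with species tree ((A,B),C) and internal branch length t, a gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies occurs with probability (1/3)e^(-t). A minimal Python sketch follows; the effective population size and generation counts in the loop are hypothetical.

```python
from math import exp

def coalescent_units(generations, effective_size):
    """Convert an internal branch length in generations to coalescent units
    for a diploid population: t = generations / (2 * Ne)."""
    return generations / (2.0 * effective_size)

def rooted_triplet_probabilities(t):
    """Gene-tree topology probabilities for species tree ((A,B),C) under the
    multispecies coalescent, given internal branch length t in coalescent units.

    Returns (p_matching, p_each_discordant)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = exp(-t) / 3.0
    return 1.0 - 2.0 * discordant, discordant

# Hypothetical internal branches of different lengths, Ne = 10,000.
for gens in (100, 2_000, 20_000):
    t = coalescent_units(gens, effective_size=10_000)
    match, other = rooted_triplet_probabilities(t)
    print(f"{gens:>6} generations -> t = {t:.3f}, "
          f"P(match) = {match:.3f}, P(each discordant) = {other:.3f}")
```

    A short internal branch produces substantial discord whether it lies near the tips or deep in the tree, which is consistent with the abstract's observation that poorly supported nodes were not confined to particular depths.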

  10. Deep learning for brain tumor classification

    NASA Astrophysics Data System (ADS)

    Paul, Justin S.; Plassard, Andrew J.; Landman, Bennett A.; Fabbri, Daniel

    2017-03-01

    Recent research has shown that deep learning methods perform well on supervised image classification tasks. The purpose of this study is to apply deep learning methods to classify brain images with different tumor types: meningioma, glioma, and pituitary. A publicly released dataset contains 3,064 T1-weighted contrast-enhanced MRI (CE-MRI) brain images from 233 patients with meningioma, glioma, or pituitary tumors, split across axial, coronal, and sagittal planes. This research focuses on the 989 axial images from 191 patients in order to avoid confusing the neural networks with three different planes containing the same diagnosis. Two types of neural networks were used for classification: fully connected and convolutional neural networks. Within these two categories, further tests were performed with augmentation of the original 512×512 axial images. Neural networks trained on the axial data classified the images accurately, with an average five-fold cross-validation accuracy of 91.43% for the best trained network. This result demonstrates that a more general method (i.e., deep learning) can outperform specialized methods that require image dilation and ring-forming subregions on tumors.

  11. Classification of Pelteobagrus fish in Poyang Lake based on mitochondrial COI gene sequence.

    PubMed

    Zhong, Bin; Chen, Ting-Ting; Gong, Rui-Yue; Zhao, Zhe-Xia; Wang, Binhua; Fang, Chunlin; Mao, Hui-Ling

    2016-11-01

    We use DNA molecular marker technology to correct the deficiencies of traditional morphological taxonomy. In total, 770 Pelteobagrus fish were collected from Poyang Lake. After preliminary morphological classification, eight samples of each species were randomly selected for DNA extraction. The mitochondrial COI gene sequence was cloned with universal primers and sequenced. The results showed that four species of Pelteobagrus live in Poyang Lake. The average intraspecific genetic distance was 0.003, while the average interspecific genetic distance was 0.128; the interspecific genetic distance is far greater than the intraspecific genetic distance. In addition, phylogenetic tree analysis showed that the molecular systematics agreed with the morphological classification, indicating that the COI gene is an effective DNA molecular marker for Pelteobagrus classification. Surprisingly, the distances of some individuals (P. e6, P. n6, P. e5, and P. v4) from their originally assigned species exceeded the species threshold (2%), and these individuals should be reclassified as Pelteobagrus fulvidraco. Another individual, P. v3, was markedly different: its genetic distance from its originally assigned species, Pelteobagrus vachelli, exceeded 8.4%. Its taxonomic status requires further study.

  12. BEASTling: A software tool for linguistic phylogenetics using BEAST 2

    PubMed Central

    Forkel, Robert; Kaiping, Gereon A.; Atkinson, Quentin D.

    2017-01-01

    We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts. PMID:28796784

  13. Climate-driven extinctions shape the phylogenetic structure of temperate tree floras.

    PubMed

    Eiserhardt, Wolf L; Borchsenius, Finn; Plum, Christoffer M; Ordonez, Alejandro; Svenning, Jens-Christian

    2015-03-01

    When taxa go extinct, unique evolutionary history is lost. If extinction is selective, and the intrinsic vulnerabilities of taxa show phylogenetic signal, more evolutionary history may be lost than expected under random extinction. Under what conditions this occurs is insufficiently known. We show that late Cenozoic climate change induced phylogenetically selective regional extinction of northern temperate trees because of phylogenetic signal in cold tolerance, leading to significantly and substantially larger than random losses of phylogenetic diversity (PD). The surviving floras in regions that experienced stronger extinction are phylogenetically more clustered, indicating that non-random losses of PD are of increasing concern with increasing extinction severity. Using simulations, we show that a simple threshold model of survival given a physiological trait with phylogenetic signal reproduces our findings. Our results send a strong warning that we may expect future assemblages to be phylogenetically and possibly functionally depauperate if anthropogenic climate change affects taxa similarly. © 2015 John Wiley & Sons Ltd/CNRS.
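
    The abstract does not define PD. Assuming it is Faith's phylogenetic diversity (the total branch length spanned by a set of taxa), the following minimal Python sketch uses the rooted variant on a hypothetical four-tip tree. It shows why losing a whole clade, as happens when extinction is phylogenetically selective, removes more evolutionary history than losing the same number of unrelated tips.

```python
def faith_pd(parent, taxa):
    """Rooted phylogenetic diversity: summed length of all branches on the
    paths from each taxon in `taxa` up to the root, each branch counted once.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        # Walk towards the root, stopping at branches that were already counted.
        while node in parent and node not in seen:
            seen.add(node)
            node, length = parent[node]
            total += length
    return total

# Hypothetical tree ((A:1,B:1)X:2,(C:1,D:3)Y:1)root.
tree = {
    "A": ("X", 1.0), "B": ("X", 1.0), "X": ("root", 2.0),
    "C": ("Y", 1.0), "D": ("Y", 3.0), "Y": ("root", 1.0),
}
full = faith_pd(tree, ["A", "B", "C", "D"])  # 9.0
print(full - faith_pd(tree, ["C", "D"]))     # clade {A, B} lost: 4.0
print(full - faith_pd(tree, ["B", "D"]))     # unrelated A and C lost: 2.0
```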

  14. Increased phylogenetic resolution using target enrichment in Rubus

    USDA-ARS's Scientific Manuscript database

    Phylogenetic analyses in Rubus L. have been challenging due to polyploidy, hybridization, and apomixis within the genus. Wide morphological diversity occurs within and between species, contributing to challenges at lower and higher systematic levels. Phylogenetic inferences to date have been based o...

  15. Fuzzy Classification of High Resolution Remote Sensing Scenes Using Visual Attention Features.

    PubMed

    Li, Linyi; Xu, Tingbao; Chen, Yun

    2017-01-01

    In recent years the spatial resolution of remote sensing images has improved greatly. However, a higher spatial resolution image does not always lead to better automatic scene classification. Visual attention is an important characteristic of the human visual system, which can effectively help to classify remote sensing scenes. In this study, a novel visual attention feature extraction algorithm was proposed that extracts visual attention features through a multiscale process, and a fuzzy classification method using visual attention features (FC-VAF) was developed to perform high resolution remote sensing scene classification. FC-VAF was evaluated using remote sensing scenes from widely used high resolution remote sensing images, including IKONOS, QuickBird, and ZY-3 images. FC-VAF achieved more accurate classification results than the other methods compared, according to the quantitative accuracy evaluation indices. We also discussed the role and impact of different decomposition levels and different wavelets on the classification accuracy. FC-VAF improves the accuracy of high resolution scene classification and therefore advances research on digital image analysis and the applications of high resolution remote sensing images.

  16. Interpreting the universal phylogenetic tree

    NASA Technical Reports Server (NTRS)

    Woese, C. R.

    2000-01-01

    The universal phylogenetic tree not only spans all extant life, but its root and earliest branchings represent stages in the evolutionary process before modern cell types had come into being. The evolution of the cell is an interplay between vertically derived and horizontally acquired variation. Primitive cellular entities were necessarily simpler and more modular in design than are modern cells. Consequently, horizontal gene transfer early on was pervasive, dominating the evolutionary dynamic. The root of the universal phylogenetic tree represents the first stage in cellular evolution when the evolving cell became sufficiently integrated and stable to the erosive effects of horizontal gene transfer that true organismal lineages could exist.

  17. Phylogenetics beyond biology.

    PubMed

    Retzlaff, Nancy; Stadler, Peter F

    2018-06-21

    Evolutionary processes have been described not only in biology but also for a wide range of human cultural activities including languages and law. In contrast to the evolution of DNA or protein sequences, the detailed mechanisms giving rise to the observed evolution-like processes are not or only partially known. The absence of a mechanistic model of evolution implies that it remains unknown how the distances between different taxa have to be quantified. Considering distortions of metric distances, we first show that poor choices of the distance measure can lead to incorrect phylogenetic trees. Based on the well-known fact that phylogenetic inference requires additive metrics, we then show that the correct phylogeny can be computed from a distance matrix [Formula: see text] if there is a monotonic, subadditive function [Formula: see text] such that [Formula: see text] is additive. The required metric-preserving transformation [Formula: see text] can be computed as the solution of an optimization problem. This result shows that the problem of phylogeny reconstruction is well defined even if a detailed mechanistic model of the evolutionary process remains elusive.
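
    The starting point of the argument, that phylogenetic inference from distances requires an additive (tree) metric, can be checked directly with the classical four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The minimal Python sketch below assumes a symmetric list-of-lists distance matrix; the tolerance and the example matrices are illustrative, and this is not the optimization the authors use to find the metric-preserving transformation.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Return True if the symmetric matrix d satisfies the triangle inequality
    and the four-point condition, i.e. it is a tree (additive) metric."""
    n = len(d)
    # Triangle inequality for every triple (the degenerate case of the four-point condition).
    for i, j, k in combinations(range(n), 3):
        if (d[i][j] > d[i][k] + d[k][j] + tol
                or d[i][k] > d[i][j] + d[j][k] + tol
                or d[j][k] > d[j][i] + d[i][k] + tol):
            return False
    # Four-point condition: the two largest of the three pair sums must coincide.
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True

# Path-length distances between leaves A, B, C, D of the tree ((A:1,B:1):2,(C:1,D:3):1).
tree_d = [
    [0, 2, 5, 7],
    [2, 0, 5, 7],
    [5, 5, 0, 4],
    [7, 7, 4, 0],
]
print(is_additive(tree_d))  # True

# Distorting a single entry (d(A,C) from 5 to 6) keeps a metric but breaks additivity.
distorted = [row[:] for row in tree_d]
distorted[0][2] = distorted[2][0] = 6
print(is_additive(distorted))  # False
```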

  18. Power law tails in phylogenetic systems.

    PubMed

    Qin, Chongli; Colwell, Lucy J

    2018-01-23

    Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
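
    The remedy the abstract proposes, removing the eigenvectors of the covariance matrix with the largest eigenvalues, can be sketched with power iteration and deflation. This pure-Python version assumes a symmetric positive semidefinite matrix (as covariance matrices are); how the covariance matrix is built from the alignment and how many modes `k` to remove are assumptions left to the caller. In practice a library routine such as `numpy.linalg.eigh` would replace the hand-written iteration.

```python
import random

def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(c, iters=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(c))]  # random, strictly positive start vector
    for _ in range(iters):
        w = _matvec(c, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v  # matrix annihilates v: nothing left to remove
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(c, v)))
    return eigenvalue, v

def remove_top_modes(c, k):
    """Subtract the k largest-eigenvalue components lambda * v v^T from matrix c."""
    c = [row[:] for row in c]
    n = len(c)
    for _ in range(k):
        lam, v = top_eigenpair(c)
        for i in range(n):
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return c

cov = [[2.0, 1.0], [1.0, 2.0]]  # toy covariance matrix with eigenvalues 3 and 1
print(remove_top_modes(cov, 1))  # approximately [[0.5, -0.5], [-0.5, 0.5]]
```

    Down-weighting instead of removing would scale `lam` by a factor below one rather than subtracting the full component.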

  19. Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM

    PubMed Central

    Zhao, Zhizhen; Singer, Amit

    2014-01-01

    We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. Thus we are able to extend the theory of bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles. In this way, rotational alignment is only performed for images with their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations. PMID:24631969
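
    The key property behind these features is that the bispectrum of a periodic 1D signal is unchanged by cyclic shifts; steerable PCA turns an in-plane rotation of an image into a phase shift of its expansion coefficients, which is why the idea carries over to 2D. A minimal Python sketch of the 1D case only follows, using a naive DFT; the signal values are arbitrary illustrations.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def bispectrum(x):
    """B[k1][k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod n]) for a periodic 1D signal x."""
    X = dft(x)
    n = len(X)
    return [[X[k1] * X[k2] * X[(k1 + k2) % n].conjugate() for k2 in range(n)]
            for k1 in range(n)]

signal = [1.0, 2.0, 0.5, -1.0, 3.0]
shifted = signal[2:] + signal[:2]  # cyclic shift by two samples
b1, b2 = bispectrum(signal), bispectrum(shifted)
print(max(abs(u - v) for r1, r2 in zip(b1, b2) for u, v in zip(r1, r2)) < 1e-9)  # True
```

    A shift multiplies X[k] by a phase factor whose contributions cancel in the triple product, whereas the individual Fourier coefficients change; unlike the power spectrum, the bispectrum still retains phase relationships between frequencies.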

  20. Phylogenetic tree construction based on 2D graphical representation

    NASA Astrophysics Data System (ADS)

    Liao, Bo; Shan, Xinzhou; Zhu, Wen; Li, Renfa

    2006-04-01

    A new approach based on the two-dimensional (2D) graphical representation of the whole genome sequence [Bo Liao, Chem. Phys. Lett., 401(2005) 196.] is proposed to analyze the phylogenetic relationships of genomes. The evolutionary distances are obtained by measuring the differences among the 2D curves. Fuzzy theory is then used to construct the phylogenetic tree. The phylogenetic relationships of H5N1 avian influenza viruses illustrate the utility of our approach.

  1. Three-Way Analysis of Spectrospatial Electromyography Data: Classification and Interpretation

    PubMed Central

    Kauppi, Jukka-Pekka; Hahne, Janne; Müller, Klaus-Robert; Hyvärinen, Aapo

    2015-01-01

    Classifying multivariate electromyography (EMG) data is an important problem in prosthesis control as well as in neurophysiological studies and diagnosis. With modern high-density EMG sensor technology, it is possible to capture the rich spectrospatial structure of the myoelectric activity. We hypothesize that multi-way machine learning methods can efficiently utilize this structure in classification as well as reveal interesting patterns in it. To this end, we investigate the suitability of existing three-way classification methods for EMG-based hand movement classification in the spectrospatial domain, and extend these methods by sparsification and regularization. We propose to use Fourier-domain independent component analysis as preprocessing to improve classification and interpretability of the results. In high-density EMG experiments on hand movements across 10 subjects, three-way classification yielded higher average performance compared with state-of-the-art classification based on temporal features, suggesting that the three-way analysis approach can efficiently utilize detailed spectrospatial information of high-density EMG. Phase and amplitude patterns of features selected by the classifier in finger-movement data were found to be consistent with known physiology. Thus, our approach can accurately resolve hand and finger movements on the basis of detailed spectrospatial information, and at the same time allows for physiological interpretation of the results. PMID:26039100

  2. Contrasting biodiversity-ecosystem functioning relationships in phylogenetic and functional diversity.

    PubMed

    Steudel, Bastian; Hallmann, Christine; Lorenz, Maike; Abrahamczyk, Stefan; Prinz, Kathleen; Herrfurth, Cornelia; Feussner, Ivo; Martini, Johannes W R; Kessler, Michael

    2016-10-01

    It is well known that ecosystem functioning is positively influenced by biodiversity. Most biodiversity-ecosystem functioning experiments have measured biodiversity based on species richness or phylogenetic relationships. However, theoretical and empirical evidence suggests that ecosystem functioning should be more closely related to functional diversity than to species richness. We applied different metrics of biodiversity in an artificial biodiversity-ecosystem functioning experiment using 64 species of green microalgae in combinations of two to 16 species. We found that phylogenetic and functional diversity were positively correlated with biomass overyield, driven by their strong correlation with species richness. At low species richness, no significant correlation between overyield and functional and phylogenetic diversity was found. However, at high species richness (16 species), we found a positive relationship of overyield with functional diversity and a negative relationship with phylogenetic diversity. We show that negative phylogenetic diversity-ecosystem functioning relationships can result from interspecific growth inhibition. The opposing performances of facilitation (functional diversity) and inhibition (phylogenetic diversity) we observed at the 16 species level suggest that phylogenetic diversity is not always a good proxy for functional diversity and that results from experiments with low species numbers may underestimate negative species interactions. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  3. The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis

    PubMed Central

    2011-01-01

    Background: CADM is a statistical test used to estimate the level of Congruence Among Distance Matrices. It has been shown in previous studies to have a correct rate of type I error and good power when applied to dissimilarity matrices and to ultrametric distance matrices. Contrary to most other tests of incongruence used in phylogenetic analysis, the null hypothesis of the CADM test assumes complete incongruence of the phylogenetic trees instead of congruence. In this study, we performed computer simulations to assess the type I error rate and power of the test. It was applied to additive distance matrices representing phylogenies and to genetic distance matrices obtained from nucleotide sequences of different lengths that were simulated on randomly generated trees of varying sizes, and under different evolutionary conditions. Results: Our results showed that the test has an accurate type I error rate and good power. As expected, power increased with the number of objects (i.e., taxa), the number of partially or completely congruent matrices, and the level of congruence among distance matrices. Conclusions: Based on our results, we suggest that CADM is an excellent candidate to test for congruence and, when present, to estimate its level in phylogenomic studies where numerous genes are analysed simultaneously. PMID:21388552
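
    As introduced by Legendre and Lapointe, CADM measures congruence with Kendall's coefficient of concordance W, computed on the ranks of the upper-triangular distances of each matrix, and assesses it by permutation. The minimal Python sketch below computes W only, with average ranks for ties and the usual tie correction. The permutation test is omitted, and the toy matrices are hypothetical.

```python
from collections import Counter
from itertools import combinations

def upper_triangle(d):
    """Distances above the diagonal of a square matrix, in a fixed pair order."""
    return [d[i][j] for i, j in combinations(range(len(d)), 2)]

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def kendall_w(matrices):
    """Kendall's coefficient of concordance W among equally sized distance matrices."""
    rank_lists = [average_ranks(upper_triangle(d)) for d in matrices]
    m = len(rank_lists)     # number of matrices ("judges")
    n = len(rank_lists[0])  # number of distances being ranked
    totals = [sum(ranks[i] for ranks in rank_lists) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum over all tie groups of (t^3 - t).
    ties = sum(c ** 3 - c for ranks in rank_lists for c in Counter(ranks).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)

d1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d2 = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]  # same pairs ranked in reverse order
print(kendall_w([d1, d1]))  # perfect congruence: 1.0
print(kendall_w([d1, d2]))  # opposite rankings: 0.0
```

    W ranges from 0 (no agreement in how the matrices rank the pairwise distances) to 1 (identical rankings); the permutation test then asks whether the observed W is larger than expected under the null hypothesis of complete incongruence.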

  4. A thyroid nodule classification method based on TI-RADS

    NASA Astrophysics Data System (ADS)

    Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin

    2017-07-01

    The Thyroid Imaging Reporting and Data System (TI-RADS) is a valuable tool for differentiating benign from malignant thyroid nodules. In clinical practice, doctors use TI-RADS to grade how likely a nodule is to be benign or malignant by assigning it to one of several classes, each representing a degree of malignancy. As a classification standard, TI-RADS can guide ultrasound physicians to examine thyroid nodules more accurately and reliably. In this paper, we aim to classify thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs of thyroid nodules (cystic/solid composition, echo pattern, boundary features, and calcification) are extracted and converted into feature vectors. A semi-supervised fuzzy C-means ensemble (SS-FCME) model is then applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose thyroid nodules effectively.
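
    The SS-FCME model builds on fuzzy C-means clustering, in which each sample receives a graded membership in every cluster rather than a single label. The minimal Python sketch below is plain, unsupervised fuzzy C-means only; the semi-supervision and the ensemble step described in the abstract are not reproduced. The deterministic farthest-first initialization, the fuzzifier `m = 2`, and the 2-D feature vectors are all assumptions for illustration.

```python
def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships[j][i] is the degree to which point j belongs to cluster i."""
    dim = len(points[0])
    # Deterministic farthest-first initialization of the cluster centers.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(_dist(p, q) for q in centers))
        centers.append(list(far))
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [_dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                hit = d.index(0.0)  # point sits on a center: crisp membership
                u.append([1.0 if i == hit else 0.0 for i in range(c)])
            else:
                u.append([1.0 / sum((d[i] / d[k]) ** exponent for k in range(c))
                          for i in range(c)])
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers.append([sum(wj * p[a] for wj, p in zip(w, points)) / total
                            for a in range(dim)])
    return centers, u

# Hypothetical 2-D feature vectors (e.g., two numerically encoded ultrasound signs).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, c=2)
print([row.index(max(row)) for row in u])  # hard labels from the largest membership
```

    Keeping the full membership vector, rather than only the hard label, is what lets a fuzzy model express a nodule lying between two grades.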

  5. The Optimization of Trained and Untrained Image Classification Algorithms for Use on Large Spatial Datasets

    NASA Technical Reports Server (NTRS)

    Kocurek, Michael J.

    2005-01-01

    The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.

  6. Estimating phylogenetic trees from genome-scale data.

    PubMed

    Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

    2015-12-01

    The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. © 2015 New York Academy of Sciences.

  7. Quantifying MCMC exploration of phylogenetic tree space.

    PubMed

    Whidden, Chris; Matsen, Frederick A

    2015-05-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  8. The phylogenetic structure of plant-pollinator networks increases with habitat size and isolation.

    PubMed

    Aizen, Marcelo A; Gleiser, Gabriela; Sabatino, Malena; Gilarranz, Luis J; Bascompte, Jordi; Verdú, Miguel

    2016-01-01

    Similarity among species in traits related to ecological interactions is frequently associated with common ancestry. Thus, closely related species usually interact with ecologically similar partners, which can be reinforced by diverse co-evolutionary processes. The effect of habitat fragmentation on the phylogenetic signal in interspecific interactions and correspondence between plant and animal phylogenies is, however, unknown. Here, we address to what extent phylogenetic signal and co-phylogenetic congruence of plant-animal interactions depend on habitat size and isolation by analysing the phylogenetic structure of 12 pollination webs from isolated Pampean hills. Phylogenetic signal in interspecific interactions differed among webs, being stronger for flower-visiting insects than plants. Phylogenetic signal and overall co-phylogenetic congruence increased independently with hill size and isolation. We propose that habitat fragmentation would erode the phylogenetic structure of interaction webs. A decrease in phylogenetic signal and co-phylogenetic correspondence in plant-pollinator interactions could be associated with less reliable mutualism and erratic co-evolutionary change. © 2015 John Wiley & Sons Ltd/CNRS.

  9. Automatic Classification of Medical Text: The Influence of Publication Form

    PubMed Central

    Cole, William G.; Michael, Patricia A.; Stewart, James G.; Blois, Marsden S.

    1988-01-01

    Previous research has shown that within the domain of medical journal abstracts the statistical distribution of words is neither random nor uniform, but is highly characteristic. Many words are used mainly or solely by one medical specialty or when writing about one particular level of description. Owing to this regularity of usage, automatic classification within journal abstracts has proved quite successful. The present research asks two further questions. First, it investigates whether this statistical regularity and automatic classification success can also be achieved in medical textbook chapters. It then asks whether the statistical distribution found in textbooks is sufficiently similar to that found in abstracts to permit accurate classification of abstracts based solely on previous knowledge of textbooks. Fourteen textbook chapters and 45 MEDLINE abstracts were submitted to an automatic classification program that had been trained only on chapters drawn from a standard textbook series. Statistical analysis of the properties of abstracts vs. chapters revealed important differences in word use. Automatic classification performance was good for chapters, but poor for abstracts.

  10. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources, and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy on the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest-derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
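    Random Forest itself is not reproduced here, but the 10-fold cross-validation behind the reported training accuracy can be sketched with the standard library. This is a minimal sketch assuming plain (unstratified) folds and pooled accuracy, so every sample is held out exactly once; the `fit`/`predict` callables are a hypothetical interface, and the majority-class model is only a placeholder where a Random Forest would go.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Pooled accuracy: each sample is predicted once, by a model that did not see it."""
    correct = 0
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


# Placeholder classifier (hypothetical): always predicts the training majority class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

    With 873 sources and k = 10, three folds hold 88 sources and seven hold 87. For an imbalanced 7-class catalog a stratified variant, which keeps class proportions similar across folds, is often preferred; the abstract does not say which variant was used.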

  11. Automatic classification of time-variable X-ray sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources, and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy on the training data is ∼97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest-derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  12. Classification of Strawberry Fruit Shape by Machine Learning

    NASA Astrophysics Data System (ADS)

    Ishikawa, T.; Hayashi, A.; Nagamatsu, S.; Kyutoku, Y.; Dan, I.; Wada, T.; Oku, K.; Saeki, Y.; Uto, T.; Tanabata, T.; Isobe, S.; Kochi, N.

    2018-05-01

    Shape is one of the most important traits of agricultural products because of its relationship with their quality, quantity, and value. For strawberries, nine types of fruit shape have been defined, and fruits have been classified by humans on the basis of sample patterns of these nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from digital images of strawberries: (1) Measured Values (MVs), including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs); and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test together with the random forest approach, and eight of the nine shape types were classified with the combination MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest that machine learning can classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species.
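    The chain codes that CCS builds on, and the width/length ratio among the MVs, can be illustrated with a short sketch. This computes a plain Freeman 8-direction chain code and its cyclic first difference (a standard variant that is unchanged when the shape is rotated by a multiple of 90 degrees); CCS itself is the authors' descriptor and is not reproduced here. It assumes the contour is an ordered list of 8-connected (x, y) pixels and, for the ratio, that the fruit's long axis is vertical; all function names are hypothetical.

```python
# Freeman 8-direction codes: 0 = east, then counter-clockwise in 45-degree steps
# (x to the right, y upwards).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(contour):
    """Freeman chain code of a closed contour of 8-connected (x, y) pixels."""
    closed = contour + contour[:1]  # return to the starting pixel
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(closed, closed[1:])]


def first_difference(codes):
    """Turn (mod 8) between successive moves, treating the code as cyclic."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]


def width_length_ratio(contour):
    """Bounding-box width/length, assuming the fruit's long axis is vertical."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


rect = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3),
        (1, 3), (0, 3), (0, 2), (0, 1)]
print(chain_code(rect))
print(first_difference(chain_code(rect)))
print(width_length_ratio(rect))
```

    A contour with non-unit steps raises a KeyError, so a real pipeline would first trace the fruit boundary pixel by pixel.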

  13. Phylogenetics of Saccharomycetales, the ascomycete yeasts.

    PubMed

    Suh, Sung-Oui; Blackwell, Meredith; Kurtzman, Cletus P; Lachance, Marc-André

    2006-01-01

    Ascomycete yeasts (phylum Ascomycota: subphylum Saccharomycotina: class Saccharomycetes: order Saccharomycetales) comprise a monophyletic lineage with a single order of about 1000 known species. These yeasts live as saprobes, often in association with plants, animals and their interfaces. A few species account for most human mycotic infections, and fewer than 10 species are plant pathogens. Yeasts are responsible for important industrial and biotechnological processes, including baking, brewing and synthesis of recombinant proteins. Species such as Saccharomyces cerevisiae serve as model organisms in research, and some of that research has led to a Nobel Prize. Yeasts usually reproduce asexually by budding, and their sexual states are not enclosed in a fruiting body. The group is also well defined by synapomorphies visible at the ultrastructural level. Yeast identification and classification changed dramatically with the availability of DNA sequencing. Species identification now benefits from a constantly updated sequence database and no longer relies on ambiguous growth tests. A phylogeny based on single gene analyses has shown the order to be remarkably divergent despite morphological similarities among members. The limits of many previously described genera are not supported by sequence comparisons, and multigene phylogenetic studies are under way to provide a stable circumscription of genera, families and orders. One recent multigene study has resolved species of the Saccharomycetaceae into genera that differ markedly from those defined by analysis of morphology and growth responses, and similar changes are likely to occur in other branches of the yeast tree as additional sequences become available.

  14. Auto-simultaneous laser treatment and Ohshiro's classification of laser treatment

    NASA Astrophysics Data System (ADS)

    Ohshiro, Toshio

    2005-07-01

    When the laser was first applied in medicine and surgery in the late 1960s and early 1970s, early adopters reported better wound healing and less postoperative pain with laser procedures compared with the same procedure performed with the cold scalpel or with electrothermy, and multiple surgical effects such as incision, vaporization and hemocoagulation could be achieved with the same laser beam. There was thus an added beneficial component which was associated only with laser surgery. This was first recognized as the "?-effect", was then classified by the author as simultaneous laser therapy, but is now more accurately classified by the author as part of the auto-simultaneous aspect of laser treatment. Indeed, with the dramatic increase of the applications of the laser in surgery and medicine over the last 2 decades there has been a parallel increase in the need for a standardized classification of laser treatment. Some classifications have been machine-based, and thus inaccurate, because at appropriate parameters a "low-power laser" can produce a surgical effect and a "high-power laser", a therapeutic one. A more accurate classification based on the tissue reaction is presented, developed by the author. In addition to this, the author has devised a graphical representation of laser surgical and therapeutic beams whereby the laser type, parameters, penetration depth, and tissue reaction can all be shown in a single illustration, which the author has termed the "Laser Apple", due to the typical pattern generated when a laser beam is incident on tissue. Laser/tissue reactions fall into three broad groups. If the photoreaction in the tissue is irreversible, then it is classified as high-reactive level laser treatment (HLLT). If some irreversible damage occurs together with reversible photodamage, as in tissue welding, the author refers to this as mid-reactive level laser treatment (MLLT). If the level of reaction in the target tissue is lower than the cells

  15. Phylogenetic lineages in Entomophthoromycota

    USDA-ARS's Scientific Manuscript database

    Entomophthoromycota Humber is one of five major phylogenetic lineages among the former phylum Zygomycota. These early terrestrial fungi share evolutionarily ancestral characters such as coenocytic mycelium and gametangiogamy as a sexual process resulting in zygospore formation. Previous molecular st...

  16. Phylogenetic context determines the role of competition in adaptive radiation

    PubMed Central

    Tan, Jiaqi; Slattery, Matthew R.; Yang, Xian; Jiang, Lin

    2016-01-01

    Understanding ecological mechanisms regulating the evolution of biodiversity is of much interest to ecologists and evolutionary biologists. Adaptive radiation constitutes an important evolutionary process that generates biodiversity. Competition has long been thought to influence adaptive radiation, but the directionality of its effect and associated mechanisms remain ambiguous. Here, we report a rigorous experimental test of the role of competition in adaptive radiation using the rapidly evolving bacterium Pseudomonas fluorescens SBW25 interacting with multiple bacterial species that differed in their phylogenetic distance to the diversifying bacterium. We showed that the inhibitive effect of competitors on the adaptive radiation of P. fluorescens decreased as their phylogenetic distance increased. To explain this phylogenetic dependency of adaptive radiation, we linked the phylogenetic distance between P. fluorescens and its competitors to their niche and competitive fitness differences. Competitive fitness differences, which showed weak phylogenetic signal, reduced P. fluorescens abundance and thus diversification, whereas phylogenetically conserved niche differences promoted diversification. These results demonstrate the context dependency of competitive effects on adaptive radiation, and highlight the importance of past evolutionary history for ongoing evolutionary processes. PMID:27335414

  17. Halitosis: a new definition and classification.

    PubMed

    Aydin, M; Harvey-Woodworth, C N

    2014-07-11

    There is no universally accepted, precise definition of halitosis, nor any standardisation in its terminology and classification. We propose a new definition that is free from subjective descriptions (faecal, fish odour, etc.), one-time sulphide detector readings and organoleptic estimation of odour levels, and that excludes temporary exogenous odours (for example, from dietary sources). Some terms previously used in the literature are revised. A new aetiologic classification is proposed, dividing pathologic halitosis into Type 1 (oral), Type 2 (airway), Type 3 (gastroesophageal), Type 4 (blood-borne) and Type 5 (subjective). In reality, any halitosis complaint is potentially the sum of these types in any combination, superimposed on the Type 0 (physiologic odour) present in health. This system allows for multiple diagnoses in the same patient, reflecting the multifactorial nature of the complaint. It represents the most accurate model to understand halitosis and forms an efficient and logical basis for clinical management of the complaint.

  18. Phylogenetic sequence analysis, recombinant expression, and tissue distribution of a channel catfish estrogen receptor beta

    USGS Publications Warehouse

    Xia, Zhenfang; Gale, William L.; Chang, Xiaotian; Langenau, David; Patino, Reynaldo; Maule, Alec G.; Densmore, Llewellyn D.

    2000-01-01

    An estrogen receptor β (ERβ) cDNA fragment was amplified by RT-PCR of total RNA extracted from liver and ovary of immature channel catfish. This cDNA fragment was used to screen an ovarian cDNA library made from an immature female fish. A clone was obtained that contained an open reading frame encoding a 575-amino-acid protein with a deduced molecular weight of 63.9 kDa. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of channel catfish ERβ on the basis of 25 full-length teleost and tetrapod ER sequences. The consensus tree obtained indicated the existence of two major vertebrate ER subtypes, α and β. Within each subtype, and in accordance with established phylogenetic relationships, teleost and tetrapod ER were monophyletic, confirming the results of a previous analysis (Z. Xia et al., 1999, Gen. Comp. Endocrinol. 113, 360–368). Extracts of COS-7 cells transfected with channel catfish ERβ cDNA bound estrogen with high affinity (Kd = 0.21 nM) and specificity. The affinity of channel catfish ERβ for estrogen was higher than previously reported for channel catfish ERα. As determined by qualitative RT-PCR, the tissue distributions of ERα and ERβ were similar but not identical. Both ER subtypes were present in ovary and testis. ERα was found in all other tissues examined from juvenile and mature fish of both sexes. ERβ was also found in most tissues except, in most cases, whole blood and head kidney. Interestingly, the pattern of expression of ER subtypes in head kidney always corresponded to the pattern in whole blood. In conclusion, we isolated a channel catfish ERβ with ligand-binding affinity and tissue expression patterns different from ERα. Also, we confirmed the validity of our previously proposed general classification scheme for vertebrate ER into α and β subtypes and, within each subtype, into teleost and tetrapod clades.
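    Neighbor Joining, one of the two tree-building methods named above, can be sketched in a few lines. This is the textbook algorithm of Saitou and Nei applied to a made-up five-taxon distance matrix, not the authors' analysis of the 25 ER sequences: at each step it joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of distances, and replaces the pair with a new internal node. The internal-node names `u1`, `u2`, ... are hypothetical, and ties in Q are broken by input order.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Textbook neighbor joining on a symmetric distance matrix.

    Returns the joins as (child_a, child_b, len_a, len_b, new_node)
    plus the final edge (node_a, node_b, length).
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(labels) for j, b in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        len_i = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i, j] - len_i
        u = f"u{len(joins) + 1}"  # hypothetical name for the new internal node
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, len_i, len_j, u))
    a, b = nodes
    return joins, (a, b, d[a, b])


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, last = neighbor_joining(["a", "b", "c", "d", "e"], D)
for join in joins:
    print(join)
print(last)
```

    The first join pairs a and b with branch lengths 2 and 3. Because this toy matrix is additive, summing branch lengths along any path of the resulting tree reproduces the input distance, for example a to d: 2 + 3 + 2 + 2 = 9.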

  19. Automated classification of cell morphology by coherence-controlled holographic microscopy

    NASA Astrophysics Data System (ADS)

    Strbkova, Lenka; Zicha, Daniel; Vesely, Pavel; Chmelik, Radim

    2017-08-01

    In the last few years, classification of cells by machine learning has become frequently used in biology. However, most approaches are based on morphometric (MO) features, which are not quantitative in terms of cell mass. This may result in poor classification accuracy. Here, we study the potential contribution of coherence-controlled holographic microscopy, which enables quantitative phase imaging, to the classification of cell morphologies. We compare our approach with the commonly used method based on MO features. We tested both classification approaches in an experiment with nutritionally deprived cancer tissue cells, while employing several supervised machine learning algorithms. Most of the classifiers provided higher performance when quantitative phase features were employed. The results indicate that the quantitative phase features played an important role in improving the performance of the classification. The methodology could be of valuable help in refining the automated monitoring of live cells. We believe that coherence-controlled holographic microscopy, as a tool for quantitative phase imaging, offers all preconditions for the accurate automated analysis of live cell behavior while enabling noninvasive label-free imaging with sufficient contrast and high spatiotemporal phase sensitivity.

  20. Alabama-Mississippi Coastal Classification Maps - Perdido Pass to Cat Island

    USGS Publications Warehouse

    Morton, Robert A.; Peterson, Russell L.

    2005-01-01

    The primary purpose of the USGS National Assessment of Coastal Change Project is to provide accurate representations of pre-storm ground conditions for areas that are designated high-priority because they have dense populations or valuable resources that are at risk from storm waves. Another purpose of the project is to develop a geomorphic (land feature) coastal classification that, with only minor modification, can be applied to most coastal regions in the United States. A Coastal Classification Map describing local geomorphic features is the first step toward determining the hazard vulnerability of an area. The Coastal Classification Maps of the National Assessment of Coastal Change Project present ground conditions such as beach width, dune elevations, overwash potential, and density of development. In order to complete a hazard vulnerability assessment, that information must be integrated with other information, such as prior storm impacts and beach stability. The Coastal Classification Maps provide much of the basic information for such an assessment and represent a critical component of a storm-impact forecasting capability.

  1. Automated classification of articular cartilage surfaces based on surface texture.

    PubMed

    Stachowiak, G P; Stachowiak, G W; Podsiadlo, P

    2006-11-01

    In this study the automated classification system previously developed by the authors was used to classify articular cartilage surfaces with different degrees of wear. This automated system classifies surfaces based on their texture. Plug samples of sheep cartilage (pins) were run on stainless steel discs under various conditions using a pin-on-disc tribometer. Testing conditions were specifically designed to produce different severities of cartilage damage due to wear. Environmental scanning electron microscope (ESEM) images of cartilage surfaces, which formed a database for pattern recognition analysis, were acquired. The ESEM images of cartilage were divided into five groups (classes), each class representing a different wear condition or wear severity. Each class was first examined and assessed visually. Next, the automated classification system (pattern recognition) was applied to all classes. The results of the automated surface texture classification were compared to those based on visual assessment of surface morphology. It was shown that the texture-based automated classification system was an efficient and accurate method of distinguishing between various cartilage surfaces generated under different wear conditions. It appears that the texture-based classification method has potential to become a useful tool in medical diagnostics.

  2. Agent Collaborative Target Localization and Classification in Wireless Sensor Networks

    PubMed Central

    Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng

    2007-01-01

    Wireless sensor networks (WSNs) are autonomous networks that have been frequently deployed to collaboratively perform target localization and classification tasks. Their autonomous and collaborative features resemble the characteristics of agents. Such similarities inspire the development of a heterogeneous agent architecture for WSNs in this paper. The proposed agent architecture views a WSN as a multi-agent system, and mobile agents are employed to reduce in-network communication. Based on this architecture, an energy-based acoustic localization algorithm is proposed, in which an estimate of the target location is obtained by steepest descent search. The search algorithm adapts to measurement environments by dynamically adjusting its termination condition. Within the agent architecture, target classification is accomplished by a distributed support vector machine (SVM). Mobile agents are employed for feature extraction and distributed SVM learning to reduce communication load. Desirable learning performance is guaranteed by combining support vectors and convex hull vectors. Fusion algorithms are designed to merge SVM classification decisions made from various modalities. Real-world experiments with MICAz sensor nodes are conducted for vehicle localization and classification. Experimental results show that the proposed agent architecture remarkably facilitates WSN design and algorithm implementation. The localization and classification algorithms also prove to be accurate and energy efficient.

  3. Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

    PubMed Central

    Willerslev, Eske; Gilbert, M Thomas P; Binladen, Jonas; Ho, Simon YW; Campos, Paula F; Ratan, Aakrosh; Tomsho, Lynn P; da Fonseca, Rute R; Sher, Andrei; Kuznetsova, Tatanya V; Nowak-Kemp, Malgosia; Roth, Terri L; Miller, Webb; Schuster, Stephan C

    2009-01-01

    Background: The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology, then apparently resolved using mitochondrial DNA fragments. Results: In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis), and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. While the six species cluster into three strongly supported sister-pairings, (i) the black/white, (ii) the woolly/Sumatran, and (iii) the Javan/Indian, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs tapir) has a considerable effect on reconstruction of the phylogeny. The lack of resolution is suggestive of a hard polytomy at the base of crown-group Rhinocerotidae, and this is supported by an investigation of the relative branch lengths. Conclusion: Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, there are significant limitations with single-locus phylogenetics. We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial

  4. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proved to be an effective tool for diagnosing diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. Classifier ensembles, on the other hand, have received increasing attention in various applications. Here, we address the gene classification problem using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques and thereby preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results show that combining the fast correlation-based feature selection method with ICA-based RotBoost is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only classifiers produced by conventional machine learning methods but also those generated by two widely used conventional ensemble learning methods, namely Bagging and AdaBoost.

  5. DendroPy: a Python library for phylogenetic computing.

    PubMed

    Sukumaran, Jeet; Holder, Mark T

    2010-06-15

    DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).
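    The splits-based tree comparison mentioned above can be illustrated with a standard-library sketch of the Robinson-Foulds (symmetric difference) distance: collect each tree's non-trivial bipartitions and count those found in only one tree. This is not DendroPy's API or implementation. DendroPy encodes splits as bitmasks and, in version 4, exposes comparable routines in `dendropy.calculate.treecompare` (for example `symmetric_difference`); here frozensets play the role of the bitmasks, and trees are assumed to be nested tuples of taxon names rather than parsed Newick. All function names are hypothetical.

```python
def leaf_set(node):
    """Taxa below a node of a tree written as nested tuples of taxon names."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Non-trivial bipartitions, each stored as the side without a reference taxon."""
    taxa = leaf_set(tree)
    ref = min(taxa)  # fixed reference taxon, so each split has one canonical key
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = taxa - below if ref in below else below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-leaf) splits
            found.add(side)
        return below

    walk(tree)
    return found


def robinson_foulds(t1, t2):
    """Symmetric-difference (Robinson-Foulds) distance between two unrooted trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same taxon set")
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))
```

    Normalizing every split to the side that excludes a fixed reference taxon is what makes the comparison independent of where each tree happens to be rooted.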

  6. Phylogenetic and functional diversity in large carnivore assemblages

    PubMed Central

    Dalerum, F.

    2013-01-01

    Large terrestrial carnivores are important ecological components and prominent flagship species, but are often extinction prone owing to a combination of biological traits and high levels of human persecution. This study combines phylogenetic and functional diversity evaluations of global and continental large carnivore assemblages to provide a framework for conservation prioritization both between and within assemblages. Species-rich assemblages of large carnivores simultaneously had high phylogenetic and functional diversity, but species contributions to phylogenetic and functional diversity components were not positively correlated. The results further provide ecological justification for the largest carnivore species as a focus for conservation action, and suggest that range contraction is a likely cause of diminishing carnivore ecosystem function. This study highlights that preserving species-rich carnivore assemblages will capture both high phylogenetic and functional diversity, but that prioritizing species within assemblages will involve trade-offs between optimizing contemporary ecosystem function versus the evolutionary potential for future ecosystem performance. PMID:23576787

  7. On the information content of discrete phylogenetic characters.

    PubMed

    Bordewich, Magnus; Deutschmann, Ina Maria; Fischer, Mareike; Kasbohm, Elisa; Semple, Charles; Steel, Mike

    2017-12-16

    Phylogenetic inference aims to reconstruct the evolutionary relationships of different species based on genetic (or other) data. Discrete characters are a particular type of data, which contain information on how the species should be grouped together. However, it has long been known that some characters contain more information than others. For instance, a character that assigns the same state to each species groups all of them together and so provides no insight into the relationships of the species considered. At the other extreme, a character that assigns a different state to each species also conveys no phylogenetic signal. In this manuscript, we study a natural combinatorial measure of the information content of an individual character and analyse properties of characters that provide the maximum phylogenetic information, particularly, the number of states such a character uses and how the different states have to be distributed among the species or taxa of the phylogenetic tree.
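    The two extremes described above, one state shared by every species and a different state for each species, are the classic examples of characters that are not parsimony-informative. The sketch below implements only that standard yes/no criterion (at least two states each present in at least two taxa); it is not the combinatorial information measure studied in the paper, which grades informative characters further. A character is assumed to be a string or sequence with one state symbol per taxon.

```python
from collections import Counter


def is_parsimony_informative(character):
    """True if at least two states each occur in at least two taxa.

    `character` is a sequence of state symbols, one per taxon.
    """
    counts = Counter(character)
    return sum(1 for n in counts.values() if n >= 2) >= 2


for ch in ["AAAAA", "ABCDE", "AABBC", "AABCD"]:
    print(ch, is_parsimony_informative(ch))
```

    "AAAAA" groups everything together and "ABCDE" separates everything, so neither can favour one tree over another under parsimony, matching the two extremes in the abstract.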

  8. Non-Destructive Classification Approaches for Equilibrated Ordinary Chondrites

    NASA Technical Reports Server (NTRS)

    Righter, K.; Harrington, R.; Schroeder, C.; Morris, R. V.

    2013-01-01

    Classification of meteorites is most effectively carried out by petrographic and mineralogic studies of thin sections, but a rapid and accurate classification technique for the many samples collected in dense collection areas (hot and cold deserts) is of great interest. Oil immersion techniques have been used to classify a large proportion of the US Antarctic meteorite collections since the mid-1980s [1]. This approach has allowed rapid characterization of thousands of samples over time, but nonetheless utilizes a piece of the sample that has been ground to grains or a powder. In order to compare a few non-destructive techniques with the standard approaches, we have characterized a group of chondrites from the Larkman Nunatak region using magnetic susceptibility and Moessbauer spectroscopy.

  9. [Phylogenetic analysis of tyrosinase gene family in the Pacific oyster (Crassostrea gigas Thunberg)].

    PubMed

    Yu, Xue; Yu, Hong; Kong, Lingfeng; Li, Qi

    2014-02-01

    The deduced amino acid sequence characteristics, classification and phylogeny of the tyrosinase gene family in the Pacific oyster (Crassostrea gigas Thunberg) were analyzed using bioinformatics methods. The results showed that gene duplication was the major cause of tyrosinase gene expansion in the Pacific oyster. The tyrosinase gene family in the Pacific oyster can be further classified into three types: secreted form (Type A), cytosolic form (Type B) and membrane-bound form (Type C). Based on the topology of the phylogenetic tree of the Pacific oyster tyrosinases, tyr18 appeared to have diverged early from the other Type A isoforms, while tyr2 and tyr9 appeared to have diverged early within Type B. Among Type C tyrosinases, tyr8 diverged early. The clustering of the Pacific oyster tyrosinases is determined by their classifications and positions in the scaffolds. Further analysis suggested that Type A tyrosinases of C. gigas clustered with those from cephalopods and then with nematodes and cnidarians. Type B tyrosinases generally clustered with the same type of tyrosinases from molluscs and nematodes, and then with those from platyhelminths, cnidarians and chordates. Type A tyrosinases in the Pacific oyster and the pearl oyster expanded independently and were divergent from the membrane-bound form of tyrosinases in chordates, platyhelminths and annelids. These observations suggested that Type C tyrosinases in the bivalve had a distinct evolutionary direction.

  10. Cloud field classification based on textural features

    NASA Technical Reports Server (NTRS)

    Sengupta, Sailes Kumar

    1989-01-01

    An essential component in global climate research is accurate determination of cloud cover and cloud type. Of the two approaches to texture-based classification (statistical and structural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin co-occurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin textural features, on the other hand, are based on the MMCM, a matrix whose (I,J)th entry gives the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed from this matrix in much the same manner as in texture computation using the grey level co-occurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near-IR visible channel. The classification algorithm used is the well-known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage of correct classifications in each case. It turns out that both types of classifiers, at their best combination of features and at any given spatial resolution, give approximately the same classification accuracy. A neural-network-based classifier with a feed-forward architecture and a back-propagation training algorithm is used to increase the classification accuracy, using these two classes
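    The GLDV step described above can be sketched in a few lines. This is a minimal illustration, assuming absolute grey-level differences at a horizontal offset d and four statistics commonly computed from a GLDV (mean, contrast, angular second moment, entropy); the abstract does not list which statistics were actually used, and the function name gldv_features is hypothetical.

```python
import math
from collections import Counter


def gldv_features(image, d):
    """Statistics of the grey level difference vector for horizontal offset d.

    image: list of rows of integer grey levels.
    Uses |g(x, y) - g(x + d, y)| over every valid pixel pair in each row.
    """
    diffs = Counter()
    for row in image:
        for x in range(len(row) - d):
            diffs[abs(row[x] - row[x + d])] += 1
    total = sum(diffs.values())
    if total == 0:
        raise ValueError("offset d leaves no pixel pairs in the image")
    # Relative frequency of each absolute difference value
    probs = {k: v / total for k, v in diffs.items()}
    mean = sum(k * p for k, p in probs.items())
    contrast = sum(k * k * p for k, p in probs.items())
    asm = sum(p * p for p in probs.values())  # angular second moment
    entropy = -sum(p * math.log(p) for p in probs.values())
    return {"mean": mean, "contrast": contrast, "asm": asm, "entropy": entropy}


print(gldv_features([[0, 0, 1, 1], [0, 0, 1, 1]], 1))
```

    Each returned dictionary would form one feature vector per scene and offset, which is the kind of input a stepwise discriminant analysis or a neural network classifier would consume.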

  11. HIPPI: highly accurate protein family classification with ensembles of HMMs.

    PubMed

    Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

    2016-11-11

    Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .

  12. Estimation of rates-across-sites distributions in phylogenetic substitution models.

    PubMed

    Susko, Edward; Field, Chris; Blouin, Christian; Roger, Andrew J

    2003-10-01

    Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.
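    The equal-probability discrete gamma discussed above can be made concrete with a short sketch. It assumes the common construction in which the gamma distribution has shape alpha and mean 1, the K categories are delimited by gamma quantiles at i/K, and each category is represented by its conditional mean rate. The helper names are illustrative, and the regularized incomplete gamma function is computed with a plain power series rather than a library routine.

```python
import math


def lower_reg_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    # P(a, x) = x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
    while term > total * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def gamma_quantile(p, alpha):
    """Quantile of a gamma with shape alpha and rate alpha (mean 1), by bisection."""
    lo, hi = 0.0, 1.0
    while lower_reg_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lower_reg_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Conditional mean rate of each of k equal-probability gamma categories."""
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # P(alpha + 1, alpha * b) is the fraction of the total mean rate below b
    cum = [lower_reg_gamma(alpha + 1, alpha * b) for b in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


print(discrete_gamma_rates(0.5, 4))
print(max(discrete_gamma_rates(0.5, 16)))
```

    For alpha = 0.5 the top rate with 4 categories comes out near 2.89, while the top rate with 16 categories is larger. This illustrates how a coarse discretization pulls the largest rates down toward the category mean, the effect the abstract reports.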

  13. Phylogenetic community structure: temporal variation in fish assemblage

    PubMed Central

    Santorelli, Sergio; Magnusson, William; Ferreira, Efrem; Caramaschi, Erica; Zuanon, Jansen; Amadio, Sidnéia

    2014-01-01

    Hypotheses about phylogenetic relationships among species allow inferences about the mechanisms that affect species coexistence. Nevertheless, most studies assume that the phylogenetic patterns identified are stable over time. We used data from monthly samples of fish from a single lake over 10 years to show that the phylogenetic structure of assemblages varies over time and that conclusions depend heavily on the time scale investigated. The data set was organized by guild structure and temporal scale (grouped at three temporal scales). Phylogenetic distance was measured as the mean pairwise distance (MPD) and as the mean nearest-neighbor distance (MNTD). Both distances were based on counts of nodes. We compared the observed values of MPD and MNTD with values generated randomly using the independent-swap null model. A serial runs test was used to assess the temporal independence of the indices. The phylogenetic pattern in the whole assemblage and in the functional groups varied widely over time. Conclusions about phylogenetic clustering or dispersion depended on the temporal scale. Conclusions about the frequency with which biotic processes and environmental filters affect local assembly depend not only on taxonomic grouping and spatial scale. While these analyses allow the assertion that all proposed patterns apply to the fish assemblages in the floodplain, the relative importance of these processes, and how it varies with the temporal scale and functional group studied, cannot be determined with the sampling effort commonly used. It appears that, at least in the system we studied, the assemblages are continuously forming and breaking apart, resulting in various phylogeny-related structures that make summarizing difficult. PMID:25360256
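    The two distance indices above can be illustrated with a minimal sketch. It assumes a symmetric matrix of pairwise phylogenetic distances (for example node counts) stored as nested dictionaries; the species names and distances in the example are hypothetical, and the independent-swap null model used for comparison is not reproduced here.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested-dict distance matrix from {(a, b): distance}."""
    d = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d


def mpd(dist, community):
    """Mean pairwise distance over all species pairs in the community."""
    pairs = list(combinations(community, 2))
    if not pairs:
        raise ValueError("MPD needs at least two species")
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mntd(dist, community):
    """Mean distance from each species to its closest relative in the community."""
    if len(community) < 2:
        raise ValueError("MNTD needs at least two species")
    nearest = [min(dist[a][b] for b in community if b != a) for a in community]
    return sum(nearest) / len(nearest)


# Hypothetical node-count distances for a tree ((A,B),(C,D))
d = symmetric({("A", "B"): 1, ("C", "D"): 1, ("A", "C"): 3,
               ("A", "D"): 3, ("B", "C"): 3, ("B", "D"): 3})
print(mpd(d, ["A", "B", "C"]), mntd(d, ["A", "B", "C"]))
```

    MPD is sensitive to structure across the whole tree, while MNTD reflects clustering near the tips, which is why studies of this kind usually report both.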

  14. A phylogenetic Kalman filter for ancestral trait reconstruction using molecular data.

    PubMed

    Lartillot, Nicolas

    2014-02-15

    Correlation between life history or ecological traits and genomic features such as nucleotide or amino acid composition can be used for reconstructing the evolutionary history of the traits of interest along phylogenies. Thus far, however, such ancestral reconstructions have been done using simple linear regression approaches that do not account for phylogenetic inertia. These reconstructions could instead be seen as a genuine comparative regression problem, as formalized by classical generalized least-squares comparative methods, in which the trait of interest and the molecular predictor are represented as correlated Brownian characters coevolving along the phylogeny. Here, a Bayesian sampler is introduced, representing an alternative and more efficient algorithmic solution to this comparative regression problem, compared with currently existing generalized least-squares approaches. Technically, ancestral trait reconstruction based on a molecular predictor is shown to be formally equivalent to a phylogenetic Kalman filter problem, for which backward and forward recursions are developed and implemented in the context of a Markov chain Monte Carlo sampler. The comparative regression method results in more accurate reconstructions and a more faithful representation of uncertainty, compared with simple linear regression. Application to the reconstruction of the evolution of optimal growth temperature in Archaea, using GC composition in ribosomal RNA stems and amino acid composition of a sample of protein-coding genes, confirms previous findings, in particular, pointing to a hyperthermophilic ancestor for the kingdom. The program is freely available at www.phylobayes.org.

  15. Use of Whole-Genus Genome Sequence Data To Develop a Multilocus Sequence Typing Tool That Accurately Identifies Yersinia Isolates to the Species and Subspecies Levels

    PubMed Central

    Hall, Miquette; Chattaway, Marie A.; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R.

    2014-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level with a resolution equal to that of whole-genome sequencing. Our assay is designed to accurately subtype the important human-pathogenic species Yersinia enterocolitica at whole-genome resolution. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system that allows accurate and reproducible identification of Yersinia isolates to the species level, a process that is often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. PMID:25339391
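    The general logic of multilocus sequence typing can be sketched briefly: each locus sequence is mapped to an allele number, and the tuple of allele numbers (the allelic profile) is looked up to give a sequence type (ST). The loci, allele sequences and profiles below are hypothetical placeholders, not the loci or database of the Yersinia scheme described above.

```python
# Hypothetical allele database: locus -> {allele sequence: allele number}
ALLELES = {
    "locusA": {"ACGTAC": 1, "ACGTTC": 2},
    "locusB": {"GGCATA": 1, "GGCTTA": 2},
    "locusC": {"TTAGCA": 1, "TTAGGA": 2},
}
LOCI = ("locusA", "locusB", "locusC")

# Hypothetical profile table: allelic profile -> sequence type
PROFILES = {(1, 1, 1): 1, (1, 2, 1): 2, (2, 2, 2): 3}


def sequence_type(isolate):
    """Return the ST for {locus: sequence}, or None if an allele or profile is new."""
    profile = []
    for locus in LOCI:
        allele = ALLELES[locus].get(isolate[locus])
        if allele is None:
            return None  # novel allele: would need curation before typing
        profile.append(allele)
    return PROFILES.get(tuple(profile))


print(sequence_type({"locusA": "ACGTAC", "locusB": "GGCTTA", "locusC": "TTAGCA"}))
```

    Exact-match lookup keeps typing reproducible across laboratories, since two isolates receive the same ST only if every locus sequence is identical.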

  16. Enumerating all maximal frequent subtrees in collections of phylogenetic trees.

    PubMed

    Deepak, Akshay; Fernández-Baca, David

    2014-01-01

    A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, to compute congruence indices, and to identify horizontal gene transfer events. We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees.
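    For comparison with the methods above, the majority rule consensus they are measured against can be sketched by counting clusters (the leaf sets below each internal node) across rooted trees and keeping those found in more than half of them. This is a minimal illustration assuming trees written as nested tuples with string leaves; it is not the authors' maximal frequent subtree algorithm.

```python
from collections import Counter


def clusters(tree):
    """Non-trivial clusters (leaf sets of internal nodes) of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    all_leaves = walk(tree)
    # Drop the root cluster, which every tree on the same taxa shares
    return [c for c in found if 1 < len(c) < len(all_leaves)]


def majority_rule_clusters(trees):
    """Clusters present in more than half of the trees (always mutually compatible)."""
    counts = Counter()
    for t in trees:
        counts.update(set(clusters(t)))
    return {c for c, n in counts.items() if n > len(trees) / 2}


trees = [((("A", "B"), "C"), "D"),
         ((("A", "B"), "D"), "C"),
         ((("A", "C"), "B"), "D")]
print(majority_rule_clusters(trees))
```

    Clusters that fall just below the 50% threshold are discarded entirely, which is one reason the majority rule tree can be less resolved than the subtree-based summaries the abstract describes.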

  17. Phylogeny and classification of bacteria in the genera Clavibacter and Rathayibacter on the basis of 16S rRNA gene sequence analyses.

    PubMed

    Lee, I M; Bartoszyk, I M; Gundersen-Rindal, D E; Davis, R E

    1997-07-01

    A phylogenetic analysis by parsimony of 16S rRNA gene sequences (16S rDNA) revealed that species and subspecies of Clavibacter and Rathayibacter form a discrete monophyletic clade, paraphyletic to Corynebacterium species. Within the Clavibacter-Rathayibacter clade, four major phylogenetic groups (subclades) with a total of 10 distinct taxa were recognized: (I) species C. michiganensis; (II) species C. xyli; (III) species R. iranicus and R. tritici; and (IV) species R. rathayi. The first three groups form a monophyletic cluster, paraphyletic to R. rathayi. On the basis of the phylogeny inferred, reclassification of members of Clavibacter-Rathayibacter group is proposed. A system for classification of taxa in Clavibacter and Rathayibacter was developed based on restriction fragment length polymorphism (RFLP) analysis of the PCR-amplified 16S rDNA sequences. The groups delineated on the basis of RFLP patterns of 16S rDNA coincided well with the subclades delineated on the basis of phylogeny. In contrast to previous classification systems, which are based primarily on phenotypic properties and are laborious, the RFLP analyses allow for rapid differentiation among species and subspecies in the two genera.
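    The RFLP approach above rests on cutting an amplified sequence at enzyme recognition sites and comparing the resulting fragment lengths. A minimal in silico sketch, assuming a palindromic recognition site (so only one strand needs scanning) and using AluI (AG^CT, cut after the second base) as an example; the sequence is hypothetical and the enzymes used in the study are not listed in the abstract.

```python
import re


def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of a palindromic site.

    cut_offset: position of the cut within the site (AluI AG^CT -> 2).
    """
    seq = sequence.upper()
    # Lookahead so that overlapping occurrences are all found
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAAGCTTTTAGCTGG", "AGCT", 2))
```

    Isolates whose fragment-length patterns match for every enzyme would fall in the same RFLP group, which is how such patterns can be compared against phylogenetic subclades.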

  18. Subliminal priming with nearly perfect performance in the prime-classification task.

    PubMed

    Finkbeiner, Matthew

    2011-05-01

    The subliminal priming paradigm is widely used by cognitive scientists, and claims of subliminal perception are common nowadays. Nevertheless, there are still those who remain skeptical. In a recent critique of subliminal priming, Pratte and Rouder (Attention, Perception, & Psychophysics, 71, 1276-1283, 2009) suggested that previous claims of subliminal priming may have been due to a failure to control the task difficulty between the experiment proper and the prime-classification task. Essentially, because the prime-classification task is more difficult than the experiment proper, the prime-classification task results may underrepresent the subjects' true ability to perceive the prime stimuli. To address this possibility, prime words were here presented in color. In the experiment proper, priming was observed. In the prime-classification task, subjects reported the color of the primes very accurately, indicating almost perfect control of task difficulty, but they could not identify the primes. Thus, I conclude that controlling for task difficulty does not eliminate subliminal priming.

  19. gyrB as a phylogenetic discriminator for members of the Bacillus anthracis-cereus-thuringiensis group

    NASA Technical Reports Server (NTRS)

    La Duc, Myron T.; Satomi, Masataka; Agata, Norio; Venkateswaran, Kasthuri

    2004-01-01

    Bacillus anthracis, the causative agent of the human disease anthrax, Bacillus cereus, a food-borne pathogen capable of causing human illness, and Bacillus thuringiensis, a well-characterized insecticidal toxin producer, all cluster together phylogenetically within a very tight clade (the B. cereus group) and are indistinguishable from one another via 16S rDNA sequence analysis. As new pathogens are continually emerging, it is imperative to devise a system capable of rapidly and accurately differentiating closely related, yet phenotypically distinct species. Although the gyrB gene has proven useful in discriminating closely related species, its sequence analysis has not yet been validated by DNA:DNA hybridization, the taxonomically accepted "gold standard". We phylogenetically characterized the gyrB sequences of various species and serotypes encompassed in the "B. cereus group," including lab strains and environmental isolates. Results were compared to those obtained from analyses of phenotypic characteristics, 16S rDNA sequence, DNA:DNA hybridization, and virulence factors. The gyrB gene proved more discriminating than 16S rDNA while being as informative as the costly and laborious DNA:DNA hybridization techniques in differentiating species within the B. cereus group.
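    Why a faster-evolving gene such as gyrB separates species that 16S rDNA cannot comes down to the number of differing sites between aligned sequences. A minimal sketch, assuming pre-aligned sequences, of the uncorrected p-distance and its Jukes-Cantor correction; the abstract does not state which distance measure the authors used, and the sequences in the example are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTTC")
print(p, jukes_cantor(p))
```

    Two strains with identical 16S sequences give a distance of zero and cannot be separated, whereas even a few substitutions in a second marker yield a non-zero distance that can.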

  20. Classification of forest land attributes using multi-source remotely sensed data

    NASA Astrophysics Data System (ADS)

    Pippuri, Inka; Suvanto, Aki; Maltamo, Matti; Korhonen, Kari T.; Pitkänen, Juho; Packalen, Petteri

    2016-02-01

    The aim of the study was (1) to examine the classification of forest land using airborne laser scanning (ALS) data, satellite images and sample plots of the Finnish National Forest Inventory (NFI) as training data and (2) to identify the best-performing metrics for classifying forest land attributes. Six different schemes of forest land classification were studied: land use/land cover (LU/LC) classification using both national classes and FAO (Food and Agriculture Organization of the United Nations) classes, main type, site type, peat land type and drainage status. Of special interest was testing different ALS-based surface metrics in the classification of forest land attributes. Field data consisted of 828 NFI plots collected in 2008-2012 in southern Finland, and the remotely sensed data were from summer 2010. Multinomial logistic regression was used as the classification method. Classification of LU/LC classes was highly accurate (kappa values 0.90 and 0.91), and the classification of site type, peat land type and drainage status also succeeded moderately well (kappa values 0.51, 0.69 and 0.52). ALS-based surface metrics were found to be the most important predictor variables in the classification of LU/LC class, main type and drainage status. In the best classification models of forest site types, both spectral metrics from satellite data and point cloud metrics from ALS were used. In turn, ALS point cloud metrics played the most important role in the classification of peat land types. Results indicated that the prediction of site type and forest land category could be incorporated into the stand-level forest management inventory system in Finland.
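    The kappa values reported above measure agreement between predicted and field-observed classes beyond what chance would produce. A minimal sketch of Cohen's kappa computed from a confusion matrix (rows are reference classes, columns are predicted classes); the matrix in the example is hypothetical, not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix given as a list of rows."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the reference
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


print(cohen_kappa([[20, 5], [10, 15]]))
```

    Here 70% raw agreement against 50% chance agreement gives kappa = 0.4, which shows why kappa is lower than overall accuracy whenever class frequencies make chance agreement high.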

  1. Using Copula Distributions to Support More Accurate Imaging-Based Diagnostic Classifiers for Neuropsychiatric Disorders

    PubMed Central

    Bansal, Ravi; Hao, Xuejun; Liu, Jun; Peterson, Bradley S.

    2014-01-01

    Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g., >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high-dimensional space increase the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To estimate the multivariate distributions accurately from sparse data, we propose first to estimate the univariate distributions of the imaging data and then to combine them using a Copula to generate more accurate estimates of their multivariate distributions. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess the accuracy of multivariate Copula distributions in estimating the corresponding multivariate distributions of real-world imaging data. Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging
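    The estimate-marginals-then-combine idea above can be sketched with a Gaussian copula. This is an illustrative assumption: the abstract does not say which copula family was used, and here the univariate distributions are simply the empirical ones. The sketch converts each measure to normal scores via ranks, estimates their correlation, draws a dense correlated sample, and maps it back through the empirical quantiles. Function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_N = NormalDist()


def fit_gaussian_copula(data):
    """Correlation of normal scores for data of shape (n_samples, n_features)."""
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1)  # pseudo-observations strictly inside (0, 1)
    z = np.vectorize(_N.inv_cdf)(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(data, corr, m, rng):
    """Draw m dense samples whose marginals follow the empirical data marginals."""
    p = data.shape[1]
    z = rng.multivariate_normal(np.zeros(p), corr, size=m)
    u = np.vectorize(_N.cdf)(z)
    out = np.empty((m, p))
    for j in range(p):
        # Map uniform scores back through each measure's empirical quantiles
        out[:, j] = np.quantile(data[:, j], u[:, j])
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=60)
data = np.column_stack([x, x + 0.3 * rng.normal(size=60)])
corr = fit_gaussian_copula(data)
dense = sample_gaussian_copula(data, corr, 1000, rng)
print(corr[0, 1], dense.shape)
```

    Because each column is resampled only through its own empirical quantiles, the dense set keeps the observed range of every measure while the copula carries their dependence.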

  2. Maximum parsimony, substitution model, and probability phylogenetic trees.

    PubMed

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which MP is the most widely studied and popular. In the MP method, the optimization criterion is the number of nucleotide substitutions implied by the differences among the investigated nucleotide sequences. However, the MP method is often criticized because it counts only the substitutions observable at the present time, omitting all the unobservable substitutions that actually occurred during evolutionary history. To take the unobservable substitutions into account, substitution models have been established and are now widely used in the DM and ML methods, but these models cannot be used within the classical MP method. Recently, the authors proposed a probability representation model for phylogenetic trees; the trees reconstructed in this model are called probability phylogenetic trees. One advantage of the probability representation model is that it can incorporate a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and illustrate the advantage of this approach with examples.
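    The classical MP criterion described above, counting only observable substitutions, is usually computed per site with Fitch's algorithm. A minimal sketch for one site on a rooted binary tree written as nested tuples; this is the standard baseline, not the authors' probability representation model, and the taxon names are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one site.

    tree: nested 2-tuples with string leaves; states: {leaf: nucleotide}.
    Returns (candidate state set at this node, minimum substitution count).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    # Disjoint child sets force one additional substitution
    return ls | rs, lc + rc + 1


tree = (("t1", "t2"), ("t3", "t4"))
print(fitch(tree, {"t1": "A", "t2": "A", "t3": "G", "t4": "G"})[1])
print(fitch(tree, {"t1": "A", "t2": "G", "t3": "A", "t4": "G"})[1])
```

    Summing the count over all sites gives the parsimony score of the tree; any substitution that is later reversed or overwritten leaves no trace in this count, which is exactly the limitation the abstract raises.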

  3. PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages

    PubMed Central

    Niu, Yulong; Liu, Chengcheng; Moghimyfiroozabad, Shayan; Yang, Yi

    2017-01-01

    Direct and indirect functional links between proteins as well as their interactions as part of larger protein complexes or common signaling pathways may be predicted by analyzing the correlation of their evolutionary patterns. Based on phylogenetic profiling, here we present a highly scalable and time-efficient computational framework for predicting linkages within the whole human proteome. We have validated this method through analysis of 3,697 human pathways and molecular complexes and a comparison of our results with the prediction outcomes of previously published co-occurrence model-based and normalization methods. Here we also introduce PrePhyloPro, a web-based software that uses our method for accurately predicting proteome-wide linkages. We present data on interactions of human mitochondrial proteins, verifying the performance of this software. PrePhyloPro is freely available at http://prephylopro.org/phyloprofile/. PMID:28875072
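    Phylogenetic profiling, as described above, compares the presence or absence patterns of proteins across genomes. A minimal sketch, assuming binary profiles and plain Pearson correlation as the similarity score; this is a common baseline rather than the PrePhyloPro method, whose normalization is not described in the abstract, and the protein names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_links(profiles):
    """All protein pairs sorted from most to least correlated profiles."""
    names = sorted(profiles)
    scored = [(pearson(profiles[a], profiles[b]), a, b)
              for i, a in enumerate(names) for b in names[i + 1:]]
    return sorted(scored, reverse=True)


# Hypothetical presence (1) / absence (0) across six genomes
profiles = {"P1": [1, 1, 0, 0, 1, 0],
            "P2": [1, 1, 0, 0, 1, 0],
            "P3": [0, 0, 1, 1, 0, 1]}
print(rank_links(profiles)[0])
```

    Proteins gained and lost together across lineages get high scores and become candidate functional links; at proteome scale the number of pairs grows quadratically, which is why scalability matters for such frameworks.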

  4. Optimization of the ANFIS using a genetic algorithm for physical work rate classification.

    PubMed

    Habibi, Ehsanollah; Salehi, Mina; Yadegarfar, Ghasem; Taheri, Ali

    2018-03-13

    Recently, a new method was proposed for physical work rate classification based on an adaptive neuro-fuzzy inference system (ANFIS). This study aims to present a genetic algorithm (GA)-optimized ANFIS model for highly accurate classification of physical work rate. Thirty healthy men participated in this study. Directly measured heart rate and oxygen consumption of the participants in the laboratory were used for training the ANFIS classifier model in MATLAB version 8.0.0 using a hybrid algorithm. A similar process was done using the GA as an optimization technique. The accuracy, sensitivity and specificity of the ANFIS classifier model were increased successfully. The mean accuracy of the model was increased from 92.95 to 97.92%. Also, the calculated root mean square error of the model was reduced from 5.4186 to 3.1882. The maximum estimation error of the optimized ANFIS during the network testing process was ± 5%. The GA can be effectively used for ANFIS optimization and leads to an accurate classification of physical work rate. In addition to high accuracy, simple implementation and consideration of inter-individual variability are two other advantages of the presented model.

  5. Fire modifies the phylogenetic structure of soil bacterial co-occurrence networks.

    PubMed

    Pérez-Valera, Eduardo; Goberna, Marta; Faust, Karoline; Raes, Jeroen; García, Carlos; Verdú, Miguel

    2017-01-01

    Fire alters ecosystems by changing the composition and community structure of soil microbes. The phylogenetic structure of a community provides clues about its main assembling mechanisms. While environmental filtering tends to reduce the community phylogenetic diversity by selecting for functionally (and hence phylogenetically) similar species, processes like competitive exclusion by limiting similarity tend to increase it by preventing the coexistence of functionally (and phylogenetically) similar species. We used co-occurrence networks to detect co-presence (bacteria that co-occur) or exclusion (bacteria that do not co-occur) links indicative of the ecological interactions structuring the community. We propose that inspecting the phylogenetic structure of co-presence or exclusion links makes it possible to detect the main processes simultaneously assembling the community. We monitored a soil bacterial community after an experimental fire and found that fire altered its composition, richness and phylogenetic diversity. Both co-presence and exclusion links were more phylogenetically related than expected by chance. We interpret such phylogenetic clustering in co-presence links as the result of environmental filtering, and that in exclusion links as reflecting competitive exclusion by limiting similarity. This suggests that environmental filtering and limiting similarity operate simultaneously to assemble soil bacterial communities, widening the traditional view that only environmental filtering structures bacterial communities. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  6. Light in the darkness: New perspective on lanternfish relationships and classification using genomic and morphological data.

    PubMed

    Martin, Rene P; Olson, Emily E; Girard, Matthew G; Smith, Wm Leo; Davis, Matthew P

    2018-04-01

    Massively parallel sequencing allows scientists to gather DNA sequences composed of millions of base pairs that can be combined into large datasets and analyzed to infer organismal relationships at a genome-wide scale in non-model organisms. Although the use of these large datasets is becoming more widespread, little to no work has been done in estimating phylogenetic relationships using ultraconserved elements (UCEs) in deep-sea fishes. Among deep-sea animals, the 257 species of lanternfishes (Myctophiformes) are among the most important open-ocean lineages, representing half of all mesopelagic vertebrate biomass. Given this relative abundance, they are key members of the midwater food web, where they feed on smaller invertebrates and fishes in addition to being a primary prey item for other open-ocean animals. Understanding the evolution and relationships of midwater organisms generally, and this dominant group of fishes in particular, is necessary for understanding and preserving the underexplored deep-sea ecosystem. Despite substantial congruence in the evolutionary relationships among deep-sea lanternfishes at higher classification levels in previous studies, the relationships among tribes, genera, and species within Myctophidae often conflict across phylogenetic studies or lack resolution and support. Herein we provide the first genome-scale phylogenetic analysis of lanternfishes, and we integrate these data from across the nuclear genome with additional protein-coding gene sequences and morphological data to further test evolutionary relationships among lanternfishes. Our phylogenetic hypotheses of relationships among lanternfishes are entirely congruent across a diversity of analyses that vary in methods, taxonomic sampling, and data analyzed. Within the Myctophiformes, the Neoscopelidae is inferred to be monophyletic and sister to a monophyletic Myctophidae. The current classification of lanternfishes is incongruent with our phylogenetic tree, so we recommend revisions that retain much

  7. Effects of uncertainty and variability on population declines and IUCN Red List classifications.

    PubMed

    Rueda-Cediel, Pamela; Anderson, Kurt E; Regan, Tracey J; Regan, Helen M

    2018-01-22

    The International Union for Conservation of Nature (IUCN) Red List Categories and Criteria is a quantitative framework for classifying species according to extinction risk. Population models may be used to estimate extinction risk or population declines. Uncertainty and variability arise in threat classifications through measurement and process error in empirical data and uncertainty in the models used to estimate extinction risk and population declines. Furthermore, species traits are known to affect extinction risk. We investigated the effects of measurement and process error, model type, population growth rate, and age at first reproduction on the reliability of IUCN Red List classifications based on projected population declines. We used an age-structured population model to simulate true population trajectories with different growth rates, reproductive ages and levels of variation, and subjected them to measurement error. We evaluated the ability of scalar and matrix models parameterized with these simulated time series to accurately capture the IUCN Red List classification generated with true population declines. Under all levels of measurement error tested and low process error, classifications were reasonably accurate; scalar and matrix models yielded roughly the same rate of misclassifications, but the distribution of errors differed; matrix models produced more overestimates than underestimates of extinction risk; process error tended to contribute to misclassifications to a greater extent than measurement error; and more misclassifications occurred for fast, rather than slow, life histories. These results indicate that classifications of highly threatened taxa (i.e., taxa with low growth rates) under criterion A are more likely to be reliable than those of less threatened taxa when assessed with population models. Greater scrutiny needs to be placed on data used to parameterize population models for species with high growth rates
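    Under criterion A, a projected decline is mapped to a category by fixed thresholds, so an error in the estimated decline matters only when it pushes the estimate across a threshold. A minimal sketch of that mapping, using the IUCN reduction thresholds (A1: 90/70/50%; A2 to A4: 80/50/30% for CR/EN/VU); the time window (10 years or three generations) and the other criteria are not modelled, and the population sizes in the example are hypothetical.

```python
# Minimum reduction (as a fraction) for CR, EN and VU under each subcriterion
THRESHOLDS = {
    "A1": (0.90, 0.70, 0.50),
    "A2": (0.80, 0.50, 0.30),
    "A3": (0.80, 0.50, 0.30),
    "A4": (0.80, 0.50, 0.30),
}


def decline(n_start, n_end):
    """Fractional population reduction over the assessment window."""
    return 1 - n_end / n_start


def criterion_a_category(decline_fraction, subcriterion="A3"):
    """Red List category implied by a reduction under criterion A alone."""
    cr, en, vu = THRESHOLDS[subcriterion]
    if decline_fraction >= cr:
        return "CR"
    if decline_fraction >= en:
        return "EN"
    if decline_fraction >= vu:
        return "VU"
    return "below VU threshold"


# A true projected decline of 52% versus a noisy estimate of 47%
print(criterion_a_category(decline(1000, 480)), criterion_a_category(decline(1000, 530)))
```

    In the example, a five-point error in the estimated decline moves the classification from EN to VU, the kind of misclassification the study quantifies.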

  8. Accurate diagnosis of prenatal cleft lip/palate by understanding the embryology

    PubMed Central

    Smarius, Bram; Loozen, Charlotte; Manten, Wendy; Bekker, Mireille; Pistorius, Lou; Breugem, Corstiaan

    2017-01-01

    Cleft lip with or without cleft palate (CP) is one of the most common congenital malformations. Ultrasonographers involved in the routine 20-wk ultrasound screening may encounter these malformations. The face and palate develop in a very characteristic way, and for ultrasonographers screening these patients it is crucial to have a thorough understanding of the embryology of the face. This understanding can help them make a more accurate diagnosis and save time during the ultrasound examination. The current postnatal classification is also discussed to facilitate communication with CP teams. PMID:29026689

  9. Tanglegrams for rooted phylogenetic trees and networks

    PubMed Central

    Scornavacca, Celine; Zickmann, Franziska; Huson, Daniel H.

    2011-01-01

    Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises of how to define and compute a tanglegram for such networks. Results: In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require the trees or networks to be bifurcating or bicombining, or to be on identical taxon sets. Availability: The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. Contact: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de PMID:21685078
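
    A common way to judge a tanglegram layout is the number of crossings among the auxiliary lines. The sketch below counts them for two fixed leaf orders, assuming straight connectors and skipping taxa present on only one side (the taxon sets need not be identical). It does not implement the NN-tanglegram heuristic, and `count_crossings` is a hypothetical helper.

```python
from itertools import combinations


def count_crossings(left_order, right_order):
    """Count crossings among straight lines joining matching taxa.

    left_order / right_order list the leaves top to bottom on each side.
    Taxa found on only one side have no connector and are skipped.
    """
    right_pos = {taxon: i for i, taxon in enumerate(right_order)}
    shared = [taxon for taxon in left_order if taxon in right_pos]
    crossings = 0
    # combinations() keeps left-side order, so a is drawn above b on the left;
    # their connectors cross exactly when b is drawn above a on the right.
    for a, b in combinations(shared, 2):
        if right_pos[a] > right_pos[b]:
            crossings += 1
    return crossings


if __name__ == "__main__":
    print(count_crossings(["A", "B", "C", "D"], ["B", "A", "D", "C"]))  # 2
```

    The pairwise loop is quadratic in the number of shared taxa; for large trees the same count can be obtained in O(n log n) as an inversion count with merge sort. A layout heuristic would rotate child subtrees to reduce this number.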

  10. Construction of a Calibrated Probabilistic Classification Catalog: Application to 50k Variable Sources in the All-Sky Automated Survey

    NASA Astrophysics Data System (ADS)

    Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien

    2012-12-01

    With growing data volumes from synoptic surveys, astronomers must necessarily become more distanced from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28-class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
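
    Calibration means that, among sources assigned probability p for a class, roughly a fraction p truly belong to that class. The sketch below checks this with a reliability table, an expected calibration error and the Brier score, assuming one-vs-rest 0/1 labels for a single class and equal-width bins. It only diagnoses calibration; it is not the procedure used to build or calibrate MACC, and the helper names are hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Group one-vs-rest predictions into equal-width probability bins.

    probs: predicted probability that each source belongs to a class.
    outcomes: 1 if the source truly belongs to that class, else 0.
    Returns (bin index, mean predicted probability, observed frequency, count)
    for every non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for idx, members in enumerate(bins):
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            table.append((idx, mean_p, freq, len(members)))
    return table


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Count-weighted mean gap between predicted and observed frequency."""
    total = len(probs)
    return sum(n / total * abs(mean_p - freq)
               for _, mean_p, freq, n in reliability_table(probs, outcomes, n_bins))


def brier_score(probs, outcomes):
    """Mean squared difference between probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    # Four sources given p = 0.25, one of which is a true member: well calibrated.
    print(reliability_table([0.25] * 4, [1, 0, 0, 0]))  # [(2, 0.25, 0.25, 4)]
```

    A well-calibrated classifier has observed frequencies close to the mean predicted probabilities in every bin; the Brier score additionally rewards sharp (confident and correct) predictions.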

  11. CONSTRUCTION OF A CALIBRATED PROBABILISTIC CLASSIFICATION CATALOG: APPLICATION TO 50k VARIABLE SOURCES IN THE ALL-SKY AUTOMATED SURVEY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.

    2012-12-15

    With growing data volumes from synoptic surveys, astronomers must necessarily become more distanced from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28-class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
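
    In this usage, purity is the fraction of sources selected for a class that truly belong to it, and efficiency (completeness) is the fraction of true members that end up selected. With a probabilistic catalog, both depend on the probability cut a consumer applies. The sketch below assumes a single class, 0/1 ground truth and a hypothetical `purity_efficiency` helper; it is an illustration, not a tool distributed with MACC.

```python
def purity_efficiency(probs, is_member, threshold):
    """Purity and efficiency of the sample selected by probs >= threshold."""
    selected = [p >= threshold for p in probs]
    tp = sum(1 for s, m in zip(selected, is_member) if s and m)
    fp = sum(1 for s, m in zip(selected, is_member) if s and not m)
    fn = sum(1 for s, m in zip(selected, is_member) if not s and m)
    purity = tp / (tp + fp) if tp + fp else float("nan")      # nothing selected
    efficiency = tp / (tp + fn) if tp + fn else float("nan")  # no true members
    return purity, efficiency


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.6, 0.2]
    truth = [True, True, True, True, False]
    for cut in (0.1, 0.5):
        print(cut, purity_efficiency(probs, truth, cut))
    # prints: 0.1 (0.8, 1.0)
    #         0.5 (1.0, 0.75)
```

    Raising the cut typically trades efficiency for purity; calibrated probabilities are what make such trade-offs predictable when the catalog is used for population studies.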

  12. UAV-Based Crops Classification with Joint Features from Orthoimage and DSM Data

    NASA Astrophysics Data System (ADS)

    Liu, B.; Shi, Y.; Duan, Y.; Wu, W.

    2018-04-01

    Accurate crop classification remains a challenging task because the same crop can exhibit different spectra while different crops can exhibit the same spectrum. Recently, UAV-based remote sensing has gained popularity not only for its high spatial and temporal resolution, but also for its ability to obtain spectral and spatial data at the same time. This paper focuses on how to take full advantage of spatial and spectral features to improve crop classification accuracy, using a UAV platform equipped with a general digital camera. Texture and spatial features extracted from the RGB orthoimage and the digital surface model (DSM) of the monitoring area are analysed and integrated within an SVM classification framework. Extensive experimental results indicate that the overall classification accuracy improves markedly, from 72.9% to 94.5%, when the spatial features are combined, which verifies the feasibility and effectiveness of the proposed method.
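
    The benefit of joint features can be illustrated with toy pixels whose RGB values are identical for two crops but whose DSM-derived heights differ, the "different crops, same spectrum" case. To stay dependency-free, the sketch below swaps the paper's SVM for a nearest-centroid classifier; the crop names, values and helper names are hypothetical. Real pipelines would also normalize features, since height in metres and RGB in [0, 1] have different scales.

```python
import math


def joint_features(rgb, height):
    """Concatenate spectral (RGB) values with a DSM-derived height."""
    return list(rgb) + [height]


def fit_centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for f, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, f):
    """Nearest centroid; ties go to the alphabetically first label."""
    return min(sorted(centroids), key=lambda label: math.dist(centroids[label], f))


def accuracy(centroids, features, labels):
    hits = sum(predict(centroids, f) == label for f, label in zip(features, labels))
    return hits / len(labels)


# Toy pixels: same colour for both crops, different canopy height (m).
GREEN = (0.3, 0.6, 0.2)
train = [("maize", 1.5), ("maize", 1.7), ("soy", 0.3), ("soy", 0.5)]
test = [("maize", 1.55), ("soy", 0.45)]
train_labels = [crop for crop, _ in train]
test_labels = [crop for crop, _ in test]

spectral_only = fit_centroids([list(GREEN) for _ in train], train_labels)
joint = fit_centroids([joint_features(GREEN, h) for _, h in train], train_labels)

spectral_acc = accuracy(spectral_only, [list(GREEN) for _ in test], test_labels)
joint_acc = accuracy(joint, [joint_features(GREEN, h) for _, h in test], test_labels)
print(spectral_acc, joint_acc)  # 0.5 1.0
```

    With colour alone the two class centroids coincide, so the classifier cannot do better than guessing; adding the height feature separates them. The toy numbers say nothing about the accuracies reported in the paper.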

  13. Deep learning for EEG-Based preference classification

    NASA Astrophysics Data System (ADS)

    Teo, Jason; Hou, Chew Lin; Mountstephens, James

    2017-10-01

    Electroencephalogram (EEG)-based emotion classification is rapidly becoming one of the most intensely studied areas of brain-computer interfacing (BCI). The ability to passively identify yet accurately correlate brainwaves with our immediate emotions opens up truly meaningful and previously unattainable human-computer interactions, such as in forensic neuroscience, rehabilitative medicine, affective entertainment and neuro-marketing. One particularly useful yet rarely explored area of EEG-based emotion classification is preference recognition [1], which is simply the detection of like versus dislike. Among the limited investigations into preference classification, all reported studies used musically induced stimuli except for a single study that used 2D images. The main objective of this study is to apply deep learning, which has been shown to produce state-of-the-art results in diverse hard problems such as computer vision, natural language processing and audio recognition, to 3D object preference classification over a larger group of test subjects. A cohort of 16 users was shown 60 bracelet-like objects as rotating visual stimuli on a computer display while their preferences and EEGs were recorded. After training a variety of machine learning approaches, including deep neural networks, we then attempted to classify the users' preferences for the 3D visual stimuli based on their EEGs. Here, we show that deep learning outperforms a variety of other machine learning classifiers for this EEG-based preference classification task, particularly on a highly challenging dataset with large inter- and intra-subject variability.

  14. Species Divergence and Phylogenetic Variation of Ecophysiological Traits in Lianas and Trees

    PubMed Central

    Rios, Rodrigo S.; Salgado-Luarte, Cristian; Gianoli, Ernesto

    2014-01-01

    The climbing habit is an evolutionary key innovation in plants because it is associated with enhanced clade diversification. We tested whether patterns of species divergence and variation of three ecophysiological traits that are fundamental for plant adaptation to light environments (maximum photosynthetic rate [Amax], dark respiration rate [Rd], and specific leaf area [SLA]) are consistent with this key innovation. Using data reported from four tropical forests and three temperate forests, we compared phylogenetic distance among species as well as the evolutionary rate, phylogenetic distance and phylogenetic signal of those traits in lianas and trees. Estimates of evolutionary rates showed that Rd evolved faster in lianas, while SLA evolved faster in trees. The mean phylogenetic distance was 1.2 times greater among liana species than among tree species. Likewise, estimates of phylogenetic distance indicated that lianas were less related than expected by chance (phylogenetic evenness across 63 species), and trees were more related than expected by chance (phylogenetic clustering across 71 species). Lianas showed evenness for Rd, while trees showed phylogenetic clustering for this trait. In contrast, for SLA, lianas exhibited phylogenetic clustering and trees showed phylogenetic evenness. Lianas and trees showed patterns of ecophysiological trait variation among species that were independent of phylogenetic relatedness. We found support for the expected pattern of greater species divergence in lianas, but did not find consistent patterns regarding ecophysiological trait evolution and divergence. Rd followed the species-level pattern, i.e., greater divergence/evolution in lianas compared to trees, while the opposite occurred for SLA and no pattern was detected for Amax. Rd may have driven lianas' divergence across forest environments, and might contribute to diversification in climber clades. PMID:24914958
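
    Mean pairwise phylogenetic distance (MPD) compared against a null model is one standard way to call a set of species phylogenetically clustered or even. The sketch below assumes a precomputed distance for every species pair and a null model that draws random species sets of the same size from the regional pool; the paper's exact metrics and null model may differ, and the helper names are hypothetical.

```python
import random
import statistics
from itertools import combinations


def mpd(species, dist):
    """Mean pairwise phylogenetic distance; dist is keyed by frozenset pairs."""
    pairs = list(combinations(sorted(species), 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def ses_mpd(community, pool, dist, n_null=999, seed=0):
    """Standardized effect size of MPD against random draws from the pool.

    Positive: more distantly related than expected by chance (evenness).
    Negative: more closely related than expected by chance (clustering).
    """
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(sorted(pool), len(community)), dist)
            for _ in range(n_null)]
    sd = statistics.stdev(null)
    if sd == 0:  # e.g. the community is the whole pool
        return 0.0
    return (observed - statistics.mean(null)) / sd


if __name__ == "__main__":
    # Two tight pairs (a, b) and (c, d) that are distant from each other.
    d = {frozenset(p): v for p, v in [(("a", "b"), 1.0), (("c", "d"), 1.0),
                                       (("a", "c"), 10.0), (("a", "d"), 10.0),
                                       (("b", "c"), 10.0), (("b", "d"), 10.0)]}
    pool = {"a", "b", "c", "d"}
    print(mpd({"a", "c"}, d), ses_mpd({"a", "c"}, pool, d) > 0)  # 10.0 True
```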

  15. Impact of Information based Classification on Network Epidemics

    PubMed Central

    Mishra, Bimal Kumar; Haldar, Kaushik; Sinha, Durgesh Nandini

    2016-01-01

    Formulating mathematical models for accurate approximation of malicious propagation in a network is a difficult process because of our inherent lack of understanding of several underlying physical processes that intrinsically characterize the broader picture. The aim of this paper is to understand the impact of available information on the control of malicious network epidemics. A 1-n-n-1 type differential epidemic model is proposed, in which the differentiality allows a symptom-based classification. This is the first attempt to add such a classification to the existing epidemic framework. The model is incorporated into a five-class system called the DifEpGoss architecture. Analysis reveals an epidemic threshold, on the basis of which the long-term behavior of the system is analyzed. In this work, three real network datasets with 22002, 22469 and 22607 undirected edges, respectively, are used. The datasets show that the classification-based prevention given in the model can help contain network epidemics. Further simulation-based experiments use a three-category classification of attack and defense strengths, which allows 27 different possibilities to be considered. These experiments further corroborate the utility of the proposed model. The paper concludes with several interesting results. PMID:27329348

  16. Advances in Spectral-Spatial Classification of Hyperspectral Images

    NASA Technical Reports Server (NTRS)

    Fauvel, Mathieu; Tarabalka, Yuliya; Benediktsson, Jon Atli; Chanussot, Jocelyn; Tilton, James C.

    2012-01-01

    Recent advances in spectral-spatial classification of hyperspectral images are presented in this paper. Several techniques are investigated for combining both spatial and spectral information. Spatial information is extracted at the object (set of pixels) level rather than at the conventional pixel level. Mathematical morphology is first used to derive the morphological profile of the image, which includes information about the size, orientation and contrast of the spatial structures present in the image. Then the morphological neighborhood is defined and used to derive additional features for classification. Classification is performed with support vector machines using the available spectral information and the extracted spatial information. Spatial post-processing is next investigated to build more homogeneous and spatially consistent thematic maps. To that end, three presegmentation techniques are applied to define regions that are used to regularize the preliminary pixel-wise thematic map. Finally, a multiple classifier system is defined to produce relevant markers that are exploited to segment the hyperspectral image with the minimum spanning forest algorithm. Experimental results on three real hyperspectral images with different spatial and spectral resolutions, corresponding to various contexts, are presented. They highlight the importance of spectral-spatial strategies for the accurate classification of hyperspectral images and validate the proposed methods.
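
    A morphological profile stacks the results of openings and closings with structuring elements of increasing size, so each pixel gets a vector describing which bright and dark structures it belongs to. Profiles in this line of work are typically built with opening and closing by reconstruction on one band or principal component; the sketch below uses plain openings and closings with square windows on a single 2D band for brevity, and its helper names are hypothetical.

```python
def _window_filter(img, r, combine):
    """Apply min or max over a (2r+1) x (2r+1) square window, clipped at borders."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [img[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(combine(window))
        out.append(row)
    return out


def erode(img, r):
    return _window_filter(img, r, min)


def dilate(img, r):
    return _window_filter(img, r, max)


def opening(img, r):
    """Removes bright structures smaller than the window."""
    return dilate(erode(img, r), r)


def closing(img, r):
    """Fills dark structures smaller than the window."""
    return erode(dilate(img, r), r)


def morphological_profile(img, radii):
    """Openings (largest window first), the image itself, then closings."""
    return ([opening(img, r) for r in sorted(radii, reverse=True)]
            + [img]
            + [closing(img, r) for r in sorted(radii)])


if __name__ == "__main__":
    spike = [[0] * 5 for _ in range(5)]
    spike[2][2] = 9  # a one-pixel bright structure
    profile = morphological_profile(spike, [1, 2])
    print(len(profile), profile[1][2][2])  # 5 0: the spike vanishes at r = 1
```

    A pixel's profile vector (here five values) can then be concatenated with its spectral values before SVM classification.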

  17. Advances in Spectral-Spatial Classification of Hyperspectral Images

    NASA Technical Reports Server (NTRS)

    Fauvel, Mathieu; Tarabalka, Yuliya; Benediktsson, Jon Atli; Chanussot, Jocelyn; Tilton, James C.

    2012-01-01

    Recent advances in spectral-spatial classification of hyperspectral images are presented in this paper. Several techniques are investigated for combining both spatial and spectral information. Spatial information is extracted at the object (set of pixels) level rather than at the conventional pixel level. Mathematical morphology is first used to derive the morphological profile of the image, which includes information about the size, orientation, and contrast of the spatial structures present in the image. Then, the morphological neighborhood is defined and used to derive additional features for classification. Classification is performed with support vector machines (SVMs) using the available spectral information and the extracted spatial information. Spatial postprocessing is next investigated to build more homogeneous and spatially consistent thematic maps. To that end, three presegmentation techniques are applied to define regions that are used to regularize the preliminary pixel-wise thematic map. Finally, a multiple-classifier (MC) system is defined to produce relevant markers that are exploited to segment the hyperspectral image with the minimum spanning forest algorithm. Experimental results on three real hyperspectral images with different spatial and spectral resolutions, corresponding to various contexts, are presented. They highlight the importance of spectral–spatial strategies for the accurate classification of hyperspectral images and validate the proposed methods.

  18. Automated retinal vessel type classification in color fundus images

    NASA Astrophysics Data System (ADS)

    Yu, H.; Barriga, S.; Agurto, C.; Nemeth, S.; Bauman, W.; Soliz, P.

    2013-02-01

    Automated retinal vessel type classification is an essential first step toward machine-based quantitative measurement of various vessel topological parameters and toward identifying vessel abnormalities and alterations in cardiovascular disease risk analysis. This paper presents a new and accurate automatic artery and vein classification method developed for arteriolar-to-venular width ratio (AVR) and artery and vein tortuosity measurements in regions of interest (ROI) of 1.5 and 2.5 optic disc diameters from the disc center, respectively. This method includes illumination normalization, automatic optic disc detection and retinal vessel segmentation, feature extraction, and a partial least squares (PLS) classification. Normalized multi-color information, color variation, and multi-scale morphological features are extracted on each vessel segment. We trained the algorithm on a set of 51 color fundus images using manually marked arteries and veins. We tested the proposed method on a previously unseen test data set consisting of 42 images. We obtained an area under the ROC curve (AUC) of 93.7% in the ROI of AVR measurement and an AUC of 91.5% in the ROI of tortuosity measurement. The proposed AV classification method has the potential to assist automatic early detection of cardiovascular disease and risk analysis.
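
    An AUC can be read as the probability that a randomly chosen vessel segment of the positive class (say, artery) receives a higher classifier score than a randomly chosen segment of the other class. The sketch below computes AUC in that pairwise (Mann-Whitney) form, counting ties as one half; it assumes arbitrary real-valued scores and is not the paper's PLS classifier. The function name is hypothetical.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(random positive outranks random negative), ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    artery_scores = [0.9, 0.8, 0.4]
    vein_scores = [0.1, 0.5, 0.2]
    print(auc(artery_scores, vein_scores))  # 8 of 9 pairs ranked correctly
```

    The double loop is quadratic; for large test sets the same value follows from sorting all scores and summing the ranks of the positives.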

  19. Enumerating all maximal frequent subtrees in collections of phylogenetic trees

    PubMed Central

    2014-01-01

    Background A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, for computing congruence indices, and for identifying horizontal gene transfer events. Results We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Conclusions Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees. PMID:25061474
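
    The majority rule tree used as a baseline in this work keeps every clade that appears in more than half of the input trees. The sketch below computes those clades for rooted trees written as nested tuples of taxon names. It illustrates only the consensus baseline, not the maximal agreement or maximal frequent subtree algorithms, and the tree encoding and helper names are assumptions.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades: leaf sets of internal nodes, excluding the root."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    return {c for c in found if 1 < len(c) < len(all_leaves)}


def majority_rule_clades(trees):
    """Clades present in more than half of the trees."""
    counts = Counter(c for tree in trees for c in clades(tree))
    return {c for c, k in counts.items() if k > len(trees) / 2}


if __name__ == "__main__":
    trees = [((("A", "B"), "C"), "D"),
             ((("A", "B"), "D"), "C"),
             ((("A", "C"), "B"), "D")]
    for clade in sorted(majority_rule_clades(trees), key=len):
        print(sorted(clade))  # ['A', 'B'] then ['A', 'B', 'C']
```

    Clades that each occur in more than half of the trees are always mutually compatible, so they can be assembled into a single (possibly unresolved) consensus tree. Agreement and frequent subtrees differ in that they may drop taxa to recover relationships this clade-counting view misses.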

  20. Phylogenetic fields through time: temporal dynamics of geographical co-occurrence and phylogenetic structure within species ranges

    PubMed Central

    Carotenuto, Francesco; Diniz-Filho, José Alexandre F.

    2016-01-01

    Across their geographical distributions, species co-occur with different sets of other species, which can be either closely or distantly related to them. Such co-occurrence patterns and their phylogenetic structure within individual species ranges represent what we call the species phylogenetic fields (PFs). These PFs allow investigation of the role of historical processes (speciation, extinction and dispersal) in shaping species co-occurrence patterns, in both extinct and extant species. Here, we investigate PFs of large mammalian species during the last 3 Myr, and how these correlate with trends in diversification rates. Using the fossil record, we evaluate species' distributional and co-occurrence patterns along with their phylogenetic structure. We apply a novel Bayesian framework to fossil occurrences to estimate diversification rates through time. Our findings highlight the effect of evolutionary processes and past climatic changes on species' distributions and co-occurrences. From the Late Pliocene to the Recent, mammal species seem to have responded in an individualistic manner to climate changes and diversification dynamics, co-occurring with different sets of species from different lineages across their geographical ranges. These findings stress the difficulty of forecasting potential effects of future climate changes on biodiversity. PMID:26977061