Science.gov

Sample records for alignments phylogenetic trees

  1. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  2. Iteratively Refined Guide Trees Help Improving Alignment and Phylogenetic Inference in the Mushroom Family Bolbitiaceae

    PubMed Central

    Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.

    2013-01-01

    Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family, by comparing the outcome of two iterative alignment refining approaches (an automated and a manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, previously referred to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found to be monophyletic; however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526

  3. Iteratively refined guide trees help improving alignment and phylogenetic inference in the mushroom family Bolbitiaceae.

    PubMed

    Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G

    2013-01-01

    Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family, by comparing the outcome of two iterative alignment refining approaches (an automated and a manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that--although PRANK successfully evades overmatching of gapped sites, previously referred to as alignment overmatching--it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found to be monophyletic; however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe.

  4. BuddySuite: Command-line toolkits for manipulating sequences, alignments, and phylogenetic trees.

    PubMed

    Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D

    2017-02-25

    The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, it is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.

  5. Phylogenetic Inference From Conserved sites Alignments

    SciTech Connect

    Grundy, W.N.; Naylor, G.J.P.

    1999-08-15

    Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements.
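
    As a rough illustration of the site-based idea in this record, the sketch below keeps only the conserved columns of a toy alignment. The conservation rule used here (most common residue reaches a chosen fraction, gapped columns dropped) and the names conserved_columns, filter_alignment and min_fraction are assumptions made for illustration; the abstract does not state the criterion the authors used.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return indices of alignment columns judged conserved.

    A column is kept when its most common residue accounts for at least
    min_fraction of the sequences. Columns containing a gap ('-') are
    skipped; that rule is an assumption of this sketch.
    """
    n_seqs = len(alignment)
    keep = []
    for i, column in enumerate(zip(*alignment)):
        if "-" in column:
            continue
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / n_seqs >= min_fraction:
            keep.append(i)
    return keep


def filter_alignment(alignment, columns):
    """Rebuild each aligned sequence from the selected columns only."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


toy = ["ACGTAC", "ACGAAC", "AC-TAC"]
strict = conserved_columns(toy)          # fully conserved, ungapped columns
relaxed = conserved_columns(toy, 0.6)    # tolerate one deviating sequence
reduced = filter_alignment(toy, strict)
print(strict, relaxed, reduced)
```

    The reduced alignment can then be passed to any tree-building method in place of the full alignment.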

  6. Interim Report on Multiple Sequence Alignments and TaqMan Signature Mapping to Phylogenetic Trees

    SciTech Connect

    Gardner, S; Jaing, C

    2012-03-27

    The goal of this project is to develop forensic genotyping assays for select agent viruses, addressing a significant capability gap for the viral bioforensics and law enforcement community. We used a multipronged approach combining bioinformatics analysis, PCR-enriched samples, microarrays and TaqMan assays to develop high-resolution and cost-effective genotyping methods for strain-level forensic discrimination of viruses. We have leveraged substantial experience and efficiency gained through year 1 on software development, SNP discovery, TaqMan signature design and phylogenetic signature mapping to scale up the development of forensics signatures in year 2. In this report, we have summarized the TaqMan signature development for South American hemorrhagic fever viruses, tick-borne encephalitis viruses and henipaviruses, Old World Arenaviruses, filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus and Japanese encephalitis virus.

  7. Comparison of tree-child phylogenetic networks.

    PubMed

    Cardona, Gabriel; Rosselló, Francesc; Valiente, Gabriel

    2009-01-01

    Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endorse a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the only exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks and for aligning a pair of tree-child phylogenetic networks, are provided. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.
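
    The path multiplicity representation described above can be sketched in a few lines. In this minimal example a network is a dictionary mapping each internal node to its children (an assumed encoding, not the format of the authors' Perl package or Java applet), the vector of a node counts the paths from that node to each leaf, and the distance is the size of the symmetric difference of the two multisets of vectors. The function names mu_vectors and mu_distance are hypothetical.

```python
from collections import Counter


def mu_vectors(children, leaves):
    """Multiset of path multiplicity vectors of a rooted network.

    children: dict mapping each internal node to a list of its children.
    leaves:   ordered list of leaf labels shared by the networks compared.
    """
    index = {leaf: i for i, leaf in enumerate(leaves)}
    memo = {}

    def mu(node):
        if node not in memo:
            if node in index:
                # A leaf has exactly one (empty) path to itself.
                memo[node] = tuple(int(i == index[node]) for i in range(len(leaves)))
            else:
                # Paths from an internal node go through exactly one child.
                memo[node] = tuple(map(sum, zip(*(mu(c) for c in children[node]))))
        return memo[node]

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return Counter(mu(n) for n in nodes)


def mu_distance(children_a, children_b, leaves):
    """Size of the symmetric difference of the two multisets of vectors."""
    a = mu_vectors(children_a, leaves)
    b = mu_vectors(children_b, leaves)
    return sum(((a - b) + (b - a)).values())


leaves = ["a", "b", "c"]
tree1 = {"r": ["x", "c"], "x": ["a", "b"]}            # ((a,b),c)
tree2 = {"r": ["a", "y"], "y": ["b", "c"]}            # (a,(b,c))
# Network with one hybrid node h (two parents u and v, single child b).
net = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"]}

d_trees = mu_distance(tree1, tree2, leaves)
d_net = mu_distance(net, tree1, leaves)
print(d_trees, d_net)
```

    For two rooted trees on the same leaves, every vector is a 0/1 indicator of a cluster, so the count reduces to the number of clusters found in one tree but not the other.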

  8. A Universal Phylogenetic Tree.

    ERIC Educational Resources Information Center

    Offner, Susan

    2001-01-01

    Presents a universal phylogenetic tree suitable for use in high school and college-level biology classrooms. Illustrates the antiquity of life and that all life is related, even if it dates back 3.5 billion years. Reflects important evolutionary relationships and provides an exciting way to learn about the history of life. (SAH)

  9. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
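
    To make the "huge number of possible trees" concrete, the sketch below uses the standard counting result for fully resolved (binary) leaf-labelled trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n OTUs. The function names are chosen here for illustration and do not come from the paper.

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1, for odd k >= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_binary_trees(n_otus):
    """Number of unrooted binary tree topologies on n labelled OTUs: (2n-5)!!"""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)


def count_rooted_binary_trees(n_otus):
    """Number of rooted binary tree topologies on n labelled OTUs: (2n-3)!!"""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n), count_rooted_binary_trees(n))
```

    Ten OTUs already admit more than two million unrooted topologies, which is why exhaustive search is replaced by heuristics for all but the smallest data sets.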

  10. Phylogenetic Trees From Sequences

    NASA Astrophysics Data System (ADS)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
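
    One common way to measure sequence divergence under an explicit model is the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The abstract does not say which models the chapter covers, so the sketch below is offered only as a familiar example; skipping gapped positions is an assumption of this sketch.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq_a, seq_b):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


d_same = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAC")
d_one = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
print(d_same, round(d_one, 4))
```

    A matrix of such pairwise distances is the input to distance-based reconstruction methods; the corrected distance is always at least as large as p because it accounts for multiple substitutions at the same site.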

  11. Phylogenetic trees and Euclidean embeddings.

    PubMed

    Layer, Mark; Rhodes, John A

    2017-01-01

    It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
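
    The square-root embedding can be checked numerically with an elementary construction: give every edge of the tree its own coordinate axis, and place each taxon at sqrt(edge length) on the axes of the edges between it and the root. Two taxa then differ exactly on the edges of the path joining them, so the squared Euclidean distance equals the tree distance. This is a sketch written for illustration and may differ from the argument in the paper; the tree encoding (parent and length dictionaries) and function names are assumptions.

```python
import math


def edges_to_root(parent, node):
    """Set of edges on the path from node to the root (edge = its lower node)."""
    edges = set()
    while parent[node] is not None:
        edges.add(node)
        node = parent[node]
    return edges


def tree_distance(parent, length, a, b):
    """Path length between a and b: edges on exactly one of the two root paths."""
    return sum(length[e] for e in edges_to_root(parent, a) ^ edges_to_root(parent, b))


def embed_taxa(parent, length, taxa):
    """Map each taxon to a point with one coordinate per tree edge."""
    axes = sorted(n for n in parent if parent[n] is not None)
    points = {}
    for taxon in taxa:
        on_path = edges_to_root(parent, taxon)
        points[taxon] = [math.sqrt(length[e]) if e in on_path else 0.0 for e in axes]
    return points


# Toy tree ((a:2, b:3)x:1, c:4)r, stored as child -> parent.
parent = {"r": None, "x": "r", "a": "x", "b": "x", "c": "r"}
length = {"x": 1.0, "a": 2.0, "b": 3.0, "c": 4.0}
points = embed_taxa(parent, length, ["a", "b", "c"])

for s, t in (("a", "b"), ("a", "c"), ("b", "c")):
    print(s, t, tree_distance(parent, length, s, t), math.dist(points[s], points[t]) ** 2)
```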

  12. Alignment-free phylogenetics and population genetics.

    PubMed

    Haubold, Bernhard

    2014-05-01

    Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are either based on word counts or on match lengths, and I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, in population genetics, the distribution of mutations along a sequence is also considered. This distribution can be explored by match lengths, thus opening the prospect of alignment-free population genomics.
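
    A word-count comparison of the kind mentioned above can be sketched as follows: count all overlapping words of length k in each sequence and take the Euclidean distance between the two count profiles. This is a generic illustration, not one of the specific measures evaluated in the paper, and the raw value is not calibrated to substitutions per site; the names kmer_counts and word_count_distance and the default k are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Counts of all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_count_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer count profiles of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    words = set(counts_a) | set(counts_b)
    return math.sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in words))


d_identical = word_count_distance("ACGTACGTAC", "ACGTACGTAC")
d_small = word_count_distance("AAAA", "AAAT", k=2)
print(d_identical, d_small)
```

    No alignment is computed at any point, so the cost grows with sequence length rather than with the product of the two lengths.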

  13. Incompletely resolved phylogenetic trees inflate estimates of phylogenetic conservatism.

    PubMed

    Davies, T Jonathan; Kraft, Nathan J B; Salamin, Nicolas; Wolkovich, Elizabeth M

    2012-02-01

    The tendency for more closely related species to share similar traits and ecological strategies can be explained by their longer shared evolutionary histories and represents phylogenetic conservatism. How strongly species traits co-vary with phylogeny can significantly impact how we analyze cross-species data and can influence our interpretation of assembly rules in the rapidly expanding field of community phylogenetics. Phylogenetic conservatism is typically quantified by analyzing the distribution of species values on the phylogenetic tree that connects them. Many phylogenetic approaches, however, assume a completely sampled phylogeny: while we have good estimates of deeper phylogenetic relationships for many species-rich groups, such as birds and flowering plants, we often lack information on more recent interspecific relationships (i.e., within a genus). A common solution has been to represent these relationships as polytomies on trees using taxonomy as a guide. Here we show that such trees can dramatically inflate estimates of phylogenetic conservatism quantified using S. P. Blomberg et al.'s K statistic. Using simulations, we show that even randomly generated traits can appear to be phylogenetically conserved on poorly resolved trees. We provide a simple rarefaction-based solution that can reliably retrieve unbiased estimates of K, and we illustrate our method using data on first flowering times from Thoreau's woods (Concord, Massachusetts, USA).

  14. Visual exploration of parameter influence on phylogenetic trees.

    PubMed

    Hess, Martin; Bremm, Sebastian; Weissgraeber, Stephanie; Hamacher, Kay; Goesele, Michael; Wiemeyer, Josef; von Landesberger, Tatiana

    2014-01-01

    Evolutionary relationships between organisms are frequently derived as phylogenetic trees inferred from multiple sequence alignments (MSAs). The MSA parameter space is exponentially large, so tens of thousands of potential trees can emerge for each dataset. A proposed visual-analytics approach can reveal the parameters' impact on the trees. Given input trees created with different parameter settings, it hierarchically clusters the trees according to their structural similarity. The most important clusters of similar trees are shown together with their parameters. This view offers interactive parameter exploration and automatic identification of relevant parameters. Biologists applied this approach to real data of 16S ribosomal RNA and protein sequences of ion channels. It revealed which parameters affected the tree structures. This led to a more reliable selection of the best trees.

  15. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

    PubMed Central

    Vilella, Albert J.; Severin, Jessica; Ureta-Vidal, Abel; Heng, Li; Durbin, Richard; Birney, Ewan

    2009-01-01

    We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project. PMID:19029536

  16. Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs

    PubMed Central

    Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.

    2013-01-01

    Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118

  17. Terrestrial apes and phylogenetic trees

    PubMed Central

    Arsuaga, Juan Luis

    2010-01-01

    The image that best expresses Darwin’s thinking is the tree of life. However, Darwin’s human evolutionary tree lacked almost everything because only the Neanderthals were known at the time and they were considered one extreme expression of our own species. Darwin believed that the root of the human tree was very deep and in Africa. It was not until 1962 that the root was shown to be much more recent in time and definitively in Africa. On the other hand, some neo-Darwinians believed that our family tree was not a tree, because there were no branches, but, rather, a straight stem. The recent years have witnessed spectacular discoveries in Africa that take us close to the origin of the human tree and in Spain at Atapuerca that help us better understand the origin of the Neanderthals as well as our own species. The final form of the tree, and the number of branches, remains an object of passionate debate. PMID:20445090

  18. Phylogenetic tree shapes resolve disease transmission patterns

    PubMed Central

    Colijn, Caroline; Gardy, Jennifer

    2014-01-01

    Background and Objectives: Whole-genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterized infectious periods, epidemiological and clinical metadata which may not always be available, and typically require computationally intensive analysis focusing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the transmission patterns underlying an outbreak. Methodology: We use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. Results: We show that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. Conclusions and implications: There are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks. PMID:24916411
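
    Topological summaries of the kind described can be computed directly from a tree's branching structure, ignoring branch lengths. The sketch below computes two widely used shape statistics, the number of cherries (internal nodes whose two children are both tips) and the Sackin index (sum of tip depths, larger for more imbalanced trees). These are given as familiar examples; the abstract does not list the exact features used by the authors, and the nested-tuple tree encoding is an assumption.

```python
def shape_stats(tree):
    """Return (number of tips, number of cherries, Sackin index).

    tree: nested tuples for internal nodes, strings for tips,
          e.g. (("a", "b"), ("c", "d")).
    """
    def walk(node, depth):
        if not isinstance(node, tuple):
            return 1, 0, depth           # a tip contributes its own depth
        tips = cherries = sackin = 0
        for child in node:
            t, c, s = walk(child, depth + 1)
            tips += t
            cherries += c
            sackin += s
        if len(node) == 2 and not any(isinstance(ch, tuple) for ch in node):
            cherries += 1                # both children are tips
        return tips, cherries, sackin

    return walk(tree, 0)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(shape_stats(balanced), shape_stats(caterpillar))
```

    Vectors of such statistics, computed for many simulated outbreaks, are the sort of input a classifier could be trained on.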

  19. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
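
    The dynamic-programming guarantee mentioned at the start of this abstract can be illustrated with the textbook Needleman-Wunsch global alignment. The sketch below is not the PhyPA implementation in DAMBE; the linear gap penalty and the scoring values (match, mismatch, gap) are assumptions picked for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of substitution, deletion, insertion.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

    A distance for every pair of sequences can then be derived from its own pairwise alignment, which is what makes a distance-based tree possible without ever building a multiple alignment.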

  20. Relating phylogenetic trees to transmission trees of infectious disease outbreaks.

    PubMed

    Ypma, Rolf J F; van Ballegooijen, W Marijn; Wallinga, Jacco

    2013-11-01

    Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.

  1. Constructing Student Problems in Phylogenetic Tree Construction.

    ERIC Educational Resources Information Center

    Brewer, Steven D.

    Evolution is often equated with natural selection and is taught from a primarily functional perspective while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…

  2. Quantifying MCMC Exploration of Phylogenetic Tree Space

    PubMed Central

    Whidden, Chris; Matsen, Frederick A.

    2015-01-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. PMID:25631175

  3. Quantifying MCMC exploration of phylogenetic tree space.

    PubMed

    Whidden, Chris; Matsen, Frederick A

    2015-05-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.

  4. BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC

    PubMed Central

    Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun

    2009-01-01

    Background: We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden Markov model algorithms to analyze up to four sequences. Results: We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion: BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from PMID:19715598

  5. Exploring hierarchical visualization designs using phylogenetic trees

    NASA Astrophysics Data System (ADS)

    Li, Shaomeng; Crouser, R. Jordan; Griffin, Garth; Gramazio, Connor; Schulz, Hans-Jörg; Childs, Hank; Chang, Remco

    2015-01-01

    Ongoing research on information visualization has produced an ever-increasing number of visualization designs. Despite this activity, limited progress has been made in categorizing this large number of information visualizations. This makes understanding their common design features challenging, and obscures the yet unexplored areas of novel designs. With this work, we provide categorization from an evolutionary perspective, leveraging a computational model to represent evolutionary processes, the phylogenetic tree. The result, a phylogenetic tree of a design corpus of hierarchical visualizations, enables better understanding of the various design features of hierarchical information visualizations, and further illuminates the space in which the visualizations lie, through support for interactive clustering and novel design suggestions. We demonstrate these benefits with our software system, where a corpus of two-dimensional hierarchical visualization designs is constructed into a phylogenetic tree. This software system supports visual interactive clustering and the suggestion of novel designs; the latter capacity is also demonstrated via collaboration with an artist who sketched new designs using our system.

  6. Building phylogenetic trees from molecular data with MEGA.

    PubMed

    Hall, Barry G

    2013-05-01

    Phylogenetic analysis is sometimes regarded as being an intimidating, complex process that requires expertise and years of experience. In fact, it is a fairly straightforward process that can be learned quickly and applied effectively. This Protocol describes the several steps required to produce a phylogenetic tree from molecular data for novices. In the example illustrated here, the program MEGA is used to implement all those steps, thereby eliminating the need to learn several programs, and to deal with multiple file formats from one step to another (Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731-2739). The first step, identification of a set of homologous sequences and downloading those sequences, is implemented by MEGA's own browser built on top of the Google Chrome toolkit. For the second step, alignment of those sequences, MEGA offers two different algorithms: ClustalW and MUSCLE. For the third step, construction of a phylogenetic tree from the aligned sequences, MEGA offers many different methods. Here we illustrate the maximum likelihood method, beginning with MEGA's Models feature, which permits selecting the most suitable substitution model. Finally, MEGA provides a powerful and flexible interface for the final step, actually drawing the tree for publication. Here a step-by-step protocol is presented in sufficient detail to allow a novice to start with a sequence of interest and to build a publication-quality tree illustrating the evolution of an appropriate set of homologs of that sequence. MEGA is available for use on PCs and Macs from www.megasoftware.net.

  7. kdetrees: non-parametric estimation of phylogenetic tree distributions

    PubMed Central

    Weyenberg, Grady; Huggins, Peter M.; Schardl, Christopher L.; Howe, Daniel K.; Yoshida, Ruriko

    2014-01-01

    Motivation: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such ‘outlying’ gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. Results: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. Availability and implementation: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN. Contact: ruriko.yoshida@uky.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24764459
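
    The outlier-detection idea in this abstract can be illustrated with a minimal sketch. It is not the kdetrees package or its API: it assumes a pairwise tree-distance matrix has already been computed, uses a Gaussian kernel with a fixed bandwidth, and flags trees whose leave-one-out density score falls below a quartile-based cutoff. The function names, the bandwidth, and the cutoff rule are all hypothetical choices made for illustration.

```python
import math
import statistics


def density_scores(dist, bandwidth=1.0):
    """Leave-one-out Gaussian kernel density score for each tree.

    dist is a symmetric matrix (list of lists) of pairwise tree distances.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(
            math.exp(-((dist[i][j] / bandwidth) ** 2) / 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def outliers(dist, bandwidth=1.0, k=1.5):
    """Indices of trees whose score lies below Q1 - k * IQR (assumed rule)."""
    scores = density_scores(dist, bandwidth)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Toy matrix: trees 0-4 are mutually close, tree 5 is far from all of them.
toy = [
    [0 if i == j else (10 if 5 in (i, j) else 1) for j in range(6)]
    for i in range(6)
]
print(outliers(toy))
```

    Computing each score from the other n - 1 trees costs quadratic time in the number of trees, which matches the complexity stated in the abstract.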

  8. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution

    PubMed Central

    Kendall, Michelle; Colijn, Caroline

    2016-01-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287
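
    The abstract does not spell out the metric, so the following is only a sketch of one metric of this general kind, chosen here as an assumption: each rooted tree is summarized by the number of edges between the root and the most recent common ancestor (MRCA) of every pair of tips, and two trees are compared by the Euclidean distance between those vectors. Trees are written as nested tuples of tip names, and both trees are assumed to share the same tips.

```python
import math
from itertools import combinations


def mrca_depths(tree, depth=0, out=None):
    """Return (tips, mapping) where mapping sends each unordered pair of tips
    to the number of edges between the root and the pair's MRCA."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        return [tree], out
    groups = [mrca_depths(child, depth + 1, out)[0] for child in tree]
    # Tips taken from two different children meet at the current node.
    for g1, g2 in combinations(groups, 2):
        for x in g1:
            for y in g2:
                out[frozenset((x, y))] = depth
    return [tip for g in groups for tip in g], out


def topo_distance(t1, t2):
    """Euclidean distance between the MRCA-depth vectors of two trees."""
    _, d1 = mrca_depths(t1)
    _, d2 = mrca_depths(t2)
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(topo_distance(t1, t1), topo_distance(t1, t2))
```

    Because the vector depends on where the root is, a comparison of this kind is sensitive to differences in deep branching structure.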

  9. Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

    PubMed

    Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

    2017-04-01

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree (a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes) called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping (but not identical) sets of labels, is called a "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.

  10. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    ERIC Educational Resources Information Center

    Dees, Jonathan; Momsen, Jennifer L.; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa…

  11. treespace: statistical exploration of landscapes of phylogenetic trees.

    PubMed

    Jombart, Thibaut; Kendall, Michelle; Almagro-Garcia, Jacob; Colijn, Caroline

    2017-04-04

    The increasing availability of large genomic datasets as well as the advent of Bayesian phylogenetics facilitate the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis. treespace is integrated alongside existing standards for phylogenetics and is easily accessible through a web interface. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results. This article is protected by copyright. All rights reserved.

  12. Student interpretations of phylogenetic trees in an introductory biology course.

    PubMed

    Dees, Jonathan; Momsen, Jennifer L; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa relatedness on phylogenetic trees, to measure the prevalence of correct taxa-relatedness interpretations, and to determine how student reasoning and correctness change in response to instruction and over time. Counting synapomorphies and nodes between taxa were the most common forms of incorrect reasoning, which presents a pedagogical dilemma concerning labeled synapomorphies on phylogenetic trees. Students also independently generated an alternative form of correct reasoning using monophyletic groups, the use of which decreased in popularity over time. Approximately half of all students were able to correctly interpret taxa relatedness on phylogenetic trees, and many memorized correct reasoning without understanding its application. Broad initial instruction that allowed students to generate inferences on their own contributed very little to phylogenetic tree understanding, while targeted instruction on evolutionary relationships improved understanding to some extent. Phylogenetic trees, which can directly affect student understanding of evolution, appear to offer introductory biology instructors a formidable pedagogical challenge.

  13. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis.

    PubMed

    Trifinopoulos, Jana; Nguyen, Lam-Tung; von Haeseler, Arndt; Minh, Bui Quang

    2016-07-08

    This article presents W-IQ-TREE, an intuitive and user-friendly web interface and server for IQ-TREE, an efficient phylogenetic software for maximum likelihood analysis. W-IQ-TREE supports multiple sequence types (DNA, protein, codon, binary and morphology) in common alignment formats and a wide range of evolutionary models including mixture and partition models. W-IQ-TREE performs fast model selection, partition scheme finding, efficient tree reconstruction, ultrafast bootstrapping, branch tests, and tree topology tests. All computations are conducted on a dedicated computer cluster and the users receive the results via URL or email. W-IQ-TREE is available at http://iqtree.cibiv.univie.ac.at. It is free and open to all users and there is no login requirement.

  14. Estimating phylogenetic trees from genome-scale data.

    PubMed

    Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

    2015-12-01

    The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data.

  15. Phylogenetic tree construction based on 2D graphical representation

    NASA Astrophysics Data System (ADS)

    Liao, Bo; Shan, Xinzhou; Zhu, Wen; Li, Renfa

    2006-04-01

    A new approach based on the two-dimensional (2D) graphical representation of the whole genome sequence [Bo Liao, Chem. Phys. Lett., 401(2005) 196.] is proposed to analyze the phylogenetic relationships of genomes. The evolutionary distances are obtained through measuring the differences among the 2D curves. The fuzzy theory is used to construct phylogenetic tree. The phylogenetic relationships of H5N1 avian influenza virus illustrate the utility of our approach.

  16. Reconstruction of phylogenetic trees of prokaryotes using maximal common intervals.

    PubMed

    Heydari, Mahdi; Marashi, Sayed-Amir; Tusserkani, Ruzbeh; Sadeghi, Mehdi

    2014-10-01

    One of the fundamental problems in bioinformatics is phylogenetic tree reconstruction, which can be used for classifying living organisms into different taxonomic clades. The classical approach to this problem is based on a marker such as 16S ribosomal RNA. Since evolutionary events like genomic rearrangements are not included in reconstructions of phylogenetic trees based on single genes, much effort has been made to find other characteristics for phylogenetic reconstruction in recent years. With the increasing availability of completely sequenced genomes, gene order can be considered as a new solution for this problem. In the present work, we applied maximal common intervals (MCIs) in two or more genomes to infer their distance and to reconstruct their evolutionary relationship. Additionally, measures based on uncommon segments (UCS's), i.e., those genomic segments which are not detected as part of any of the MCIs, are also used for phylogenetic tree reconstruction. We applied these two types of measures for reconstructing the phylogenetic tree of 63 prokaryotes with known COG (clusters of orthologous groups) families. Similarity between the MCI-based (resp. UCS-based) reconstructed phylogenetic trees and the phylogenetic tree obtained from NCBI taxonomy browser is as high as 93.1% (resp. 94.9%). We show that in the case of this diverse dataset of prokaryotes, tree reconstruction based on MCI and UCS outperforms most of the currently available methods based on gene orders, including breakpoint distance and DCJ. We additionally tested our new measures on a dataset of 13 closely-related bacteria from the genus Prochlorococcus. In this case, distances like rearrangement distance, breakpoint distance and DCJ proved to be useful, while our new measures are still appropriate for phylogenetic reconstruction.

  17. Maximum parsimony, substitution model, and probability phylogenetic trees.

    PubMed

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which the MP method is the most well-studied and popular. In the MP method the optimization criterion is the number of nucleotide substitutions, computed from the differences among the investigated nucleotide sequences. However, the MP method is often criticized because it counts only the substitutions observable at the current time and omits all the unobservable substitutions that really occurred in the evolutionary history. In order to take into account the unobservable substitutions, some substitution models have been established and they are now widely used in the DM and ML methods, but these substitution models cannot be used within the classical MP method. Recently the authors proposed a probability representation model for phylogenetic trees, and the reconstructed trees in this model are called probability phylogenetic trees. One of the advantages of the probability representation model is that it can include a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
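
    The classical MP criterion mentioned above, the number of substitutions a tree requires, can be sketched with Fitch's small-parsimony algorithm. This illustrates only the classical criterion, not the authors' probability representation model. The tree encoding (nested 2-tuples of leaf names) and the toy alignment are assumptions made for the example.

```python
def fitch(tree, states):
    """Return (state_set, substitutions) for one alignment site.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    states: dict mapping leaf name -> nucleotide observed at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch counts over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "GCT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment), parsimony_score(tree2, alignment))
```

    On this toy data tree1 needs 3 substitutions and tree2 needs 5, so MP would prefer tree1. The count reflects only observable differences, which is the limitation the abstract describes.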

  18. Molecular phylogenetic analysis of the Papionina using concatenation and species tree methods.

    PubMed

    Guevara, Elaine E; Steiper, Michael E

    2014-01-01

    The Papionina is a geographically widespread subtribe of African cercopithecid monkeys whose evolutionary history is of particular interest to anthropologists. The phylogenetic relationships among arboreal mangabeys (Lophocebus), baboons (Papio), and geladas (Theropithecus) remain unresolved. Molecular phylogenetic analyses have revealed marked gene tree incongruence for these taxa, and several recent concatenated phylogenetic analyses of multilocus datasets have supported different phylogenetic hypotheses. To address this issue, we investigated the phylogeny of the Lophocebus + Papio + Theropithecus group using concatenation methods, as well as alternative methods that incorporate gene tree heterogeneity to estimate a 'species tree.' Our compiled DNA sequence dataset was ∼56 kb pairs long and included 57 independent partitions. All analyses of concatenated alignments strongly supported a Lophocebus + Papio clade and a basal position for Theropithecus. The Bayesian concordance analysis supported the same phylogeny. A coalescent-based Bayesian method resulted in a very poorly resolved species tree. The topological agreement between concatenation and the Bayesian concordance analysis offers considerable support for a Lophocebus + Papio clade as the dominant relationship across the genome. However, the results of the Bayesian concordance analysis indicate that almost half the genome has an alternative history. As such, our results offer a well-supported phylogenetic hypothesis for the Papio/Lophocebus/Theropithecus trichotomy, while at the same time providing evidence for a complex evolutionary history that likely includes hybridization among lineages.

  19. [Molecular evidence on the phylogenetic position of tree shrews].

    PubMed

    Xu, Ling; Fan, Yu; Jiang, Xue-Long; Yao, Yong-Gang

    2013-04-01

    The tree shrew is currently located in the Order Scandentia and is widely distributed in Southeast Asia, South Asia, and South China. Due to its unique characteristics, such as small body size, high brain-to-body mass ratio, short reproductive cycle and life span, and low-cost of maintenance, the tree shrew has been proposed as an alternative experimental animal to primates in biomedical research. However, there is unresolved debate regarding the phylogenetic affinity of tree shrews to primates and their phylogenetic position in Euarchontoglires. To help settle this debate, we summarized the available molecular evidence on the phylogenetic position of the tree shrew. Most nuclear DNA data, including recent genome data, suggested that the tree shrew belongs to the Euarchonta clade harboring primates and flying lemurs (colugos). However, analyses of mitochondrial DNA (mtDNA) data suggested a close relationship to lagomorphs and rodents. These different clustering patterns could be explained by nuclear gene data and mtDNA data discrepancies, as well as the different phylogenetic approaches used in previous studies. Taking all available conclusions together, the robust data from whole genome of this species supports tree shrews being genetically closely related to primates.

  20. An Optimization-Based Sampling Scheme for Phylogenetic Trees

    NASA Astrophysics Data System (ADS)

    Misra, Navodit; Blelloch, Guy; Ravi, R.; Schwartz, Russell

    Much modern work in phylogenetics depends on statistical sampling approaches to phylogeny construction to estimate probability distributions of possible trees for any given input data set. Our theoretical understanding of sampling approaches to phylogenetics remains far less developed than that for optimization approaches, however, particularly with regard to the number of sampling steps needed to produce accurate samples of tree partition functions. Despite the many advantages in principle of being able to sample trees from sophisticated probabilistic models, we have little theoretical basis for concluding that the prevailing sampling approaches do in fact yield accurate samples from those models within realistic numbers of steps. We propose a novel approach to phylogenetic sampling intended to be both efficient in practice and more amenable to theoretical analysis than the prevailing methods. The method depends on replacing the standard tree rearrangement moves with an alternative Markov model in which one solves a theoretically hard but practically tractable optimization problem on each step of sampling. The resulting method can be applied to a broad range of standard probability models, yielding practical algorithms for efficient sampling and rigorous proofs of accurate sampling for some important special cases. We demonstrate the efficiency and versatility of the method in an analysis of uncertainty in tree inference over varying input sizes. In addition to providing a new practical method for phylogenetic sampling, the technique is likely to prove applicable to many similar problems involving sampling over combinatorial objects weighted by a likelihood model.

  1. Inferring Epidemic Contact Structure from Phylogenetic Trees

    PubMed Central

    Leventhal, Gabriel E.; Kouyos, Roger; Stadler, Tanja; von Wyl, Viktor; Yerly, Sabine; Böni, Jürg; Cellerai, Cristina; Klimkait, Thomas; Günthard, Huldrych F.; Bonhoeffer, Sebastian

    2012-01-01

    Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing. PMID:22412361
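
    The abstract does not say which measures of tree imbalance were used. As an illustration, the sketch below computes two common ones, the Sackin index (sum of leaf depths) and the Colless index (sum over internal nodes of the difference in leaf counts between the two children), for trees written as nested tuples. The encoding and the toy trees are assumptions.

```python
def leaf_depths(tree, depth=0):
    """List the depth (number of edges from the root) of every leaf."""
    if not isinstance(tree, tuple):
        return [depth]
    depths = []
    for child in tree:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def sackin(tree):
    """Sackin index: the sum of all leaf depths."""
    return sum(leaf_depths(tree))


def _colless(tree):
    """Return (leaf_count, colless_sum) for a strictly binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0
    (n_left, c_left), (n_right, c_right) = _colless(tree[0]), _colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def colless(tree):
    """Colless index: sum of |left leaves - right leaves| over internal nodes."""
    return _colless(tree)[1]


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(sackin(balanced), sackin(caterpillar))    # 8 9
print(colless(balanced), colless(caterpillar))  # 0 3
```

    Larger values indicate a more unbalanced tree. Comparing an observed value against values from trees simulated under a null model is one way to test for deviation from random mixing.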

  2. Inferring epidemic contact structure from phylogenetic trees.

    PubMed

    Leventhal, Gabriel E; Kouyos, Roger; Stadler, Tanja; Wyl, Viktor von; Yerly, Sabine; Böni, Jürg; Cellerai, Cristina; Klimkait, Thomas; Günthard, Huldrych F; Bonhoeffer, Sebastian

    2012-01-01

    Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.

  3. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    PubMed

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.

  4. Walking tree heuristics for biological string alignment, gene location, and phylogenies

    NASA Astrophysics Data System (ADS)

    Cull, P.; Holloway, J. L.; Cavener, J. D.

    1999-03-01

    Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of workstations and demonstrates that there is sufficient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.
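
    For reference, the edit-distance model that the abstract contrasts with walking trees can be sketched as the standard Levenshtein dynamic program below. This is the baseline model only, not the walking tree heuristic, and the unit costs for insertion, deletion and substitution are an assumption.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a from s
                    curr[j - 1] + 1,         # insert b into s
                    prev[j - 1] + (a != b),  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
# Two blocks swapped: same content, but the model has no "move" operation.
print(edit_distance("AAAACCCC", "CCCCAAAA"))
```

    Because the model allows only single-character insertions, deletions and substitutions, a segment that has been moved or inverted is scored as many separate edits, which is why the abstract says such strings violate its assumptions.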

  5. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    PubMed Central

    2010-01-01

    Background Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504

  6. Reliable Phylogenetic Trees Building: A New Web Interface for FIGENIX.

    PubMed

    Paganini, Julien; Gouret, Philippe

    2012-01-01

    The community needed a reliable and user-friendly tool to quickly produce robust phylogenetic trees, which are crucial in evolutionary studies and in the functional annotation of genomes. FIGENIX is software dedicated to this and was published in 2005. Several laboratories around the world use it in their research, but it was difficult for non-expert users, so we developed a new graphical user interface for the benefit of all biologists.

  7. Quartet decomposition server: a platform for analyzing phylogenetic trees

    PubMed Central

    2012-01-01

    Background The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially in phylogenomic analyses. Decomposition of gene trees into embedded quartets (unrooted trees each with four taxa) is a convenient and statistically powerful technique to address this challenging problem. This approach was shown to be useful in several studies of completely sequenced microbial genomes. Results We present here a web server that takes a collection of gene phylogenies, decomposes them into quartets, generates a Quartet Spectrum, and draws a split network. Users are also provided with various data download options for further analyses. Each gene phylogeny is to be represented by an assessment of phylogenetic information content, such as sets of trees reconstructed from bootstrap replicates or sampled from a posterior distribution. The Quartet Decomposition server is accessible at http://quartets.uga.edu. Conclusions The Quartet Decomposition server presented here provides a convenient means to perform Quartet Decomposition analyses and will empower users to find statistically supported phylogenetic conflicts. PMID:22676320
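
    As a rough illustration of what decomposing a gene tree into embedded quartets means, the sketch below lists the resolved four-taxon topologies displayed by a single tree. It is not the server's code: the nested-tuple tree encoding and the function names are assumptions made for this example, and it handles one tree rather than sets of bootstrap or posterior trees.

```python
from itertools import combinations

def leaf_sets(tree):
    """Return (all leaves, list of leaf sets below each internal edge)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
        clades.append(child_leaves)
    return leaves, clades

def embedded_quartets(tree):
    """Return the set of resolved quartets ((a, b), (c, d)) displayed by the tree."""
    all_leaves, clades = leaf_sets(tree)
    quartets = set()
    for a, b, c, d in combinations(sorted(all_leaves), 4):
        four = {a, b, c, d}
        for pair, rest in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            # A quartet is displayed if some edge separates `pair` from `rest`.
            if any(clade & four in (set(pair), set(rest)) for clade in clades):
                quartets.add((pair, rest))
                break
    return quartets
```

    Counting how often each quartet topology appears across many gene trees would give the kind of tally that a quartet spectrum summarizes.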

  8. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    PubMed Central

    2010-01-01

    Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking

  9. Mesoamerican tree squirrels evolution (Rodentia: Sciuridae): a molecular phylogenetic analysis.

    PubMed

    Villalobos, Federico; Gutierrez-Espeleta, Gustavo

    2014-06-01

    The tribe Sciurini comprises the genera Sciurus, Syntheosciurus, Microsciurus, Tamiasciurus and Rheithrosciurus. The phylogenetic relationships within Sciurus have been only partially resolved, and the relationships among Mesoamerican species remain unresolved. The phylogenetic relationships of the Mesoamerican tree squirrels were examined using molecular data. Publicly available sequence data (12S, 16S, CYTB mitochondrial genes and IRBP nuclear gene) and cytochrome B gene sequences of four previously unsampled Mesoamerican Sciurus species were analyzed under a Bayesian multispecies coalescent model. Phylogenetic analysis of the multilocus data set showed the neotropical tree squirrels as a monophyletic clade. The genus Sciurus was paraphyletic due to the inclusion of Microsciurus species (M. alfari and M. flaviventer). The South American species S. aestuans and S. stramineus showed a sister-taxon relationship. Single locus analysis based on the most compact and complete data set (i.e. CYTB gene sequences) supported the monophyly of the South American species and recovered a Mesoamerican clade including S. aureogaster, S. granatensis and S. variegatoides. These results corroborated previous findings based on cladistic analysis of cranial and post-cranial characters. Our data support a close relationship between Mesoamerican Sciurus species and a sister relationship with South American species, and corroborate previous findings regarding the polyphyly of Microsciurus and the paraphyly of Syntheosciurus.

  10. LifePrint: a novel k-tuple distance method for construction of phylogenetic trees

    PubMed Central

    Reyes-Prieto, Fabián; García-Chéquer, Adda J; Jaimes-Díaz, Hueman; Casique-Almazán, Janet; Espinosa-Lara, Juana M; Palma-Orozco, Rosaura; Méndez-Tenorio, Alfonso; Maldonado-Rodríguez, Rogelio; Beattie, Kenneth L

    2011-01-01

    Purpose Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. Methods We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated “true” phylogenetic tree. Results The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide substitution. Conclusion Our LPS9 method generates more accurate phylogenetic reconstructions than the previously proposed 5-tuples strategy. LPS9-reconstructed trees show higher bootstrap proportion values than distance trees derived from the 5-tuple method. PMID:21918634
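
    The general idea of an alignment-free k-tuple distance can be sketched as below. This is a generic illustration and not the LifePrint method: it counts all overlapping exact k-tuples rather than matching the curated LPS9 set with mismatches, and the Euclidean metric and function names are assumptions chosen for this example.

```python
import math
from collections import Counter

def ktuple_profile(seq, k):
    """Relative frequencies of all overlapping k-tuples in a sequence."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {tup: n / total for tup, n in counts.items()}

def ktuple_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the k-tuple frequency profiles."""
    pa, pb = ktuple_profile(seq_a, k), ktuple_profile(seq_b, k)
    return math.sqrt(
        sum((pa.get(t, 0.0) - pb.get(t, 0.0)) ** 2 for t in set(pa) | set(pb))
    )
```

    A matrix of such pairwise distances is the kind of input a neighbor-joining implementation takes.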

  11. Implementation of a Markov model for phylogenetic trees.

    PubMed

    Bohl, Erich; Lancaster, Peter

    2006-04-07

    A recently developed mathematical model for the analysis of phylogenetic trees is applied to comparative data for 48 species. The model represents a return to fundamentals and makes no hypothesis with respect to the reversibility of the process. The species have been analysed in all subsets of three, and a measure of reliability of the results is provided. The numerical results of the computations on 17,296 triples of species are made available on the Internet. These results are discussed and the development of reliable tree structures for several species is illustrated. It is shown that, indeed, the Markov model is capable of considerably more interesting predictions than has been recognized to date.
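
    The figure of 17,296 triples is the number of ways to choose three species out of 48, which can be checked directly. The helper name is ours, not the authors'.

```python
from math import comb

def count_triples(n_species):
    """Number of unordered subsets of three species."""
    return comb(n_species, 3)
```

    For 48 species this is 48 × 47 × 46 / 6 = 17,296.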

  12. Why abundant tropical tree species are phylogenetically old.

    PubMed

    Wang, Shaopeng; Chen, Anping; Fang, Jingyun; Pacala, Stephen W

    2013-10-01

    Neutral models of species diversity predict patterns of abundance for communities in which all individuals are ecologically equivalent. These models were originally developed for Panamanian trees and successfully reproduce observed distributions of abundance. Neutral models also make macroevolutionary predictions that have rarely been evaluated or tested. Here we show that neutral models predict a humped or flat relationship between species age and population size. In contrast, ages and abundances of tree species in the Panama Canal watershed are found to be positively correlated, which falsifies the models. Speciation rates vary among phylogenetic lineages and are partially heritable from mother to daughter species. Variable speciation rates in an otherwise neutral model lead to a demographic advantage for species with low speciation rate. This demographic advantage results in a positive correlation between species age and abundance, as found in the Panamanian tropical forest community.

  13. Phylogenetic affinity of tree shrews to Glires is attributed to fast evolution rate.

    PubMed

    Lin, Jiannan; Chen, Guangfeng; Gu, Liang; Shen, Yuefeng; Zheng, Meizhu; Zheng, Weisheng; Hu, Xinjie; Zhang, Xiaobai; Qiu, Yu; Liu, Xiaoqing; Jiang, Cizhong

    2014-02-01

    Previous phylogenetic analyses have led to incongruent evolutionary relationships between tree shrews and other suborders of Euarchontoglires. What caused the incongruence remains elusive. In this study, we identified 6845 orthologous genes among seventeen placental mammals. Tree shrews and Primates were monophyletic in the phylogenetic trees derived from the first and/or second codon positions, whereas tree shrews and Glires formed a monophyletic group in the trees derived from the third or all codon positions. The same topology was obtained in the phylogeny inference using the slowly and fast evolving genes, respectively. This incongruence was likely attributable to the fast substitution rate in tree shrews and Glires. Notably, sequence GC content alone was not informative for resolving the controversial phylogenetic relationships between tree shrews, Glires, and Primates. Finally, estimation of the confidence of the tree selection strongly supported the phylogenetic affiliation of tree shrews to Primates as a monophyletic group.
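
    Building trees from selected codon positions requires splitting a coding alignment by position. A minimal sketch, assuming each sequence is in frame and starts at the first base of a codon; the function name is hypothetical and this is not the authors' pipeline.

```python
def codon_positions(seq, positions):
    """Keep only the requested codon positions (1, 2 and/or 3) of an in-frame sequence."""
    wanted = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(seq) if i % 3 in wanted)
```

    Applied to every row of an alignment, `codon_positions(row, (1, 2))` would give the slower-evolving partition and `codon_positions(row, (3,))` the faster-evolving one.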

  14. Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees

    PubMed Central

    Llorens, Carlos; Muñoz-Pomer, Alfonso; Bernad, Lucia; Botella, Hector; Moya, Andrés

    2009-01-01

    Background Sequencing projects have allowed diverse retroviruses and LTR retrotransposons from different eukaryotic organisms to be characterized. It is known that retroviruses and other retro-transcribing viruses evolve from LTR retrotransposons and that this whole system clusters into five families: Ty3/Gypsy, Retroviridae, Ty1/Copia, Bel/Pao and Caulimoviridae. Phylogenetic analyses usually show that these split into multiple distinct lineages but what is yet to be understood is how deep evolution occurred in this system. Results We combined phylogenetic and graph analyses to investigate the history of LTR retroelements both as a tree and as a network. We used 268 non-redundant LTR retroelements, many of them introduced for the first time in this work, to elucidate all possible LTR retroelement phylogenetic patterns. These were superimposed over the tree of eukaryotes to investigate the dynamics of the system, at distinct evolutionary times. Next, we investigated phenotypic features such as duplication and variability of amino acid motifs, and several differences in genomic ORF organization. Using this information we characterized eight reticulate evolution markers to construct phenotypic network models. Conclusion The evolutionary history of LTR retroelements can be traced as a time-evolving network that depends on phylogenetic patterns, epigenetic host-factors and phenotypic plasticity. The Ty1/Copia and the Ty3/Gypsy families represent the oldest patterns in this network that we found mimics eukaryotic macroevolution. The emergence of the Bel/Pao, Retroviridae and Caulimoviridae families in this network can be related with distinct inflations of the Ty3/Gypsy family, at distinct evolutionary times. This suggests that Ty3/Gypsy ancestors diversified much more than their Ty1/Copia counterparts, at distinct geological eras. Consistent with the principle of preferential attachment, the connectivities among phenotypic markers, taken as network

  15. Phylogenetic Tree Reconstruction Accuracy and Model Fit when Proportions of Variable Sites Change across the Tree

    PubMed Central

    Grievink, Liat Shavit; Penny, David; Hendy, Michael D.; Holland, Barbara R.

    2010-01-01

    Commonly used phylogenetic models assume a homogeneous process through time in all parts of the tree. However, it is known that these models can be too simplistic as they do not account for nonhomogeneous lineage-specific properties. In particular, it is now widely recognized that as constraints on sequences evolve, the proportion and positions of variable sites can vary between lineages causing heterotachy. The extent to which this model misspecification affects tree reconstruction is still unknown. Here, we evaluate the effect of changes in the proportions and positions of variable sites on model fit and tree estimation. We consider 5 current models of nucleotide sequence evolution in a Bayesian Markov chain Monte Carlo framework as well as maximum parsimony (MP). We show that for a tree with 4 lineages where 2 nonsister taxa undergo a change in the proportion of variable sites tree reconstruction under the best-fitting model, which is chosen using a relative test, often results in the wrong tree. In this case, we found that an absolute test of model fit is a better predictor of tree estimation accuracy. We also found further evidence that MP is not immune to heterotachy. In addition, we show that increased sampling of taxa that have undergone a change in proportion and positions of variable sites is critical for accurate tree reconstruction. PMID:20525636

  16. Inference of Transmission Network Structure from HIV Phylogenetic Trees.

    PubMed

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.

  17. Inference of Transmission Network Structure from HIV Phylogenetic Trees

    PubMed Central

    Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic. PMID:28085876

  18. Phylogenetic tree and sequence similarity of beta-lactamases.

    PubMed

    Ogawara, H

    1993-06-01

    beta-Lactamases are the main cause of beta-lactam resistance in many pathogenic bacteria. These enzymes can be detected in a variety of pathogenic as well as non-pathogenic bacteria. The cyanobacteria are also known to produce a beta-lactamase. Recently, the amino acid sequences and the three-dimensional structures of some of these beta-lactamases have been clarified. On the basis of the amino acid sequences of 47 beta-lactamases and the computer-aided analysis, a phylogenetic tree is proposed in this paper. According to the tree, beta-lactamases are classified into six groups. Group 1 beta-lactamases are mainly composed of plasmid-mediated enzymes from gram-negative bacteria. However, chromosome-derived beta-lactamases from Klebsiella pneumoniae and Rhodopseudomonas capsulata take part in this group. Group 2 enzymes consist of a part of the chromosome-encoded beta-lactamases from Streptomyces, and chromosome-mediated enzymes from Yersinia enterocolitica, Citrobacter diversus, and Klebsiella oxytoca. Chromosome-encoded beta-lactamases from gram-negative bacteria form group 3. Group 4 is composed of metalloenzymes, whereas group 5 consists of OXA type beta-lactamases. Chromosome-encoded beta-lactamases from gram-positive bacteria form group 6. Comparison of the amino acid sequences among these groups confirmed the phylogenetic tree and the classification: the beta-lactamases in each group have their own particular conserved amino acid sequences. In addition, the tree provides more detailed classification and time-scale mutual relationships and predicts new types of beta-lactamases that may be found. Furthermore, the classification deduced from the tree is generally in accord with the one based on the amino acid sequences reported previously. However, the class A beta-lactamases are clearly divided into three groups: groups 1, 2, and 6. RDF2 analysis shows that some combinations between beta-lactamases and beta-lactam-interacting proteins as well as eukaryotic proteins

  19. How Ecology and Landscape Dynamics Shape Phylogenetic Trees.

    PubMed

    Gascuel, Fanny; Ferrière, Régis; Aguilée, Robin; Lambert, Amaury

    2015-07-01

    Whether biotic or abiotic factors are the dominant drivers of clade diversification is a long-standing question in evolutionary biology. The ubiquitous patterns of phylogenetic imbalance and branching slowdown have been taken as supporting the role of ecological niche filling and spatial heterogeneity in ecological features, and thus of biotic processes, in diversification. However, a proper theoretical assessment of the relative roles of biotic and abiotic factors in macroevolution requires models that integrate both types of factors, and such models have been lacking. In this study, we use an individual-based model to investigate the temporal patterns of diversification driven by ecological speciation in a stochastically fluctuating geographic landscape. The model generates phylogenies whose shape evolves as the clade ages. Stabilization of tree shape often occurs after ecological saturation, revealing species turnover caused by competition and demographic stochasticity. In the initial phase of diversification (allopatric radiation into an empty landscape), trees tend to be unbalanced and branching slows down. As diversification proceeds further due to landscape dynamics, balance and branching tempo may increase and become positive. Three main conclusions follow. First, the phylogenies of ecologically saturated clades do not always exhibit branching slowdown. Branching slowdown requires that competition be wide or heterogeneous across the landscape, or that the characteristics of landscape dynamics vary geographically. Conversely, branching acceleration is predicted under narrow competition or frequent local catastrophes. Second, ecological heterogeneity does not necessarily cause phylogenies to be unbalanced: short time in geographical isolation or frequent local catastrophes may lead to balanced trees despite spatial heterogeneity. Conversely, unbalanced trees can emerge without spatial heterogeneity, notably if competition is wide. Third, short isolation time

  20. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses

    PubMed Central

    Capella-Gutiérrez, Salvador; Silla-Martínez, José M.; Gabaldón, Toni

    2009-01-01

    Summary: Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. Here, we present trimAl, a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized. Availability: trimAl has been written in C++, it is portable to all platforms. trimAl is freely available for download (http://trimal.cgenomics.org) and can be used online through the Phylemon web server (http://phylemon2.bioinfo.cipf.es/). Supplementary Material is available at http://trimal.cgenomics.org/publications. Contact: tgabaldon@crg.es PMID:19505945
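
    One of the criteria mentioned above, the proportion of sequences with a gap in a column, can be illustrated with a short sketch. This is not trimAl's code or its command-line interface: the function name, the `-` gap character and the default threshold are assumptions made for this example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of gap characters exceeds the threshold.

    `alignment` is a list of equal-length aligned sequences using '-' for gaps.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        col for col in range(n_cols)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]
```

    A fixed threshold like this is the simplest case; the abstract describes trimAl as also able to choose its parameters automatically for each alignment, which this sketch does not attempt.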

  1. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    PubMed

    Satoh, Soichirou; Mimuro, Mamoru; Tanaka, Ayumi

    2013-01-01

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  2. Characterization of a branch of the phylogenetic tree

    SciTech Connect

    Samuel, Stuart A.; Weng, Gezhi

    2003-04-11

    We use a combination of analytic models and computer simulations to gain insight into the dynamics of evolution. Our results suggest that certain interesting phenomena should eventually emerge from the fossil record. For example, there should be a "tortoise and hare effect": Those genera with the smallest species death rate are likely to survive much longer than genera with large species birth and death rates. A complete characterization of the behavior of a branch of the phylogenetic tree corresponding to a genus and accurate mathematical representations of the various stages are obtained. We apply our results to address certain controversial issues that have arisen in paleontology such as the importance of punctuated equilibrium and whether unique Cambrian phyla have survived to the present.

  3. Unrooted unordered homeomorphic subtree alignment of RNA trees

    PubMed Central

    2013-01-01

    We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an $O(n_T n_S + \min(d_T, d_S) L_T L_S)$ time complexity, where $n_T$, $L_T$ and $d_T$ are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying $d_T \le L_T \le n_T$), and similarly for $n_S$, $L_S$ and $d_S$ with respect to the tree S. This improves the time complexity of previous algorithms for less general variants of the problem. In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as for two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size n and m, where $n \le m$, MCM and both its cavity derivatives are solved in $O(n^3 + nm)$ time, without the usage of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic time algorithm for All-Pairs-Cavity-MCM, and improves the running times of MCM and All-Cavity-MCM problems in the unbalanced case where $n \ll m$. We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and employed it in a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNAse P and Hammerhead RNA
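
    To make the Min-Cost Bipartite Matching problem concrete, the sketch below solves the basic version by brute force: every element of the smaller set is matched to a distinct element of the larger set so that the total cost is minimal. It is a reference definition for tiny inputs only, with exponential running time, and is not the authors' $O(n^3 + nm)$ algorithm or their generalized variant. The function name and the cost-matrix layout are assumptions of this example.

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force min-cost matching of n rows into m >= n columns.

    cost[i][j] is the cost of matching row i to column j.
    Returns (total cost, list giving the column chosen for each row).
    """
    n, m = len(cost), len(cost[0])

    def total(cols):
        return sum(cost[i][c] for i, c in enumerate(cols))

    # Try every way of assigning n distinct columns to the n rows.
    best = min(permutations(range(m), n), key=total)
    return total(best), list(best)
```

    For example, with costs `[[4, 1, 5], [2, 0, 5]]` the cheapest matching pairs row 0 with column 1 and row 1 with column 0, for a total cost of 3.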

  4. A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

    ERIC Educational Resources Information Center

    Brewer, Steven D.

    This research paper examines phylogenetic tree construction, a form of problem solving in biology, by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…

  5. bcgTree: automatized phylogenetic tree building from bacterial core genomes.

    PubMed

    Ankenbrand, Markus J; Keller, Alexander

    2016-10-01

    The need for multi-gene analyses in scientific fields such as phylogenetics and DNA barcoding has increased in recent years. In particular, these approaches are increasingly important for differentiating bacterial species, where reliance on the standard 16S rDNA marker can result in poor resolution. Additionally, the assembly of bacterial genomes has become a standard task due to advances in next-generation sequencing technologies. We created a bioinformatic pipeline, bcgTree, which uses assembled bacterial genomes, either from databases or from the user's own sequencing results, to reconstruct their phylogenetic history. The pipeline automatically extracts 107 essential single-copy core genes, found in a majority of bacteria, using hidden Markov models and performs a partitioned maximum-likelihood analysis. Here, we describe the workflow of bcgTree and, as a proof-of-concept, its usefulness in resolving the phylogeny of 293 publicly available bacterial strains of the genus Lactobacillus. We also evaluate its performance in both low- and high-level taxonomy test sets. The tool is freely available at GitHub (https://github.com/iimog/bcgTree) and our institutional homepage (http://www.dna-analytics.biozentrum.uni-wuerzburg.de).

  6. Selection of Orthologous Genes for Construction of a Highly Resolved Phylogenetic Tree and Clarification of the Phylogeny of Trichosporonales Species

    PubMed Central

    Takashima, Masako; Manabe, Ri-ichiroh; Iwasaki, Wataru; Ohyama, Akira; Ohkuma, Moriya; Sugita, Takashi

    2015-01-01

    The order Trichosporonales (Tremellomycotina, Basidiomycota) includes various species that have clinical, agricultural and biotechnological value. Thus, understanding why and how evolutionary diversification occurred within this order is extremely important. This study clarified the phylogenetic relationships among Trichosporonales species. To select genes suitable for phylogenetic analysis, we determined the draft genomes of 17 Trichosporonales species and extracted 30 protein-coding DNA sequences (CDSs) from genomic data. The CDS regions of Trichosporon asahii and T. faecale were identified by referring to mRNA sequence data since the intron positions of the respective genes differed from those of Cryptococcus neoformans (outgroup) and are not conserved within this order. A multiple alignment of each gene was first constructed using the CDSs of T. asahii, T. faecale and C. neoformans, and those of other species were added and aligned based on codons. The phylogenetic trees were constructed based on each gene and a concatenated alignment. Resolution of the maximum-likelihood trees estimated from the concatenated dataset based on both nucleotide (72,531) and amino acid (24,173) sequences was greater than in previous reports. In addition, we found that several genes, such as phosphatidylinositol 3-kinase TOR1 and glutamate synthase (NADH), had good resolution in this group (even when used alone). Our study proposes a set of genes suitable for constructing a phylogenetic tree with high resolution to examine evolutionary diversification in Trichosporonales. These can also be used for epidemiological and biogeographical studies, and may also serve as the basis for a comprehensive reclassification of pleomorphic fungi. PMID:26241762

  7. Species divergence and phylogenetic variation of ecophysiological traits in lianas and trees.

    PubMed

    Rios, Rodrigo S; Salgado-Luarte, Cristian; Gianoli, Ernesto

    2014-01-01

    The climbing habit is an evolutionary key innovation in plants because it is associated with enhanced clade diversification. We tested whether patterns of species divergence and variation of three ecophysiological traits that are fundamental for plant adaptation to light environments (maximum photosynthetic rate [A(max)], dark respiration rate [R(d)], and specific leaf area [SLA]) are consistent with this key innovation. Using data reported from four tropical forests and three temperate forests, we compared phylogenetic distance among species as well as the evolutionary rate, phylogenetic distance and phylogenetic signal of those traits in lianas and trees. Estimates of evolutionary rates showed that R(d) evolved faster in lianas, while SLA evolved faster in trees. The mean phylogenetic distance was 1.2 times greater among liana species than among tree species. Likewise, estimates of phylogenetic distance indicated that lianas were less related than by chance alone (phylogenetic evenness across 63 species), and trees were more related than expected by chance (phylogenetic clustering across 71 species). Lianas showed evenness for R(d), while trees showed phylogenetic clustering for this trait. In contrast, for SLA, lianas exhibited phylogenetic clustering and trees showed phylogenetic evenness. Lianas and trees showed patterns of ecophysiological trait variation among species that were independent of phylogenetic relatedness. We found support for the expected pattern of greater species divergence in lianas, but did not find consistent patterns regarding ecophysiological trait evolution and divergence. R(d) followed the species-level pattern, i.e., greater divergence/evolution in lianas compared to trees, while the opposite occurred for SLA and no pattern was detected for A(max). R(d) may have driven lianas' divergence across forest environments, and might contribute to diversification in climber clades.

  8. One Tree to Link Them All: A Phylogenetic Dataset for the European Tetrapoda

    PubMed Central

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-01-01

    With the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies, which are necessary for comprehensive large-scale eco-phylogenetic analyses, are still lacking for many large groups and regions. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of a supermatrix, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) levels based on consensus knowledge. For each group, we inferred 100 ML trees to be able to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized-likelihood and fossil calibration. The trees obtained were well-supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for the purpose of comparative analyses, macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty. PMID:25685620

  9. Community Phylogenetics: Assessing Tree Reconstruction Methods and the Utility of DNA Barcodes.

    PubMed

    Boyle, Elizabeth E; Adamowicz, Sarah J

    2015-01-01

    Studies examining phylogenetic community structure have become increasingly prevalent, yet little attention has been given to the influence of the input phylogeny on metrics that describe phylogenetic patterns of co-occurrence. Here, we examine the influence of branch length, tree reconstruction method, and amount of sequence data on measures of phylogenetic community structure, as well as the phylogenetic signal (Pagel's λ) in morphological traits, using Trichoptera larval communities from Churchill, Manitoba, Canada. We find that model-based tree reconstruction methods and the use of a backbone family-level phylogeny improve estimations of phylogenetic community structure. In addition, trees built using the barcode region of cytochrome c oxidase subunit I (COI) alone accurately predict metrics of phylogenetic community structure obtained from a multi-gene phylogeny. Input tree did not alter overall conclusions drawn for phylogenetic signal, as significant phylogenetic structure was detected in two body size traits across input trees. As the discipline of community phylogenetics continues to expand, it is important to investigate the best approaches to accurately estimate patterns. Our results suggest that emerging large datasets of DNA barcode sequences provide a vast resource for studying the structure of biological communities.

  10. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
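
    The ratio ω = dN/dS named in the abstract can be read as follows: ω > 1 suggests positive selection, ω near 1 suggests neutral evolution, and ω < 1 suggests purifying selection. The sketch below only illustrates that reading. The function names, the tolerance band around 1, and the input values are assumptions made for the example; it is not the CODEML M2a procedure, which estimates ω by maximum likelihood over site classes.

```python
def omega(dn: float, ds: float) -> float:
    """Return the evolutionary rate ratio dN/dS for one codon site."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Label a site from its omega value; `tol` is an illustrative band around 1."""
    if w > 1 + tol:
        return "positive"
    if w < 1 - tol:
        return "purifying"
    return "neutral"


# Hypothetical (dN, dS) pairs for three codon sites.
sites = [(0.30, 0.10), (0.02, 0.20), (0.10, 0.10)]
labels = [selection_regime(omega(dn, ds)) for dn, ds in sites]
print(labels)
```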

  11. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  12. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    PubMed Central

    Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

    2009-01-01

    Background: Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results: We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion: PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253

  13. Edge-related loss of tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

    PubMed

    Santos, Bráulio A; Arroyo-Rodríguez, Víctor; Moreno, Claudia E; Tabarelli, Marcelo

    2010-09-08

    Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.
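
    The two distance measures described in the abstract (the average phylogenetic distance between two randomly chosen individuals, and the average distance from each individual to its closest non-conspecific relative) can be illustrated with a small sketch. The species names, abundances and pairwise distances below are invented for the example, and the function names are hypothetical; the study's own software and data are not shown in the source.

```python
from itertools import combinations


def mean_pairwise_distance(abund, dist):
    """Mean phylogenetic distance between two distinct individuals drawn at random.

    Conspecific pairs contribute a distance of zero.
    """
    individuals = [sp for sp, n in abund.items() for _ in range(n)]
    pairs = list(combinations(individuals, 2))
    total = sum(0.0 if a == b else dist[frozenset((a, b))] for a, b in pairs)
    return total / len(pairs)


def mean_nearest_relative_distance(abund, dist):
    """Mean distance from each individual to its closest non-conspecific relative."""
    present = [sp for sp, n in abund.items() if n > 0]
    total, count = 0.0, 0
    for sp in present:
        nearest = min(dist[frozenset((sp, other))] for other in present if other != sp)
        total += nearest * abund[sp]  # every individual of `sp` has the same nearest relative
        count += abund[sp]
    return total / count


# Toy assemblage: two close relatives (A, B) and one distant species (C).
abundances = {"A": 2, "B": 1, "C": 1}
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("B", "C")): 10.0,
}
mpd = mean_pairwise_distance(abundances, distances)
mnd = mean_nearest_relative_distance(abundances, distances)
print(round(mpd, 3), mnd)
```

    In this toy assemblage, losing species C would lower the first measure sharply, because all the long pairwise distances run through C.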

  14. morePhyML: improving the phylogenetic tree space exploration with PhyML 3.

    PubMed

    Criscuolo, Alexis

    2011-12-01

    PhyML is a widely used Maximum Likelihood (ML) phylogenetic tree inference program based on a standard hill-climbing method. Starting from an initial tree, version 3 of PhyML explores the tree space by using "Nearest Neighbor Interchange" (NNI) or "Subtree Pruning and Regrafting" (SPR) tree swapping techniques in order to find the ML phylogenetic tree. NNI-based local searches are fast but can often get trapped in local optima, whereas it is expected that the larger (but slower to cover) SPR-based neighborhoods will lead to trees with higher likelihood. Here, I verify that PhyML infers more likely trees with SPRs than with NNIs in almost all cases. However, I also show that the SPR-based local search of PhyML often does not succeed at locating the ML tree. To improve the tree space exploration, I deliver a script, named morePhyML, which allows escaping from local optima by performing character reweighting. This ML tree search strategy, named ratchet, often leads to higher likelihood estimates. Based on the analysis of a large number of amino acid and nucleotide datasets, I show that morePhyML in many cases infers more accurate phylogenetic trees than several other recently developed ML tree inference programs.

  15. A novel approach to phylogenetic tree construction using stochastic optimization and clustering

    PubMed Central

    Qin, Ling; Chen, Yixin; Pan, Yi; Chen, Ling

    2006-01-01

    Background: The problem of inferring the evolutionary history and constructing the phylogenetic tree with high performance has become one of the major problems in computational biology. Results: A new phylogenetic tree construction method from a given set of objects (proteins, species, etc.) is presented. As an extension of ant colony optimization, this method proposes an adaptive phylogenetic clustering algorithm based on a digraph to find a tree structure that defines the ancestral relationships among the given objects. Conclusion: Our phylogenetic tree construction method is tested to compare its results with that of the genetic algorithm (GA). Experimental results show that our algorithm converges much faster and also achieves higher quality than GA. PMID:17217517

  16. PhySortR: a fast, flexible tool for sorting phylogenetic trees in R

    PubMed Central

    Stephens, Timothy G.; Bhattacharya, Debashish; Ragan, Mark A.

    2016-01-01

    A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, particularly robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for classifying phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa based on tip labels (i.e., leaves) on a tree, with customisable options to assess clades within the context of the whole tree. Using simulated and empirical datasets, we demonstrate the potential and scalability of PhySortR in analysis of thousands of phylogenetic trees without a priori assumption of tree-rooting, and in yielding readily interpretable trees that unambiguously satisfy the query. PhySortR is a command-line tool that is freely available and easily automatable. PMID:27190724

  17. Soil phosphorus heterogeneity promotes tree species diversity and phylogenetic clustering in a tropical seasonal rainforest.

    PubMed

    Xu, Wumei; Ci, Xiuqin; Song, Caiyun; He, Tianhua; Zhang, Wenfu; Li, Qiaoming; Li, Jie

    2016-12-01

    The niche theory predicts that environmental heterogeneity and species diversity are positively correlated in tropical forests, whereas the neutral theory suggests that stochastic processes are more important in determining species diversity. This study sought to investigate the effects of soil nutrient (nitrogen and phosphorus) heterogeneity on tree species diversity in the Xishuangbanna tropical seasonal rainforest in southwestern China. Thirty-nine plots of 400 m(2) (20 × 20 m) were randomly located in the Xishuangbanna tropical seasonal rainforest. Within each plot, soil nutrient (nitrogen and phosphorus) availability and heterogeneity, tree species diversity, and community phylogenetic structure were measured. Soil phosphorus heterogeneity and tree species diversity in each plot were positively correlated, while phosphorus availability and tree species diversity were not. The trees in plots with low soil phosphorus heterogeneity were phylogenetically overdispersed, while the phylogenetic structure of trees within the plots became clustered as heterogeneity increased. Neither nitrogen availability nor its heterogeneity was correlated with tree species diversity or the phylogenetic structure of trees within the plots. Interspecific competition in the forest plots with low soil phosphorus heterogeneity could lead to an overdispersed community. However, as heterogeneity increases, more closely related species may be able to coexist and lead to a clustered community. Our results indicate that soil phosphorus heterogeneity significantly affects tree diversity in the Xishuangbanna tropical seasonal rainforest, suggesting that deterministic processes are dominant in this tropical forest assembly.

  18. The vestigial olfactory receptor subgenome of odontocete whales: phylogenetic congruence between gene-tree reconciliation and supermatrix methods.

    PubMed

    McGowen, Michael R; Clark, Clay; Gatesy, John

    2008-08-01

    The macroevolutionary transition of whales (cetaceans) from a terrestrial quadruped to an obligate aquatic form involved major changes in sensory abilities. Compared to terrestrial mammals, the olfactory system of baleen whales is dramatically reduced, and in toothed whales is completely absent. We sampled the olfactory receptor (OR) subgenomes of eight cetacean species from four families. A multigene tree of 115 newly characterized OR sequences from these eight species and published data for Bos taurus revealed a diverse array of class II OR paralogues in Cetacea. Evolution of the OR gene superfamily in toothed whales (Odontoceti) featured a multitude of independent pseudogenization events, supporting anatomical evidence that odontocetes have lost their olfactory sense. We explored the phylogenetic utility of OR pseudogenes in Cetacea, concentrating on delphinids (oceanic dolphins), the product of a rapid evolutionary radiation that has been difficult to resolve in previous studies of mitochondrial DNA sequences. Phylogenetic analyses of OR pseudogenes using both gene-tree reconciliation and supermatrix methods yielded fully resolved, consistently supported relationships among members of four delphinid subfamilies. Alternative minimizations of gene duplications, gene duplications plus gene losses, deep coalescence events, and nucleotide substitutions plus indels returned highly congruent phylogenetic hypotheses. Novel DNA sequence data for six single-copy nuclear loci and three mitochondrial genes (> 5000 aligned nucleotides) provided an independent test of the OR trees. Nucleotide substitutions and indels in OR pseudogenes showed a very low degree of homoplasy in comparison to mitochondrial DNA and, on average, provided more variation than single-copy nuclear DNA. Our results suggest that phylogenetic analysis of the large OR superfamily will be effective for resolving relationships within Cetacea whether supermatrix or gene-tree reconciliation procedures are

  19. Climate-driven extinctions shape the phylogenetic structure of temperate tree floras.

    PubMed

    Eiserhardt, Wolf L; Borchsenius, Finn; Plum, Christoffer M; Ordonez, Alejandro; Svenning, Jens-Christian

    2015-03-01

    When taxa go extinct, unique evolutionary history is lost. If extinction is selective, and the intrinsic vulnerabilities of taxa show phylogenetic signal, more evolutionary history may be lost than expected under random extinction. Under what conditions this occurs is insufficiently known. We show that late Cenozoic climate change induced phylogenetically selective regional extinction of northern temperate trees because of phylogenetic signal in cold tolerance, leading to significantly and substantially larger than random losses of phylogenetic diversity (PD). The surviving floras in regions that experienced stronger extinction are phylogenetically more clustered, indicating that non-random losses of PD are of increasing concern with increasing extinction severity. Using simulations, we show that a simple threshold model of survival given a physiological trait with phylogenetic signal reproduces our findings. Our results send a strong warning that we may expect future assemblages to be phylogenetically and possibly functionally depauperate if anthropogenic climate change affects taxa similarly.

  20. Do Branch Lengths Help to Locate a Tree in a Phylogenetic Network?

    PubMed

    Gambette, Philippe; van Iersel, Leo; Kelk, Steven; Pardi, Fabio; Scornavacca, Celine

    2016-09-01

    Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often translated in the following way: is a given phylogenetic tree contained in a given phylogenetic network? Recently this tree containment problem has been widely investigated from a computational perspective, but most studies have only focused on the topology of the phylogenies, ignoring a piece of information that, in the case of phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate more precisely a tree within a network, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained by the network with the same topology as the input tree, the closest one in terms of branch lengths.

  1. Tree-average distances on certain phylogenetic networks have their weights uniquely determined.

    PubMed

    Willson, Stephen J

    2012-01-01

    A phylogenetic network N has vertices corresponding to species and arcs corresponding to direct genetic inheritance from the species at the tail to the species at the head. Measurements of DNA are often made on species in the leaf set, and one seeks to infer properties of the network, possibly including the graph itself. In the case of phylogenetic trees, distances between extant species are frequently used to infer the phylogenetic trees by methods such as neighbor-joining. This paper proposes a tree-average distance for networks more general than trees. The notion requires a weight on each arc measuring the genetic change along the arc. For each displayed tree the distance between two leaves is the sum of the weights along the path joining them. At a hybrid vertex, each character is inherited from one of its parents. We will assume that for each hybrid there is a probability that the inheritance of a character is from a specified parent. Assume that the inheritance events at different hybrids are independent. Then for each displayed tree there will be a probability that the inheritance of a given character follows the tree; this probability may be interpreted as the probability of the tree. The tree-average distance between the leaves is defined to be the expected value of their distance in the displayed trees. For a class of rooted networks that includes rooted trees, it is shown that the weights and the probabilities at each hybrid vertex can be calculated given the network and the tree-average distances between the leaves. Hence these weights and probabilities are uniquely determined. The hypotheses on the networks include that hybrid vertices have indegree exactly 2 and that vertices that are not leaves have a tree-child.
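
    The definition above can be made concrete with a brute-force sketch: enumerate every displayed tree by picking one parent arc at each hybrid vertex, weight each tree by the product of the chosen inheritance probabilities, and take the expected leaf-to-leaf path length. The toy network, its arc weights and its probabilities are invented for illustration, and the function names are hypothetical; the paper's own result concerns recovering weights from distances, which this sketch does not attempt.

```python
from itertools import product

# Each non-root vertex maps to its incoming arcs:
# (parent, arc weight, probability that a character is inherited along this arc).
# Tree vertices have one arc with probability 1; the hybrid "h" has two.
NETWORK = {
    "a": [("r", 1.0, 1.0)],
    "b": [("r", 1.0, 1.0)],
    "x": [("a", 1.0, 1.0)],
    "y": [("b", 1.0, 1.0)],
    "h": [("a", 1.0, 0.25), ("b", 2.0, 0.75)],
    "z": [("h", 1.0, 1.0)],
}


def depths_to_root(vertex, parent_arc):
    """Map each ancestor of `vertex` (itself included) to its path length from `vertex`."""
    depths = {vertex: 0.0}
    dist = 0.0
    while vertex in parent_arc:
        parent, weight = parent_arc[vertex]
        dist += weight
        depths[parent] = dist
        vertex = parent
    return depths


def tree_distance(u, v, parent_arc):
    """Path length between u and v in one displayed tree."""
    up_u = depths_to_root(u, parent_arc)
    up_v = depths_to_root(v, parent_arc)
    # The joining path turns at the lowest common ancestor, which minimizes the sum.
    return min(up_u[w] + up_v[w] for w in up_u if w in up_v)


def tree_average_distance(u, v, network):
    """Expected distance between u and v over all displayed trees."""
    vertices = list(network)
    expected = 0.0
    for choice in product(*(network[c] for c in vertices)):
        prob = 1.0
        parent_arc = {}
        for child, (parent, weight, p) in zip(vertices, choice):
            prob *= p  # inheritance events at different hybrids are independent
            parent_arc[child] = (parent, weight)
        expected += prob * tree_distance(u, v, parent_arc)
    return expected


print(tree_average_distance("x", "z", NETWORK))
print(tree_average_distance("y", "z", NETWORK))
```

    Enumeration is exponential in the number of hybrid vertices, so this is only practical for very small networks.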

  2. Phylogenetic Structure of Tree Species across Different Life Stages from Seedlings to Canopy Trees in a Subtropical Evergreen Broad-Leaved Forest

    PubMed Central

    Jin, Yi; Qian, Hong; Yu, Mingjian

    2015-01-01

    Investigating patterns of phylogenetic structure across different life stages of tree species in forests is crucial to understanding forest community assembly, and investigating forest gap influence on the phylogenetic structure of forest regeneration is necessary for understanding forest community assembly. Here, we examine the phylogenetic structure of tree species across life stages from seedlings to canopy trees, as well as forest gap influence on the phylogenetic structure of forest regeneration in a forest of the subtropical region in China. We investigate changes in phylogenetic relatedness (measured as NRI) of tree species from seedlings, saplings, treelets to canopy trees; we compare the phylogenetic turnover (measured as βNRI) between canopy trees and seedlings in forest understory with that between canopy trees and seedlings in forest gaps. We found that phylogenetic relatedness generally increases from seedlings through saplings and treelets up to canopy trees, and that phylogenetic relatedness does not differ between seedlings in forest understory and those in forest gaps, but phylogenetic turnover between canopy trees and seedlings in forest understory is lower than that between canopy trees and seedlings in forest gaps. We conclude that tree species tend to be more closely related from seedling to canopy layers, and that forest gaps alter the seedling phylogenetic turnover of the studied forest. It is likely that the increasing trend of phylogenetic clustering as tree stem size increases observed in this subtropical forest is primarily driven by abiotic filtering processes, which select a set of closely related evergreen broad-leaved tree species whose regeneration has adapted to the closed canopy environments of the subtropical forest developed under the regional monsoon climate. PMID:26098916
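
    The abstract reports phylogenetic relatedness as NRI without giving a formula. Assuming the usual definition (the observed mean pairwise distance standardized against a null distribution and multiplied by -1, so that positive values indicate clustering), a minimal sketch looks like this. The null model used here (random draws of the same number of species from the pool), the toy distance function and the species names are assumptions for the example, not details taken from the study.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mpd(species, dist):
    """Mean pairwise phylogenetic distance among a set of species."""
    pairs = list(combinations(sorted(species), 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def nri(community, pool, dist, n_null=999, seed=0):
    """Net relatedness index: -(MPD_obs - mean(MPD_null)) / sd(MPD_null)."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, len(community)), dist) for _ in range(n_null)]
    sd = stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return -(observed - mean(null)) / sd


def toy_distance(a, b):
    """Illustrative distances: 2 within a clade (same first letter), 10 between clades."""
    return 2.0 if a[0] == b[0] else 10.0


POOL = ["A1", "A2", "A3", "B1", "B2", "B3"]
clustered = nri(["A1", "A2", "A3"], POOL, toy_distance)
overdispersed = nri(["A1", "B1"], POOL, toy_distance)
print(clustered > 0, overdispersed < 0)
```

    A community drawn from a single clade yields a positive NRI (clustering), while one that spans both clades yields a negative NRI (overdispersion).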

  3. Phylogenetic Structure of Tree Species across Different Life Stages from Seedlings to Canopy Trees in a Subtropical Evergreen Broad-Leaved Forest.

    PubMed

    Jin, Yi; Qian, Hong; Yu, Mingjian

    2015-01-01

    Investigating patterns of phylogenetic structure across different life stages of tree species in forests is crucial to understanding forest community assembly, and investigating forest gap influence on the phylogenetic structure of forest regeneration is necessary for understanding forest community assembly. Here, we examine the phylogenetic structure of tree species across life stages from seedlings to canopy trees, as well as forest gap influence on the phylogenetic structure of forest regeneration in a forest of the subtropical region in China. We investigate changes in phylogenetic relatedness (measured as NRI) of tree species from seedlings, saplings, treelets to canopy trees; we compare the phylogenetic turnover (measured as βNRI) between canopy trees and seedlings in forest understory with that between canopy trees and seedlings in forest gaps. We found that phylogenetic relatedness generally increases from seedlings through saplings and treelets up to canopy trees, and that phylogenetic relatedness does not differ between seedlings in forest understory and those in forest gaps, but phylogenetic turnover between canopy trees and seedlings in forest understory is lower than that between canopy trees and seedlings in forest gaps. We conclude that tree species tend to be more closely related from seedling to canopy layers, and that forest gaps alter the seedling phylogenetic turnover of the studied forest. It is likely that the increasing trend of phylogenetic clustering as tree stem size increases observed in this subtropical forest is primarily driven by abiotic filtering processes, which select a set of closely related evergreen broad-leaved tree species whose regeneration has adapted to the closed canopy environments of the subtropical forest developed under the regional monsoon climate.

  4. Native fauna on exotic trees: phylogenetic conservatism and geographic contingency in two lineages of phytophages on two lineages of trees.

    PubMed

    Gossner, Martin M; Chao, Anne; Bailey, Richard I; Prinzing, Andreas

    2009-05-01

    The relative roles of evolutionary history and geographical and ecological contingency for community assembly remain unknown. Plant species, for instance, share more phytophages with closer relatives (phylogenetic conservatism), but for exotic plants introduced to another continent, this may be overlaid by geographically contingent evolution or immigration from locally abundant plant species (mass effects). We assessed within local forests to what extent exotic trees (Douglas-fir, red oak) recruit phytophages (Coleoptera, Heteroptera) from more closely or more distantly related native plants. We found that exotics shared more phytophages with natives from the same major plant lineage (angiosperms vs. gymnosperms) than with natives from the other lineage. This was particularly true for Heteroptera, and it emphasizes the role of host specialization in phylogenetic conservatism of host use. However, for Coleoptera on Douglas-fir, mass effects were important: immigration from beech increased with increasing beech abundance. Within a plant phylum, phylogenetic proximity of exotics and natives increased phytophage similarity, primarily in younger Coleoptera clades on angiosperms, emphasizing a role of past codiversification of hosts and phytophages. Overall, phylogenetic conservatism can shape the assembly of local phytophage communities on exotic trees. Whether it outweighs geographic contingency and mass effects depends on the interplay of phylogenetic scale, local abundance of native tree species, and the biology and evolutionary history of the phytophage taxon.

  5. Frugivores bias seed-adult tree associations through nonrandom seed dispersal: a phylogenetic approach.

    PubMed

    Razafindratsima, Onja H; Dunham, Amy E

    2016-08-01

    Frugivores are the main seed dispersers in many ecosystems, such that behaviorally driven, nonrandom patterns of seed dispersal are a common process; but patterns are poorly understood. Characterizing these patterns may be essential for understanding spatial organization of fruiting trees and drivers of seed-dispersal limitation in biodiverse forests. To address this, we studied resulting spatial associations between dispersed seeds and adult tree neighbors in a diverse rainforest in Madagascar, using a temporal and phylogenetic approach. Data show that by using fruiting trees as seed-dispersal foci, frugivores bias seed dispersal under conspecific adults and under heterospecific trees that share dispersers and fruiting time with the dispersed species. Frugivore-mediated seed dispersal also resulted in nonrandom phylogenetic associations of dispersed seeds with their nearest adult neighbors, in nine out of the 16 months of our study. However, these nonrandom phylogenetic associations fluctuated unpredictably over time, ranging from clustered to overdispersed. The spatial and phylogenetic template of seed dispersal did not translate to similar patterns of association in adult tree neighborhoods, suggesting the importance of post-dispersal processes in structuring plant communities. Results suggest that frugivore-mediated seed dispersal is important for structuring early stages of plant-plant associations, setting the template for post-dispersal processes that influence ultimate patterns of plant recruitment. Importantly, if biased patterns of dispersal are common in other systems, frugivores may promote tree coexistence in biodiverse forests by limiting the frequency and diversity of heterospecific interactions of seeds they disperse.

  6. The Reliability and Stability of an Inferred Phylogenetic Tree from Empirical Data.

    PubMed

    Katsura, Yukako; Stanley, Craig E; Kumar, Sudhir; Nei, Masatoshi

    2017-01-18

    The reliability of a phylogenetic tree obtained from empirical data is usually measured by the bootstrap probability (Pb) of interior branches of the tree. If the bootstrap probability is high for most branches, the tree is considered to be reliable. If some interior branches show relatively low bootstrap probabilities, we cannot be sure that the inferred tree is reliable. Here, we propose another quantity measuring the reliability of the tree, called the stability of a subtree. This quantity refers to the probability (Ps) of obtaining a given subtree of the inferred tree. We then show that if the tree is to be reliable, both Pb and Ps must be high. We also show that Ps is given by a bootstrap probability of the subtree with the closest outgroup sequence. A computer program, RESTA, for computing the Pb and Ps values will be presented.
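
The bootstrap probability described above can be sketched as a simple count: for each clade of the inferred tree, the fraction of bootstrap replicate trees that contain the same clade. The Python sketch below assumes rooted trees written as nested tuples of leaf names; it is not the RESTA program and does not compute Ps, and all names and trees in it are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return the leaf set (as a frozenset) under every internal node.

    Trees are nested tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def bootstrap_support(reference, replicates):
    """Fraction of replicate trees that contain each clade of the reference."""
    counts = Counter()
    for replicate in replicates:
        counts.update(clades(replicate))
    return {c: counts[c] / len(replicates) for c in clades(reference)}


# Hypothetical inferred tree and four bootstrap replicates.
reference = (("A", "B"), ("C", "D"))
alternative = (("A", "C"), ("B", "D"))
replicates = [reference, reference, reference, alternative]

support = bootstrap_support(reference, replicates)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), value)
```

The clade containing all leaves is present in every replicate, so its support is always 1; only the remaining clades are informative.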

  7. Is invasion success of Australian trees mediated by their native biogeography, phylogenetic history, or both?

    PubMed

    Miller, Joseph T; Hui, Cang; Thornhill, Andrew; Gallien, Laure; Le Roux, Johannes J; Richardson, David M

    2016-12-30

    For a plant species to become invasive it has to progress along the introduction-naturalization-invasion (INI) continuum, which reflects the joint direction of niche breadth. Identification of traits that correlate with and drive species invasiveness along the continuum is a major focus of invasion biology. If invasiveness is underlain by heritable traits, and if such traits are phylogenetically conserved, then we would expect non-native species with different introduction status (i.e. position along the INI continuum) to show phylogenetic signal. This study uses two clades that contain a large number of invasive tree species from the genera Acacia and Eucalyptus to test whether geographic distribution and a novel phylogenetic conservation method can predict which species have been introduced, become naturalized, and become invasive. Our results suggest that no phylogenetic signal underlies introduction status in either group of trees, except for introduced acacias. The more invasive acacia clade contains invasive species that have smoother geographic distributions and are more marginal in the phylogenetic network. The less invasive eucalyptus group contains invasive species that are more clustered geographically, more centrally located in the phylogenetic network and have phylogenetic distances between invasive and non-invasive species that are trending toward the mean pairwise distance. This suggests that highly invasive groups may be identified because they have invasive species with smoother and faster expanding native distributions and are located more to the edges of phylogenetic networks than less invasive groups.

  8. Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions.

    PubMed

    Bogdanowicz, Damian; Giaro, Krzysztof

    2017-02-08

    The ability to quantify the dissimilarity of different phylogenetic trees describing the relationships among the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the set of metrics described so far in the literature, the most commonly used seems to be the Robinson-Foulds distance. In this article, we define a new metric for rooted trees: the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events.
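
For orientation, the Robinson-Foulds distance mentioned above can be sketched for rooted trees as the number of clusters (leaf sets under internal nodes) found in one tree but not the other. This is a minimal Python sketch of that baseline only, assuming trees given as nested tuples and the unhalved convention (some authors divide the count by two); it does not implement the MP distance, and the example trees are hypothetical.

```python
def clusters(tree):
    """Leaf sets under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def rooted_rf_distance(tree_a, tree_b):
    """Size of the symmetric difference of the two cluster sets."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Hypothetical trees on the same four taxa.
balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

print(rooted_rf_distance(balanced, balanced))  # identical trees
print(rooted_rf_distance(balanced, swapped))   # no shared non-root clusters
print(rooted_rf_distance(balanced, ladder))    # one cluster differs per tree
```

Because the count only records whether a cluster is shared or not, two trees that differ by moving a single leaf can score as far apart as two unrelated trees, which is one motivation for matching-based metrics such as the one the abstract proposes.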

  9. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

    PubMed

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-08-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license.

  10. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web

    PubMed Central

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-01-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. PMID:27189561

  11. Nearly complete rRNA genes from 371 Animalia: updated structure-based alignment and detailed phylogenetic analysis.

    PubMed

    Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew J

    2012-09-01

    This study presents a manually constructed alignment of nearly complete rRNA genes from most animal clades (371 taxa from ~33 of the ~36 metazoan phyla), expanded from the 197 sequences in a previous study. This thorough, taxon-rich alignment, available at http://www.wsu.edu/~jmallatt/research/rRNAalignment.html and in the Dryad Repository (doi: http://dx.doi.org/10.5061/dryad.1v62kr3q), is based rigidly on the secondary structure of the SSU and LSU rRNA molecules, and is annotated in detail, including labeling of the erroneous sequences (contaminants). The alignment can be used for future studies of the molecular evolution of rRNA. Here, we use it to explore whether the larger number of sequences produces an improved phylogenetic tree of animal relationships. Disappointingly, the resolution did not improve, neither when the standard maximum-likelihood method was used, nor with more sophisticated methods that partitioned the rRNA into paired and unpaired sites (stem, loop, bulge, junction), or accounted for the evolution of the paired sites. For example, no doublet model of paired-site substitutions (16-state, 16A and 16B, 7A-F, or 6A-C models) corrected the placement of any rogue taxa or increased resolution. The following findings are from the simplest, standard, ML analysis. The 371-taxon tree only imperfectly supported the bilaterian clades of Lophotrochozoa and Ecdysozoa, and this problem remained after 17 taxa with unstably positioned sequences were omitted from the analysis. The problem seems to stem from base-compositional heterogeneity across taxa and from an overrepresentation of highly divergent sequences among the newly added taxa (e.g., sequences from Cephalopoda, Rotifera, Acoela, and Myxozoa). The rogue taxa continue to concentrate in two locations in the rRNA tree: near the base of Arthropoda and of Bilateria. The approximately unbiased (AU) test refuted the monophyly of Mollusca and of Chordata, probably due to long-branch attraction of the highly

  12. Calibrated birth-death phylogenetic time-tree priors for bayesian inference.

    PubMed

    Heled, Joseph; Drummond, Alexei J

    2015-05-01

    Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference.

  13. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees.

    PubMed

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Chen, Wei-Hua; Hu, Songnian

    2016-07-08

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its 'dataset system' contains not only the data to be visualized on the tree, but also 'modifiers' that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new 'Demo' trees to demonstrate the basic functionalities of Evolview, and five new 'Showcase' trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/.

  14. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees

    PubMed Central

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J.; Chen, Wei-Hua; Hu, Songnian

    2016-01-01

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its ‘dataset system’ contains not only the data to be visualized on the tree, but also ‘modifiers’ that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new ‘Demo’ trees to demonstrate the basic functionalities of Evolview, and five new ‘Showcase’ trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/. PMID:27131786

  15. PhyloPen: Phylogenetic Tree Browsing Using a Pen and Touch Interface.

    PubMed

    Wehrer, Anthony; Yee, Andrew; Lisle, Curtis; Hughes, Charles

    2015-11-23

    Phylogenetic trees are used by researchers across multiple fields of study to display historical relationships between organisms or genes. Trees are used to examine the speciation process in evolutionary biology, to classify families of viruses in epidemiology, to demonstrate co-speciation in host and pathogen studies, and to explore genetic changes occurring during the disease process in cancer, among other applications. Due to their complexity and the amount of data they present in visual form, phylogenetic trees have generally been difficult to render for publication and challenging to directly interact with in digital form. To address these limitations, we developed PhyloPen, an experimental novel multi-touch and pen application that renders a phylogenetic tree and allows users to interactively navigate within the tree, examining nodes, branches, and auxiliary information, and annotate the tree for note-taking and collaboration. We present a discussion of the interactions implemented in PhyloPen and the results of a formative study that examines how the application was received after use by practicing biologists -- faculty members and graduate students in the discipline. These results are to be later used for a fully supported implementation of the software where the community will be welcomed to participate in its development.

  16. PhyloPen: Phylogenetic Tree Browsing Using a Pen and Touch Interface

    PubMed Central

    Wehrer, Anthony; Yee, Andrew; Lisle, Curtis; Hughes, Charles

    2015-01-01

    Phylogenetic trees are used by researchers across multiple fields of study to display historical relationships between organisms or genes. Trees are used to examine the speciation process in evolutionary biology, to classify families of viruses in epidemiology, to demonstrate co-speciation in host and pathogen studies, and to explore genetic changes occurring during the disease process in cancer, among other applications. Due to their complexity and the amount of data they present in visual form, phylogenetic trees have generally been difficult to render for publication and challenging to directly interact with in digital form. To address these limitations, we developed PhyloPen, an experimental novel multi-touch and pen application that renders a phylogenetic tree and allows users to interactively navigate within the tree, examining nodes, branches, and auxiliary information, and annotate the tree for note-taking and collaboration. We present a discussion of the interactions implemented in PhyloPen and the results of a formative study that examines how the application was received after use by practicing biologists -- faculty members and graduate students in the discipline. These results are to be later used for a fully supported implementation of the software where the community will be welcomed to participate in its development. PMID:26693078

  17. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.

    PubMed

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2015-11-01

    The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).

  18. Phylogenetic trees and the future of mammalian biodiversity

    PubMed Central

    Davies, T. Jonathan; Fritz, Susanne A.; Grenyer, Richard; Orme, C. David L.; Bielby, Jon; Bininda-Emonds, Olaf R. P.; Cardillo, Marcel; Jones, Kate E.; Gittleman, John L.; Mace, Georgina M.; Purvis, Andy

    2008-01-01

    Phylogenies describe the origins and history of species. However, they can also help to predict species' fates and so can be useful tools for managing the future of biodiversity. This article starts by sketching how phylogenetic, geographic, and trait information can be combined to elucidate present mammalian diversity patterns and how they arose. Recent diversification rates and standing diversity show different geographic patterns, indicating that cradles of diversity have moved over time. Patterns in extinction risk reflect both biological differences among mammalian lineages and differences in threat intensity among regions. Phylogenetic comparative analyses indicate that for small-bodied mammals, extinction risk is governed mostly by where the species live and the intensity of the threats, whereas for large-bodied mammals, ecological differences also play an important role. This modeling approach identifies species whose intrinsic biology renders them particularly vulnerable to increased human pressure. We outline how the approach might be extended to consider future trends in anthropogenic drivers, to identify likely future battlegrounds of mammalian conservation, and the likely casualties. This framework could help to highlight consequences of choosing among different future climatic and socioeconomic scenarios. We end by discussing priority-setting, showing how alternative currencies for diversity can suggest very different priorities. We argue that aiming to maximize long-term evolutionary responses is inappropriate, that conservation planning needs to consider costs as well as benefits, and that proactive conservation of largely intact systems should be part of a balanced strategy. PMID:18695230

  19. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis

    PubMed Central

    2012-01-01

    Background: Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. Findings: Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. Conclusions: The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting

  20. Climate Change Impacts on the Tree of Life: Changes in Phylogenetic Diversity Illustrated for Acropora Corals

    PubMed Central

    Faith, Daniel P.; Richards, Zoe T.

    2012-01-01

    The possible loss of whole branches from the tree of life is a dramatic, but under-studied, biological implication of climate change. The tree of life represents an evolutionary heritage providing both present and future benefits to humanity, often in unanticipated ways. Losses in this evolutionary (evo) life-support system represent losses in “evosystem” services, and are quantified using the phylogenetic diversity (PD) measure. High species-level biodiversity losses may or may not correspond to high PD losses. If climate change impacts are clumped on the phylogeny, then loss of deeper phylogenetic branches can mean disproportionately large PD loss for a given degree of species loss. Over time, successive species extinctions within a clade each may imply only a moderate loss of PD, until the last species within that clade goes extinct, and PD drops precipitously. Emerging methods of “phylogenetic risk analysis” address such phylogenetic tipping points by adjusting conservation priorities to better reflect risk of such worst-case losses. We have further developed and explored this approach for one of the most threatened taxonomic groups, corals. Based on a phylogenetic tree for the coral genus Acropora, we identify cases where worst-case PD losses may be avoided by designing risk-averse conservation priorities. We also propose spatial heterogeneity measures to assess possible changes in the geographic distribution of coral PD. PMID:24832524
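
The tipping-point behaviour described above can be reproduced on a toy tree. The Python sketch below assumes PD is the total length of the branches connecting the surviving tips to the root; the tree, branch lengths, and names are hypothetical and are not taken from the Acropora phylogeny.

```python
def faith_pd(tips, parent):
    """Total branch length connecting the given tips to the root.

    `parent` maps each non-root node to (parent node, length of the branch
    above the node). Each branch is counted once, however many tips use it.
    """
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical tree: clade X sits on a long stem (10) with three short
# tips (1 each); its sister lineage y has a single branch of length 11.
parent = {
    "X": ("root", 10.0),
    "x1": ("X", 1.0),
    "x2": ("X", 1.0),
    "x3": ("X", 1.0),
    "y": ("root", 11.0),
}

survivors = ["x1", "x2", "x3", "y"]
pd_series = [faith_pd(survivors, parent)]
for lost in ["x1", "x2", "x3"]:       # successive extinctions within clade X
    survivors.remove(lost)
    pd_series.append(faith_pd(survivors, parent))

print(pd_series)
```

In this toy case the first two extinctions each remove one unit of PD, while the third removes the last tip branch together with the whole stem of the clade.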

  1. Building a Phylogenetic Tree of the Human and Ape Superfamily Using DNA-DNA Hybridization Data

    ERIC Educational Resources Information Center

    Maier, Caroline Alexander

    2004-01-01

    The study describes the process of DNA-DNA hybridization and the history of its use by Sibley and Ahlquist in simple, straightforward, and interesting language that students easily understand to create their own phylogenetic tree of the hominoid superfamily. They calibrate the DNA clock and use it to estimate the divergence dates of the various…

  2. Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry.

    ERIC Educational Resources Information Center

    Maier, Caroline Alexandra

    2001-01-01

    Presents an activity in which students seek answers to questions about evolutionary relationships by using genetic databases and bioinformatics software. Students build genetic distance matrices and phylogenetic trees based on molecular sequence data using web-based resources. Provides a flowchart of steps involved in accessing, retrieving, and…
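
A genetic distance matrix of the kind mentioned above can be sketched with the simplest measure, the p-distance: the proportion of aligned sites at which two sequences differ. The Python sketch below assumes pre-aligned sequences of equal length and skips gapped sites; the sequences and taxon labels are invented placeholders, not bear or panda data, and the activity itself uses web-based tools rather than this code.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ("-") are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    differences = sum(1 for a, b in compared if a != b)
    return differences / len(compared)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping taxon name to sequence."""
    return {
        a: {b: p_distance(alignment[a], alignment[b]) for b in alignment}
        for a in alignment
    }


# Invented toy alignment (placeholder labels and sequences).
alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTTC",
    "taxon3": "ACGAACGTTG",
}

matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in alignment])
```

A distance-based tree-building method such as UPGMA or neighbor joining would take a matrix like this as its input.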

  3. How much does horizontal gene transfer affect the phylogenetic tree of bacteria?

    NASA Astrophysics Data System (ADS)

    Tang, Bin; Boisvert, Philippe; Higgs, Paul

    2004-03-01

    Ribosomal RNA sequences are frequently used in bacterial phylogenetics. We have developed RNA-specific phylogenetic methods that take account of the conserved secondary structure of these sequences. Our method uses Monte Carlo simulations to generate a representative sample of evolutionary trees (analogous to an equilibrium ensemble in physics). It is known that horizontal transfer of genes can occur between bacterial species, although the frequency and implications of this are not fully understood. If horizontal transfer were frequent, there would be no consistent evolutionary tree for bacteria. We compared trees for 16S rRNA, 23S rRNA and tRNA genes from Proteobacteria (a diverse group for which many complete genome sequences are available). The gene trees are consistent with one another in most respects. Minor differences can almost all be attributed to uncertainties and unreliabilities in the phylogenetic method. We therefore conclude that these genes all give a coherent picture of the phylogeny of the organisms, and that horizontal transfer of these genes is too rare to obscure the signal of the organismal tree.

  4. Equality of Shapley value and fair proportion index in phylogenetic trees.

    PubMed

    Fuchs, Michael; Jin, Emma Yu

    2015-11-01

    The Shapley value and the fair proportion index of phylogenetic trees have been introduced recently for the purpose of making conservation decisions in genetics. Moreover, also very recently, Hartmann (J Math Biol 67:1163-1170, 2013) has presented data which shows that there is a strong correlation between a slightly modified version of the Shapley value (which we call the modified Shapley value) and the fair proportion index. He gave an explanation of this correlation by showing that the contribution of both indices to an edge of the tree becomes identical as the number of taxa tends to infinity. In this note, we show that the Shapley value and the fair proportion index are in fact the same. Moreover, we also consider the modified Shapley value and show that its covariance with the fair proportion index in random phylogenetic trees under the Yule-Harding model and uniform model is indeed close to one.
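
The equality stated above can be checked numerically on a small rooted tree. The Python sketch below assumes the rooted setting in which the value of a set of taxa is the total length of the branches connecting it to the root; it computes the fair proportion index by sharing each branch equally among the leaves below it, and the Shapley value by brute force over all orderings of the leaves. The tree, branch lengths, and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical rooted tree: child -> (parent, length of the branch above it).
PARENT = {
    "X": ("root", 6),
    "x1": ("X", 1),
    "x2": ("X", 2),
    "x3": ("X", 3),
    "y": ("root", 5),
}
LEAVES = ("x1", "x2", "x3", "y")


def path_nodes(leaf):
    """Nodes from a leaf up to, but excluding, the root."""
    nodes = []
    node = leaf
    while node in PARENT:
        nodes.append(node)
        node = PARENT[node][0]
    return nodes


def rooted_pd(tips):
    """Total length of the branches connecting the tips to the root."""
    used = {node for tip in tips for node in path_nodes(tip)}
    return sum(PARENT[node][1] for node in used)


def fair_proportion(leaf):
    """Share each ancestral branch equally among the leaves below it."""
    total = Fraction(0)
    for node in path_nodes(leaf):
        below = sum(1 for tip in LEAVES if node in path_nodes(tip))
        total += Fraction(PARENT[node][1], below)
    return total


def shapley_value(leaf):
    """Average marginal PD gain of the leaf over all orderings of the leaves."""
    orders = list(permutations(LEAVES))
    gain = 0
    for order in orders:
        before = order[: order.index(leaf)]
        gain += rooted_pd(before + (leaf,)) - rooted_pd(before)
    return Fraction(gain, len(orders))


for leaf in LEAVES:
    print(leaf, fair_proportion(leaf), shapley_value(leaf))
```

Exact fractions are used so that the comparison is not blurred by rounding. The brute-force enumeration grows factorially with the number of leaves and is only meant for toy trees.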

  5. Influence of tree shape and evolutionary time-scale on phylogenetic diversity metrics

    PubMed Central

    Mazel, F.; Davies, T.J; Gallien, L.; Renaud, J.; Groussin, M.; Münkemüller, T.; Thuiller, W.

    2016-01-01

    During the last decades, describing, analysing and understanding the phylogenetic structure of species assemblages has been a central theme in both community ecology and macro-ecology. Among the wide variety of phylogenetic structure metrics, three have been predominant in the literature: Faith’s phylogenetic diversity (PDFaith), which represents the sum of the branch lengths of the phylogenetic tree linking all species of a particular assemblage, the mean pairwise distance between all species in an assemblage (MPD) and the mean pairwise distance between the closest relatives in an assemblage (MNTD). Comparisons between studies using one or several of these metrics are difficult because there has been no comprehensive evaluation of the phylogenetic properties each metric captures. In particular it is unknown how PDFaith relates to MPD and MNTD. Consequently, it is possible that apparently opposing patterns in different studies might simply reflect differences in metric properties. Here, we aim to fill this gap by comparing these metrics using simulations and empirical data. We first used simulation experiments to test the influence of community structure and size on the mismatch between metrics whilst varying the shape and size of the phylogenetic tree of the species pool. Second we investigated the mismatch between metrics for two empirical datasets (gut microbes and global carnivoran assemblages). We show that MNTD and PDFaith provide similar information on phylogenetic structure, and respond similarly to variation in species richness and assemblage structure. However, MPD demonstrates a very different behaviour, and is highly sensitive to deep branching structure. We suggest that by combining complementary metrics that are sensitive to processes operating at different phylogenetic depths (i.e. MPD and MNTD or PDFaith) we can obtain a better understanding of assemblage structure. PMID:27713599
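
The contrast between MPD and MNTD reported above can be seen on a toy assemblage. The Python sketch below assumes both metrics are computed from a matrix of pairwise phylogenetic distances, with MNTD taken as the mean distance from each species to its nearest relative in the assemblage; the taxa and distances are hypothetical, and the sketch is not the simulation design used in the study.

```python
from itertools import combinations


def mpd(taxa, dist):
    """Mean pairwise distance over all pairs of species in the assemblage."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mntd(taxa, dist):
    """Mean distance from each species to its nearest relative."""
    nearest = [min(dist[a][b] for b in taxa if b != a) for a in taxa]
    return sum(nearest) / len(nearest)


def toy_distances(taxa, between):
    """Two clades: distance 2 within a clade, `between` across clades."""
    return {
        a: {b: (2.0 if a[0] == b[0] else between) for b in taxa if b != a}
        for a in taxa
    }


taxa = ["A1", "A2", "B1", "B2"]
shallow = toy_distances(taxa, between=10.0)   # recent split between clades
deep = toy_distances(taxa, between=100.0)     # ancient split between clades

print(mpd(taxa, shallow), mntd(taxa, shallow))
print(mpd(taxa, deep), mntd(taxa, deep))
```

Deepening the split between the two clades changes MPD substantially while leaving MNTD untouched, because every species still has its nearest relative inside its own clade.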

  6. Influence of tree shape and evolutionary time-scale on phylogenetic diversity metrics.

    PubMed

    Mazel, F; Davies, T J; Gallien, L; Renaud, J; Groussin, M; Münkemüller, T; Thuiller, W

    2016-10-01

    During the last decades, describing, analysing and understanding the phylogenetic structure of species assemblages has been a central theme in both community ecology and macro-ecology. Among the wide variety of phylogenetic structure metrics, three have been predominant in the literature: Faith's phylogenetic diversity (PDFaith), which represents the sum of the branch lengths of the phylogenetic tree linking all species of a particular assemblage, the mean pairwise distance between all species in an assemblage (MPD) and the mean pairwise distance between the closest relatives in an assemblage (MNTD). Comparisons between studies using one or several of these metrics are difficult because there has been no comprehensive evaluation of the phylogenetic properties each metric captures. In particular, it is unknown how PDFaith relates to MPD and MNTD. Consequently, it is possible that apparently opposing patterns in different studies might simply reflect differences in metric properties. Here, we aim to fill this gap by comparing these metrics using simulations and empirical data. We first used simulation experiments to test the influence of community structure and size on the mismatch between metrics whilst varying the shape and size of the phylogenetic tree of the species pool. Second, we investigated the mismatch between metrics for two empirical datasets (gut microbes and global carnivoran assemblages). We show that MNTD and PDFaith provide similar information on phylogenetic structure, and respond similarly to variation in species richness and assemblage structure. However, MPD demonstrates a very different behaviour, and is highly sensitive to deep branching structure. We suggest that by combining complementary metrics that are sensitive to processes operating at different phylogenetic depths (i.e. MPD and MNTD or PDFaith) we can obtain a better understanding of assemblage structure.
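
    The three metrics named in this abstract can be illustrated on a toy tree. The sketch below is not the authors' code: the tree, the node names and the function names are hypothetical, and PDFaith is computed here as the total branch length of the minimal subtree connecting the assemblage (some definitions also include the path to the root).

```python
from itertools import combinations

# Hypothetical toy rooted tree: child -> (parent, branch length).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("n2", 2.0),
    "C": ("n2", 3.0),
    "n2": ("root", 1.0),
    "D": ("root", 4.0),
}


def edges_to_root(tip):
    """Return the edges (named by their child node) on the path from tip to root."""
    edges = set()
    node = tip
    while node in TREE:
        edges.add(node)
        node = TREE[node][0]
    return edges


def distance(a, b):
    """Patristic distance: total length of the edges on exactly one of the two root paths."""
    return sum(TREE[e][1] for e in edges_to_root(a) ^ edges_to_root(b))


def pd_faith(species):
    """Total branch length of the minimal subtree connecting the assemblage.

    Assumption: edges above the most recent common ancestor are excluded.
    """
    counts = {}
    for s in species:
        for e in edges_to_root(s):
            counts[e] = counts.get(e, 0) + 1
    n = len(species)
    # An edge shared by all n species lies above their common ancestor.
    return sum(TREE[e][1] for e, c in counts.items() if c < n)


def mpd(species):
    """Mean pairwise distance over all pairs of species (needs >= 2 species)."""
    pairs = list(combinations(species, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def mntd(species):
    """Mean distance from each species to its closest relative in the assemblage."""
    nearest = [min(distance(s, t) for t in species if t != s) for s in species]
    return sum(nearest) / len(nearest)


assemblage = ["A", "B", "C"]
print(pd_faith(assemblage), mpd(assemblage), mntd(assemblage))
```

    For the assemblage A, B, C on this toy tree the pairwise distances are 2, 6 and 6, so PDFaith is 7, MPD is 14/3 and MNTD is 10/3. MPD is pulled up by the two deep splits, while MNTD is dominated by the shallow A-B pair, which is the contrast the abstract describes.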

  7. Molecular phylogenetics and systematics of the bivalve family Ostreidae based on rRNA sequence-structure models and multilocus species tree.

    PubMed

    Salvi, Daniele; Macali, Armando; Mariottini, Paolo

    2014-01-01

    The bivalve family Ostreidae has a worldwide distribution and includes species of high economic importance. Phylogenetics and systematics of oysters based on morphology have proved difficult because of their high phenotypic plasticity. In this study we explore the phylogenetic information of the DNA sequence and secondary structure of the nuclear, fast-evolving, ITS2 rRNA and the mitochondrial 16S rRNA genes from the Ostreidae and we implemented a multi-locus framework based on four loci for oyster phylogenetics and systematics. Sequence-structure rRNA models aided sequence alignment and improved accuracy and nodal support of phylogenetic trees. In agreement with previous molecular studies, our phylogenetic results indicate that none of the currently recognized subfamilies, Crassostreinae, Ostreinae, and Lophinae, is monophyletic. Single gene trees based on Maximum likelihood (ML) and Bayesian (BA) methods and on sequence-structure ML were congruent with multilocus trees based on concatenated (ML and BA) and coalescent-based (BA) approaches and consistently supported three main clades: (i) Crassostrea, (ii) Saccostrea, and (iii) an Ostreinae-Lophinae lineage. Therefore, the subfamilies Crassostreinae (including Crassostrea), Saccostreinae subfam. nov. (including Saccostrea and tentatively Striostrea) and Ostreinae (including Ostreinae and Lophinae taxa) are recognized. Based on phylogenetic and biogeographical evidence the Asian species of Crassostrea from the Pacific Ocean are assigned to Magallana gen. nov., whereas an integrative taxonomic revision is required for the genera Ostrea and Dendostrea. This study pointed out the suitability of the ITS2 marker for DNA barcoding of oysters and the relevance of using sequence-structure rRNA models and features of the ITS2 folding in molecular phylogenetics and taxonomy. The multilocus approach allowed us to infer a robust phylogeny of Ostreidae, providing a broad molecular perspective on their systematics.

  8. Molecular Phylogenetics and Systematics of the Bivalve Family Ostreidae Based on rRNA Sequence-Structure Models and Multilocus Species Tree

    PubMed Central

    Salvi, Daniele; Macali, Armando; Mariottini, Paolo

    2014-01-01

    The bivalve family Ostreidae has a worldwide distribution and includes species of high economic importance. Phylogenetics and systematics of oysters based on morphology have proved difficult because of their high phenotypic plasticity. In this study we explore the phylogenetic information of the DNA sequence and secondary structure of the nuclear, fast-evolving, ITS2 rRNA and the mitochondrial 16S rRNA genes from the Ostreidae and we implemented a multi-locus framework based on four loci for oyster phylogenetics and systematics. Sequence-structure rRNA models aided sequence alignment and improved accuracy and nodal support of phylogenetic trees. In agreement with previous molecular studies, our phylogenetic results indicate that none of the currently recognized subfamilies, Crassostreinae, Ostreinae, and Lophinae, is monophyletic. Single gene trees based on Maximum likelihood (ML) and Bayesian (BA) methods and on sequence-structure ML were congruent with multilocus trees based on concatenated (ML and BA) and coalescent-based (BA) approaches and consistently supported three main clades: (i) Crassostrea, (ii) Saccostrea, and (iii) an Ostreinae-Lophinae lineage. Therefore, the subfamilies Crassostreinae (including Crassostrea), Saccostreinae subfam. nov. (including Saccostrea and tentatively Striostrea) and Ostreinae (including Ostreinae and Lophinae taxa) are recognized. Based on phylogenetic and biogeographical evidence the Asian species of Crassostrea from the Pacific Ocean are assigned to Magallana gen. nov., whereas an integrative taxonomic revision is required for the genera Ostrea and Dendostrea. This study pointed out the suitability of the ITS2 marker for DNA barcoding of oysters and the relevance of using sequence-structure rRNA models and features of the ITS2 folding in molecular phylogenetics and taxonomy. The multilocus approach allowed us to infer a robust phylogeny of Ostreidae, providing a broad molecular perspective on their systematics. PMID:25250663

  9. Genetic Distances and Reconstruction of Phylogenetic Trees from Microsatellite DNA

    PubMed Central

    Takezaki, N.; Nei, M.

    1996-01-01

    Recently many investigators have used microsatellite DNA loci for studying the evolutionary relationships of closely related populations or species, and some authors proposed new genetic distance measures for this purpose. However, the efficiencies of these distance measures in obtaining the correct tree topology remain unclear. We therefore investigated the probability of obtaining the correct topology (P_C) for these new distances as well as traditional distance measures by using computer simulation. We used both the infinite-allele model (IAM) and the stepwise mutation model (SMM), which seem to be appropriate for classical markers and microsatellite loci, respectively. The results show that in both the IAM and SMM CAVALLI-SFORZA and EDWARDS' chord distance (D_C) and NEI et al.'s D_A distance generally show higher P_C values than other distance measures, whether the bottleneck effect exists or not. For estimating evolutionary times, however, NEI's standard distance and GOLDSTEIN et al.'s (δμ)² are more appropriate than other distances. Microsatellite DNA seems to be very useful for clarifying the evolutionary relationships of closely related populations. PMID:8878702
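
    The abstract names four distance measures without giving formulas. The sketch below uses their commonly cited forms (the chord distance, Nei et al.'s D_A, Nei's standard distance and Goldstein et al.'s (δμ)²) and should be checked against the paper before being relied on. The data layout (one dict of allele size to frequency per locus) and all function names are assumptions made for illustration.

```python
import math


def shared_sqrt_sum(x, y):
    """Sum over alleles of sqrt(x_i * y_i) at one locus (absent alleles count as 0)."""
    return sum(math.sqrt(x[a] * y[a]) for a in x if a in y)


def nei_da(pop_x, pop_y):
    """D_A = 1 - (1/r) * sum over loci of sum over alleles of sqrt(x_ij * y_ij)."""
    r = len(pop_x)
    return 1.0 - sum(shared_sqrt_sum(x, y) for x, y in zip(pop_x, pop_y)) / r


def chord_dc(pop_x, pop_y):
    """Chord distance D_C = (2 / (pi * r)) * sum over loci of sqrt(2 * (1 - sum sqrt(x*y)))."""
    r = len(pop_x)
    total = sum(
        # max() guards against tiny negative values caused by rounding.
        math.sqrt(2.0 * max(0.0, 1.0 - shared_sqrt_sum(x, y)))
        for x, y in zip(pop_x, pop_y)
    )
    return 2.0 / (math.pi * r) * total


def nei_standard(pop_x, pop_y):
    """Standard distance D = -ln(J_XY / sqrt(J_X * J_Y)); undefined if no allele is shared."""
    jx = sum(f * f for x in pop_x for f in x.values())
    jy = sum(f * f for y in pop_y for f in y.values())
    jxy = sum(x[a] * y[a] for x, y in zip(pop_x, pop_y) for a in x if a in y)
    return -math.log(jxy / math.sqrt(jx * jy))


def delta_mu_sq(pop_x, pop_y):
    """(delta mu)^2: squared difference in mean allele size, averaged over loci."""
    def mean_size(locus):
        return sum(size * freq for size, freq in locus.items())

    r = len(pop_x)
    return sum((mean_size(x) - mean_size(y)) ** 2 for x, y in zip(pop_x, pop_y)) / r


# One hypothetical microsatellite locus; keys are repeat counts, values are frequencies.
pop_x = [{10: 0.5, 11: 0.5}]
pop_y = [{10: 0.5, 12: 0.5}]
print(nei_da(pop_x, pop_y), chord_dc(pop_x, pop_y),
      nei_standard(pop_x, pop_y), delta_mu_sq(pop_x, pop_y))
```

    Only (δμ)² uses allele sizes, which is why it is tied to the stepwise mutation model; the other three depend on allele frequencies alone.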

  10. AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

    PubMed

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.

  11. AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

    PubMed Central

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. PMID:24892935

  12. Characterizing the phylogenetic tree community structure of a protected tropical rain forest area in Cameroon.

    PubMed

    Manel, Stéphanie; Couvreur, Thomas L P; Munoz, François; Couteron, Pierre; Hardy, Olivier J; Sonké, Bonaventure

    2014-01-01

    Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing species tree assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure combined with an isolation-by-distance pattern of taxonomic diversity suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess any lack of sensitivity to the variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world.

  13. Local-scale Partitioning of Functional and Phylogenetic Beta Diversity in a Tropical Tree Assemblage.

    PubMed

    Yang, Jie; Swenson, Nathan G; Zhang, Guocheng; Ci, Xiuqin; Cao, Min; Sha, Liqing; Li, Jie; Ferry Slik, J W; Lin, Luxiang

    2015-08-03

    The relative degree to which stochastic and deterministic processes underpin community assembly is a central problem in ecology. Quantifying local-scale phylogenetic and functional beta diversity may shed new light on this problem. We used species distribution, soil, trait and phylogenetic data to quantify whether environmental distance, geographic distance or their combination are the strongest predictors of phylogenetic and functional beta diversity on local scales in a 20-ha tropical seasonal rainforest dynamics plot in southwest China. The patterns of phylogenetic and functional beta diversity were generally consistent. The phylogenetic and functional dissimilarity between subplots (10 × 10 m, 20 × 20 m, 50 × 50 m and 100 × 100 m) was often higher than that expected by chance. The turnover of lineages and species function within habitats was generally slower than that across habitats. Partitioning the variation in phylogenetic and functional beta diversity showed that environmental distance was generally a better predictor of beta diversity than geographic distance thereby lending relatively more support for deterministic environmental filtering over stochastic processes. Overall, our results highlight that deterministic processes play a stronger role than stochastic processes in structuring community composition in this diverse assemblage of tropical trees.

  14. Lack of phylogenetic signals within environmental niches of tropical tree species across life stages

    PubMed Central

    Zhang, Caicai; Yang, Jie; Sha, Liqing; Ci, Xiuqin; Li, Jie; Cao, Min; Brown, Calum; Swenson, Nathan G.; Lin, Luxiang

    2017-01-01

    The lasting imprint of phylogenetic history on current day ecological patterns has long intrigued biologists. Over the past decade ecologists have increasingly sought to quantify phylogenetic signals in environmental niche preferences and, especially, traits to help uncover the mechanisms driving plant community assembly. However, relatively little is known about how phylogenetic patterns in environmental niches and traits compare, leaving significant uncertainty about the ecological implications of trait-based analyses. We examined phylogenetic signals within known environmental niches of 64 species, at seedling and adult life stages, in a Chinese tropical forest, to test whether local environmental niches had consistent relationships with phylogenies. Our analyses show that local environmental niches are highly phylogenetically labile for both seedlings and adult trees, with closely related species occupying niches that are no more similar than expected by random chance. These findings contrast with previous trait-based studies in the same forest, suggesting that phylogenetic signals in traits might not be a reliable guide to niche preferences or, therefore, to community assembly processes in some ecosystems, like the tropical seasonal rainforest in this study. PMID:28181524

  15. Phylogenetic Stability, Tree Shape, and Character Compatibility: A Case Study Using Early Tetrapods.

    PubMed

    Bernardi, Massimo; Angielczyk, Kenneth D; Mitchell, Jonathan S; Ruta, Marcello

    2016-09-01

    Phylogenetic tree shape varies as the evolutionary processes affecting a clade change over time. In this study, we examined an empirical phylogeny of fossil tetrapods during several time intervals, and studied how temporal constraints manifested in patterns of tree imbalance and character change. The results indicate that the impact of temporal constraints on tree shape is minimal and highlight the stability through time of the reference tetrapod phylogeny. Unexpected values of imbalance for Mississippian and Pennsylvanian time slices strongly support the hypothesis that the Carboniferous was a period of explosive tetrapod radiation. Several significant diversification shifts take place in the Mississippian and underpin increased terrestrialization among the earliest limbed vertebrates. Character incompatibility is relatively high at the beginning of tetrapod history, but quickly decreases to a relatively stable lower level, relative to a null distribution based on constant rates of character change. This implies that basal tetrapods had high, but declining, rates of homoplasy early in their evolutionary history, although the origin of Lissamphibia is an exception to this trend. The time slice approach is a powerful method of phylogenetic analysis and a useful tool for assessing the impact of combining extinct and extant taxa in phylogenetic analyses of large and speciose clades.

  16. Chemical classification of cattle. 2. Phylogenetic tree and specific status of the Zebu.

    PubMed

    Manwell, C; Baker, C M

    1980-01-01

    Phylogenetic trees for the ten major breed groups of cattle were constructed by Farris's (1972) maximum parsimony method, or Fitch & Margoliash's (1967) method, which averages out the deviation over the entire assemblage. Both techniques yield essentially identical trees. The phylogenetic tree for the ten major cattle breed groups can be superimposed on a map of Europe and western Asia, the root of the tree being close to the 'fertile crescent' in Asia Minor, believed to be a primary centre of bovine domestication. For some but not all protein variants there is a cline of gene frequencies as one proceeds from the British Isles and northwest Europe towards southeast Europe and Asia Minor, with the most extreme gene frequencies in the Zebu breeds of India. It is not clear to what extent the observed clines are primary or secondary, i.e., consequent to the initial migrations of cattle towards the end of the Pleistocene or consequent to the many migrations of man with his domesticated cattle. Such clines as exist are not in themselves sufficient to prove either selection versus genetic drift or to establish taxonomic ranking. Contrary to some suggestions in the literature, the biochemical evidence supports Linnaeus's original conclusions: Bos taurus and Bos indicus are distinct species.

  17. The algebra of the general Markov model on phylogenetic trees and networks.

    PubMed

    Sumner, J G; Holland, B R; Jarvis, P D

    2012-04-01

    It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.

  18. Phylogenetically diverse AM fungi from Ecuador strongly improve seedling growth of native potential crop trees.

    PubMed

    Schüßler, Arthur; Krüger, Claudia; Urgiles, Narcisa

    2016-04-01

    In many deforested regions of the tropics, afforestation with native tree species could valorize a growing reservoir of degraded, previously overused and abandoned land. The inoculation of tropical tree seedlings with arbuscular mycorrhizal fungi (AM fungi) can improve tree growth and viability, but efficiency may depend on plant and AM fungal genotype. To study such effects, seven phylogenetically diverse AM fungi, native to Ecuador, from seven genera and a non-native AM fungus (Rhizophagus irregularis DAOM197198) were used to inoculate the tropical potential crop tree (PCT) species Handroanthus chrysanthus (synonym Tabebuia chrysantha), Cedrela montana, and Heliocarpus americanus. Twenty-four plant-fungus combinations were studied in five different fertilization and AMF inoculation treatments. Numerous plant growth parameters and mycorrhizal root colonization were assessed. The inoculation with any of the tested AM fungi improved seedling growth significantly and in most cases reduced plant mortality. Plants produced up to threefold higher biomass when compared to the standard nursery practice. AM fungal inoculation alone or in combination with low fertilization both outperformed full fertilization in terms of plant growth promotion. Interestingly, root colonization levels for individual fungi strongly depended on the host tree species, but surprisingly the colonization strength did not correlate with plant growth promotion. The combination of AM fungal inoculation with a low dosage of slow release fertilizer improved PCT seedling performance most strongly, but AM fungal treatments without any fertilization were also highly efficient. The AM fungi tested are promising candidates to improve management practices in tropical tree seedling production.

  19. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...

  20. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set.

    PubMed

    Hall, Matthew; Woolhouse, Mark; Rambaut, Andrew

    2015-12-01

    The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of the epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples are taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Markov chain Monte Carlo (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly-designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the

  1. Algorithms for efficient near-perfect phylogenetic tree reconstruction in theory and practice.

    PubMed

    Sridhar, Srinath; Dhamdhere, Kedar; Blelloch, Guy; Halperin, Eran; Ravi, R; Schwartz, Russell

    2007-01-01

    We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
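
    The "four gametes" condition mentioned in this abstract can be checked directly. The sketch below is not either of the paper's algorithms; it only applies the standard four-gamete test, under which a binary character matrix admits a perfect phylogeny exactly when no pair of characters shows all four combinations 00, 01, 10 and 11. The function names and the example matrix are hypothetical.

```python
from itertools import combinations


def four_gamete_conflicts(matrix):
    """Return the pairs of character (column) indices that show all four gametes.

    matrix: list of rows, one per taxon, each a list of 0/1 character states.
    """
    n_chars = len(matrix[0])
    conflicts = []
    for i, j in combinations(range(n_chars), 2):
        # Collect the distinct state pairs observed for characters i and j.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            conflicts.append((i, j))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True when no pair of binary characters fails the four-gamete test."""
    return not four_gamete_conflicts(matrix)


# Hypothetical data: four taxa, three binary characters.
perfect = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
# Adding a taxon with states (0, 1, 0) makes characters 0 and 1 conflict.
imperfect = perfect + [[0, 1, 0]]

print(admits_perfect_phylogeny(perfect))
print(four_gamete_conflicts(imperfect))
```

    The characters returned by four_gamete_conflicts are the ones a near-perfect method has to explain with additional mutations.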

  2. Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data.

    PubMed

    Waddell, Peter J; Kishino, Hirohisa; Ota, Rissa

    2002-01-01

    Evolutionary trees sit at the core of all realistic models describing a set of related sequences, including alignment, homology search, ancestral protein reconstruction and 2D/3D structural change. It is important to assess the stochastic error when estimating a tree, including models using the most realistic likelihood-based optimizations, yet computation times may be many days or weeks. If so, the bootstrap is computationally prohibitive. Here we show that the extremely fast "resampling of estimated log likelihoods" or RELL method behaves well under more general circumstances than previously examined. RELL approximates the bootstrap (BP) proportions of trees better than some bootstrap methods that rely on fast heuristics to search the tree space. The BIC approximation of the Bayesian posterior probability (BPP) of trees is made more accurate by including an additional term related to the determinant of the information matrix (which may also be obtained as a product of gradient or score vectors). Such estimates are shown to be very close to MCMC chain values. Our analysis of mammalian mitochondrial amino acid sequences suggests that when model breakdown occurs, as it typically does for sequences separated by more than a few million years, the BPP values are far too peaked and the real fluctuations in the likelihood of the data are many times larger than expected. Accordingly, several ways to incorporate the bootstrap and other types of direct resampling with MCMC procedures are outlined. Genes evolve by a process which involves some sites following a tree close to, but not identical with, the species tree. It is seen that under such a likelihood model BP (bootstrap proportions) and BPP estimates may still be reasonable estimates of the species tree. Since many of the methods studied are very fast computationally, there is no reason to ignore stochastic error even with the slowest ML or likelihood-based methods.
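
    The RELL idea can be sketched in a few lines: instead of re-optimizing each tree on every bootstrap replicate, the per-site log-likelihoods already estimated for each candidate tree are resampled and summed. The sketch below is a minimal illustration, assuming the per-site values are supplied by some likelihood program; the function name, its parameters and the toy numbers are hypothetical and are not taken from the paper.

```python
import random


def rell_proportions(site_lnl, replicates=1000, seed=0):
    """Approximate bootstrap proportions by resampling per-site log-likelihoods.

    site_lnl: one list per candidate tree, each holding that tree's
              log-likelihood at every alignment site (same site order).
    Returns the fraction of replicates in which each tree scores highest.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(replicates):
        # Draw sites with replacement; the same draw is applied to every tree.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[s] for s in sample) for tree in site_lnl]
        # Ties go to the lowest-numbered tree.
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / replicates for w in wins]


# Toy per-site log-likelihoods for two candidate trees over six sites.
site_lnl = [
    [-2.1, -3.0, -1.5, -2.8, -2.2, -3.1],
    [-2.3, -2.7, -1.9, -2.6, -2.5, -3.0],
]
print(rell_proportions(site_lnl, replicates=500))
```

    Because no tree is re-estimated, the cost per replicate is a handful of additions, which is why the approach stays fast even when a single likelihood optimization takes days.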

  3. SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees

    PubMed Central

    Wisecaver, Jennifer H.

    2016-01-01

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/. PMID:27635331

  4. Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses.

    PubMed

    Poon, Art F Y; Walker, Lorne W; Murray, Heather; McCloskey, Rosemary M; Harrigan, P Richard; Liang, Richard H

    2013-01-01

    A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the 'kernel trick', a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly 'star-like' shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.

  5. Molecular phylogenetic trees - On the validity of the Goodman-Moore augmentation algorithm

    NASA Technical Reports Server (NTRS)

    Holmquist, R.

    1979-01-01

    A response is made to the reply of Nei and Tateno (1979) to the letter of Holmquist (1978) supporting the validity of the augmentation algorithm of Moore (1977) in reconstructions of nucleotide substitutions by means of the maximum parsimony principle. It is argued that the overestimation of the augmented numbers of nucleotide substitutions (augmented distances) found by Tateno and Nei (1978) is due to an unrepresentative data sample and that it is only necessary that evolution be stochastically uniform in different regions of the phylogenetic network for the augmentation method to be useful. The importance of the average value of the true distance over all links is explained, and the relative variances of the true and augmented distances are calculated to be almost identical. The effects of topological changes in the phylogenetic tree on the augmented distance and the question of the correctness of ancestral sequences inferred by the method of parsimony are also clarified.

  6. Trends over time in tree and seedling phylogenetic diversity indicate regional differences in forest biodiversity change.

    PubMed

    Potter, Kevin M; Woodall, Christopher W

    2012-03-01

    Changing climate conditions may impact the short-term ability of forest tree species to regenerate in many locations. In the longer term, tree species may be unable to persist in some locations while they become established in new places. Over both time frames, forest tree biodiversity may change in unexpected ways. Using repeated inventory measurements five years apart from more than 7000 forested plots in the eastern United States, we tested three hypotheses: phylogenetic diversity is substantially different from species richness as a measure of biodiversity; forest communities have undergone recent changes in phylogenetic diversity that differ by size class, region, and seed dispersal strategy; and these patterns are consistent with expected early effects of climate change. Specifically, the magnitude of diversity change across broad regions should be greater among seedlings than in trees, should be associated with latitude and elevation, and should be greater among species with high dispersal capacity. Our analyses demonstrated that phylogenetic diversity and species richness are decoupled at small and medium scales and are imperfectly associated at large scales. This suggests that it is appropriate to apply indicators of biodiversity change based on phylogenetic diversity, which account for evolutionary relationships among species and may better represent community functional diversity. Our results also detected broadscale patterns of forest biodiversity change that are consistent with expected early effects of climate change. First, the statistically significant increase over time in seedling diversity in the South suggests that conditions there have become more favorable for the reproduction and dispersal of a wider variety of species, whereas the significant decrease in northern seedling diversity indicates that northern conditions have become less favorable. Second, we found weak correlations between seedling diversity change and latitude in both zones

  7. primers4clades: a web server that uses phylogenetic trees to design lineage-specific PCR primers for metagenomic and diversity studies.

    PubMed

    Contreras-Moreira, Bruno; Sachman-Ruiz, Bernardo; Figueroa-Palacios, Iraís; Vinuesa, Pablo

    2009-07-01

    Primers4clades is an easy-to-use web server that implements a fully automatic PCR primer design pipeline for cross-species amplification of novel sequences from metagenomic DNA, or from uncharacterized organisms, belonging to user-specified phylogenetic clades or taxa. The server takes a set of non-aligned protein coding genes, with or without introns, aligns them and computes a neighbor-joining tree, which is displayed on screen for easy selection of species or sequence clusters to design lineage-specific PCR primers. Primers4clades implements an extended CODEHOP primer design strategy based on both DNA and protein multiple sequence alignments. It evaluates several thermodynamic properties of the oligonucleotide pairs, and computes the phylogenetic information content of the predicted amplicon sets from Shimodaira-Hasegawa-like branch support values of maximum likelihood phylogenies. A non-redundant set of primer formulations is returned, ranked according to their thermodynamic properties. An amplicon distribution map provides a convenient overview of the coverage of the target locus. Altogether these features greatly help the user in making an informed choice between alternative primer pair formulations. Primers4clades is available at two mirror sites: http://maya.ccg.unam.mx/primers4clades/ and http://floresta.eead.csic.es/primers4clades/. Three demo data sets and a comprehensive documentation/tutorial page are provided for easy testing of the server's capabilities and interface.

  8. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data

    PubMed Central

    Tanner, Alastair R.; Fleming, James F.; Tarver, James E.; Pisani, Davide

    2017-01-01

    Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape and maximum-likelihood estimation using the Mk model, as well as analysing data composed of both binary and multistate characters. We find that all methods struggle to correctly resolve deep clades within asymmetric trees, and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but with lower resolution than other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation using the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations in reanalysing several empirical morphological character matrices. A consequence of our finding is that high levels of resolution or the ability to classify species or groups with much confidence should not be expected when using small datasets. It is now necessary to depart from the traditional parsimony paradigms of constructing character matrices, towards datasets constructed explicitly for Bayesian methods. PMID:28077778

  9. Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information.

    PubMed

    Merkle, Daniel; Middendorf, Martin

    2005-04-01

    In this paper, we present a method and a corresponding tool called Tarzan for cophylogeny analysis of phylogenetic trees where the nodes are labelled with divergence timing information. The tool can be used for example to infer the common history of hosts and their parasites, of insect-plant relations or symbiotic relationships. Our method does the reconciliation analysis using an event-based concept where each event is assigned a cost and cost minimal solutions are sought. The events that are used by Tarzan are cospeciations, sortings, duplications, and (host) switches. Different from existing tools, Tarzan can handle more complex timing information of the phylogenetic trees for the analysis. This is important because several recent studies of cophylogenetic relationship have shown that timing information can be very important for the correct interpretation of results from cophylogenetic analysis. We present two examples (one host-parasite system and one insect-plant system) that show how divergence timing information can be integrated into reconciliation analysis and how this influences the results.
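
    The event-based idea described above can be sketched as follows: each candidate reconciliation is a list of events, each event type has a cost, and the cheapest candidates are kept. The cost values, the candidate lists, and the function names are hypothetical and chosen only for illustration; the abstract does not state Tarzan's cost scheme, and the timing constraints it handles are not modelled here.

```python
from collections import Counter

# Hypothetical costs for the four event types named in the abstract.
EVENT_COSTS = {"cospeciation": 0, "duplication": 1, "sorting": 1, "switch": 2}


def reconciliation_cost(events, costs=EVENT_COSTS):
    """Total cost of one reconciliation, given its list of event names."""
    counts = Counter(events)
    unknown = set(counts) - set(costs)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    return sum(costs[event] * n for event, n in counts.items())


def cheapest(candidates, costs=EVENT_COSTS):
    """Names of all candidate reconciliations that reach the minimum cost."""
    totals = {name: reconciliation_cost(ev, costs) for name, ev in candidates.items()}
    best = min(totals.values())
    return sorted(name for name, total in totals.items() if total == best)


# Two made-up candidate histories for the same host and parasite trees.
candidates = {
    "r1": ["cospeciation", "cospeciation", "switch"],
    "r2": ["cospeciation", "duplication", "sorting", "sorting"],
}

print(reconciliation_cost(candidates["r1"]))  # 2
print(reconciliation_cost(candidates["r2"]))  # 3
print(cheapest(candidates))                   # ['r1']
```

    Changing the cost table can change which history wins, which is why the choice of event costs matters in this kind of analysis.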

  10. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data.

    PubMed

    Puttick, Mark N; O'Reilly, Joseph E; Tanner, Alastair R; Fleming, James F; Clark, James; Holloway, Lucy; Lozano-Fernandez, Jesus; Parry, Luke A; Tarver, James E; Pisani, Davide; Donoghue, Philip C J

    2017-01-11

    Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape and maximum-likelihood estimation using the Mk model, as well as analysing data composed of both binary and multistate characters. We find that all methods struggle to correctly resolve deep clades within asymmetric trees, and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but with lower resolution than other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation using the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations in reanalysing several empirical morphological character matrices. A consequence of our finding is that high levels of resolution or the ability to classify species or groups with much confidence should not be expected when using small datasets. It is now necessary to depart from the traditional parsimony paradigms of constructing character matrices, towards datasets constructed explicitly for Bayesian methods.

  11. Minimizing the Average Distance to a Closest Leaf in a Phylogenetic Tree

    PubMed Central

    Matsen, Frederick A.; Gallagher, Aaron; McCoy, Connor O.

    2013-01-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized “reference sequences” to a smaller subset that is as close as possible on average to a collection of “query sequences” of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting “reference tree” sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software. [Mass transport; phylogenetic diversity; sequence selection.] PMID:23843314
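
    A minimal sketch of the ADCL criterion, assuming a tree stored as a weighted adjacency map and a made-up six-edge example: for every leaf, take the path length to the nearest chosen leaf, then average. The exhaustive search in `best_subset` is a stand-in for illustration only; it is not the PAM variant or the exact dynamic program the abstract describes, and it is practical only for very small trees.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn (node, node, branch_length) triples into a symmetric adjacency map."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def path_lengths(adj, start):
    """Path length from start to every node of a tree (iterative traversal)."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, w in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + w
                stack.append(nbr)
    return dist


def adcl(adj, leaves, chosen):
    """Average, over all leaves, of the distance to the closest chosen leaf."""
    total = 0.0
    for leaf in leaves:
        dist = path_lengths(adj, leaf)
        total += min(dist[c] for c in chosen)
    return total / len(leaves)


def best_subset(adj, leaves, k):
    """Brute-force search for a size-k subset of leaves with minimal ADCL."""
    return min(combinations(sorted(leaves), k), key=lambda s: adcl(adj, leaves, s))


# Made-up tree: two cherries (A, B) and (C, D) joined through a root r.
edges = [("r", "u", 1), ("r", "v", 1), ("u", "A", 1),
         ("u", "B", 1), ("v", "C", 1), ("v", "D", 1)]
adj = build_adjacency(edges)
leaves = ["A", "B", "C", "D"]

print(adcl(adj, leaves, ["A"]))        # 2.5
print(adcl(adj, leaves, ["A", "C"]))   # 1.0
print(best_subset(adj, leaves, 2))     # ('A', 'C')
```

    Picking one leaf from each cherry gives a lower ADCL than picking both leaves of the same cherry, which matches the intuition that the subset should represent the diversity of the whole collection.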

  12. Conserving the functional and phylogenetic trees of life of European tetrapods

    PubMed Central

    Thuiller, Wilfried; Maiorano, Luigi; Mazel, Florent; Guilhaumon, François; Ficetola, Gentile Francesco; Lavergne, Sébastien; Renaud, Julien; Roquet, Cristina; Mouillot, David

    2015-01-01

    Protected areas (PAs) are pivotal tools for biodiversity conservation on the Earth. Europe has had an extensive protection system since Natura 2000 areas were created in parallel with traditional parks and reserves. However, the extent to which this system covers not only taxonomic diversity but also other biodiversity facets, such as evolutionary history and functional diversity, has never been evaluated. Using high-resolution distribution data of all European tetrapods together with dated molecular phylogenies and detailed trait information, we first tested whether the existing European protection system effectively covers all species and in particular, those with the highest evolutionary or functional distinctiveness. We then tested the ability of PAs to protect the entire tetrapod phylogenetic and functional trees of life by mapping species' target achievements along the internal branches of these two trees. We found that the current system is adequately representative in terms of the evolutionary history of amphibians while it fails for the rest. However, the most functionally distinct species were better represented than they would be under random conservation efforts. These results imply better protection of the tetrapod functional tree of life, which could help to ensure long-term functioning of the ecosystem, potentially at the expense of conserving evolutionary history. PMID:25561666

  13. TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information

    PubMed Central

    Struck, Torsten H

    2014-01-01

    Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, phylogenetic reconstructions must account for artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118

  14. Regional and phylogenetic variation of wood density across 2456 Neotropical tree species.

    PubMed

    Chave, Jérôme; Muller-Landau, Helene C; Baker, Timothy R; Easdale, Tomás A; ter Steege, Hans; Webb, Campbell O

    2006-12-01

    Wood density is a crucial variable in carbon accounting programs of both secondary and old-growth tropical forests. It also is the best single descriptor of wood: it correlates with numerous morphological, mechanical, physiological, and ecological properties. To explore the extent to which wood density could be estimated for rare or poorly censused taxa, and possible sources of variation in this trait, we analyzed regional, taxonomic, and phylogenetic variation in wood density among 2456 tree species from Central and South America. Wood density varied over more than one order of magnitude across species, with an overall mean of 0.645 g/cm3. Our geographical analysis showed significant decreases in wood density with increasing altitude and significant differences among low-altitude geographical regions: wet forests of Central America and western Amazonia have significantly lower mean wood density than dry forests of Central and South America, eastern and central Amazonian forests, and the Atlantic forests of Brazil; and eastern Amazonian forests have lower wood densities than the dry forests and the Atlantic forest. A nested analysis of variance showed that 74% of the species-level wood density variation was explained at the genus level, 34% at the Angiosperm Phylogeny Group (APG) family level, and 19% at the APG order level. This indicates that genus-level means give reliable approximations of values of species, except in a few hypervariable genera. We also studied which evolutionary shifts in wood density occurred in the phylogeny of seed plants using a composite phylogenetic tree. Major changes were observed at deep nodes (Eurosid 1), and also in more recent divergences (for instance in the Rhamnoids, Simaroubaceae, and Anacardiaceae). Our unprecedented wood density data set yields consistent guidelines for estimating wood densities when species-level information is lacking and should significantly reduce error in Central and South American carbon accounting
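
    The guideline that genus-level means approximate species values can be sketched as a simple fallback lookup. The species names and density values below are placeholders invented for this example, not data from the study, and the function names are likewise assumptions.

```python
from statistics import mean


def genus_of(binomial):
    """Genus is taken to be the first word of a binomial name."""
    return binomial.split()[0]


def estimate_density(species, measured):
    """Return (value, level): the species value if measured, else the genus mean.

    `measured` maps binomial names to wood density in g/cm^3.
    Returns (None, "unknown") when no congener has been measured either.
    """
    if species in measured:
        return measured[species], "species"
    congeners = [v for name, v in measured.items()
                 if genus_of(name) == genus_of(species)]
    if congeners:
        return mean(congeners), "genus"
    return None, "unknown"


# Placeholder measurements (hypothetical names and values).
measured = {
    "Genusa alpha": 0.50,
    "Genusa beta": 0.70,
    "Genusb gamma": 0.40,
}

print(estimate_density("Genusa alpha", measured))  # species-level value
print(estimate_density("Genusa delta", measured))  # falls back to the genus mean
print(estimate_density("Genusc omega", measured))  # nothing to fall back on
```

    Returning the level alongside the value lets downstream carbon estimates flag which entries rest on genus means rather than direct measurements.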

  15. Species-time-area and phylogenetic-time-area relationships in tropical tree communities

    PubMed Central

    Swenson, Nathan G; Mi, Xiangcheng; Kress, W John; Thompson, Jill; Uriarte, María; Zimmerman, Jess K

    2013-01-01

    The species-area relationship (SAR) has proven to be one of the few strong generalities in ecology. The temporal analog of the SAR, the species-time relationship (STR), has received considerably less attention. Recent work primarily from the temperate zone has aimed to merge the SAR and the STR into a synthetic and unified species-time-area relationship (STAR) as originally envisioned by Preston (1960). Here we test this framework using two tropical tree communities and extend it by deriving a phylogenetic-time-area relationship (PTAR). The work finds some support for Preston's prediction that diversity-time relationships, both species and phylogenetic, are sensitive to the spatial scale of the sampling. Contrary to Preston's predictions, we find a decoupling of diversity-area and diversity-time relationships in both forests as the time period used to quantify the diversity-area relationship changes. In particular, diversity-area and diversity-time relationships are positively correlated using the initial census to quantify the diversity-area relationship, but weakly or even negatively correlated when using the most recent census. Thus, diversity-area relationships could forecast the temporal accumulation of biodiversity of the forests, but they failed to “back-cast” the temporal accumulation of biodiversity, suggesting a decoupling of space and time. PMID:23762505

  16. Species-time-area and phylogenetic-time-area relationships in tropical tree communities.

    PubMed

    Swenson, Nathan G; Mi, Xiangcheng; Kress, W John; Thompson, Jill; Uriarte, María; Zimmerman, Jess K

    2013-05-01

    The species-area relationship (SAR) has proven to be one of the few strong generalities in ecology. The temporal analog of the SAR, the species-time relationship (STR), has received considerably less attention. Recent work primarily from the temperate zone has aimed to merge the SAR and the STR into a synthetic and unified species-time-area relationship (STAR) as originally envisioned by Preston (1960). Here we test this framework using two tropical tree communities and extend it by deriving a phylogenetic-time-area relationship (PTAR). The work finds some support for Preston's prediction that diversity-time relationships, both species and phylogenetic, are sensitive to the spatial scale of the sampling. Contrary to Preston's predictions, we find a decoupling of diversity-area and diversity-time relationships in both forests as the time period used to quantify the diversity-area relationship changes. In particular, diversity-area and diversity-time relationships are positively correlated using the initial census to quantify the diversity-area relationship, but weakly or even negatively correlated when using the most recent census. Thus, diversity-area relationships could forecast the temporal accumulation of biodiversity of the forests, but they failed to "back-cast" the temporal accumulation of biodiversity, suggesting a decoupling of space and time.

  17. Comparison of methods for rooting phylogenetic trees: a case study using Orcuttieae (Poaceae: Chloridoideae).

    PubMed

    Boykin, Laura M; Kubatko, Laura Salter; Lowrey, Timothy K

    2010-03-01

    DNA sequence data (cpDNA trnL intron and nrDNA ITS1 and ITS2) were analyzed to identify relationships within Orcuttieae, a small tribe of endangered grasses endemic to vernal pools in California and Baja California. The tribe includes three genera: Orcuttia, Tuctoria, and Neostapfia. All three genera carry out C4 photosynthesis but aquatic taxa of Orcuttia lack Kranz anatomy. The unusual habitat preference of the tribe is coupled with the atypical development of C4 photosynthesis without Kranz anatomy. Furthermore, the tribe has no known close relatives and has been noted to be phylogenetically isolated within the subfamily Chloridoideae. In this study we examine the problem of inferring the root of the tribe in the absence of an identified outgroup, analyze the phylogenetic relationships of the constituent taxa, and evaluate the evolutionary development of C4 photosynthesis. We compare four methods for inferring the root of the tree: (1) the outgroup method, (2) midpoint rooting, and the imposition of a molecular clock under (3) maximum likelihood (ML) and (4) Bayesian analysis. We examine the consequences of each method for the inferred phylogenetic relationships. Three of the methods (outgroup rooting and the ML and Bayesian molecular clock analyses) suggest that the root of Orcuttieae is between Neostapfia and the Tuctoria/Orcuttia lineage, while midpoint rooting gives a different root. The Bayesian method additionally provides information about probabilities associated with other possible root locations. Assuming that the true root of Orcuttieae is between Neostapfia and the Tuctoria/Orcuttia lineage, our data indicate that Neostapfia and Orcuttia are both monophyletic, while Tuctoria is paraphyletic (with no synapomorphies in either dataset), forms a grade between the other two genera, and needs taxonomic revision. Our data support the hypothesis that Orcuttieae was derived from a terrestrial ancestor and evolved specializations to an aquatic environment
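
    Of the four rooting methods compared above, midpoint rooting is the simplest to sketch: find the two most distant leaves and place the root halfway along the path between them. The example below assumes an unrooted tree stored as a weighted adjacency map with positive branch lengths; the tree, its branch lengths, and the function names are made up for illustration and are unrelated to the Orcuttieae data.

```python
def build_adjacency(edges):
    """Turn (node, node, branch_length) triples into a symmetric adjacency map."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def farthest(adj, start):
    """Return the node farthest from start, plus distance and parent maps."""
    dist, parent = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, w in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + w
                parent[nbr] = node
                stack.append(nbr)
    return max(dist, key=dist.get), dist, parent


def midpoint_root(adj):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (near, far, offset): the root lies on edge near-far,
    `offset` branch-length units away from `near`.
    """
    a, _, _ = farthest(adj, next(iter(adj)))   # one end of the longest path
    b, dist, parent = farthest(adj, a)         # the other end, distances from a
    half = dist[b] / 2.0
    node = b
    while dist[parent[node]] > half:           # walk from b back towards a
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Made-up unrooted tree with internal nodes x and y.
edges = [("x", "A", 1), ("x", "B", 2), ("x", "y", 3), ("y", "C", 1), ("y", "D", 4)]
adj = build_adjacency(edges)

print(midpoint_root(adj))  # ('y', 'x', 0.5)
```

    Here the longest path runs between B and D (length 9), so the root falls on the internal edge, 0.5 units from y. Because the result depends only on branch lengths, one unusually long branch can pull the root away from its true position, which is one way this method can disagree with outgroup or clock-based rooting.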

  18. Supermatrix and species tree methods resolve phylogenetic relationships within the big cats, Panthera (Carnivora: Felidae).

    PubMed

    Davis, Brian W; Li, Gang; Murphy, William J

    2010-07-01

    The pantherine lineage of cats diverged from the remainder of modern Felidae less than 11 million years ago and consists of the five big cats of the genus Panthera, the lion, tiger, jaguar, leopard, and snow leopard, as well as the closely related clouded leopard. A significant problem exists with respect to the precise phylogeny of these highly threatened great cats. Despite multiple publications on the subject, no two molecular studies have reconstructed Panthera with the same topology. These evolutionary relationships remain unresolved partially due to the recent and rapid radiation of pantherines in the Pliocene, individual speciation events occurring within less than 1 million years, and probable introgression between lineages following their divergence. We provide an alternative, highly supported interpretation of the evolutionary history of the pantherine lineage using novel and published DNA sequence data from the autosomes, both sex chromosomes and the mitochondrial genome. New sequences were generated for 39 single-copy regions of the felid Y chromosome, as well as four mitochondrial and four autosomal gene segments, totaling 28.7 kb. Phylogenetic analysis of these new data, combined with all published data in GenBank, highlighted the prevalence of phylogenetic disparities stemming either from the amplification of a mitochondrial to nuclear translocation event (numt), or errors in species identification. Our 47.6 kb combined dataset was analyzed as a supermatrix and with respect to individual partitions using maximum likelihood and Bayesian phylogenetic inference, in conjunction with Bayesian Estimation of Species Trees (BEST) which accounts for heterogeneous gene histories. Our results yield a robust consensus topology supporting the monophyly of lion and leopard, with jaguar sister to these species, as well as a sister species relationship of tiger and snow leopard. These results highlight new avenues for the study of speciation genomics and

  19. Not Seeing the Forest for the Trees: Size of the Minimum Spanning Trees (MSTs) Forest and Branch Significance in MST-Based Phylogenetic Analysis

    PubMed Central

    Teixeira, Andreia Sofia; Monteiro, Pedro T.; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P.

    2015-01-01

    Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. For the research community, however, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs where a given edge is present. The metric provides a per-edge statistic that is similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well-known Kirchhoff’s matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric concerning both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing data (MLST) and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selection of the edge to be represented using bootstrap could lead to unreliable results since alternative edges are present in the same fraction of equivalent MSTs. The choice of the MST to be presented results from criteria implemented in the algorithm, which must be based on biologically plausible models. PMID:25799056
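
    The definition of spanning edge betweenness given above (the fraction of equivalent MSTs that contain a given edge) can be checked directly on a tiny graph by enumerating every spanning tree. This brute-force sketch is for illustration only and is not the Kirchhoff-based exact method the paper describes; the four-node graph and all function names are assumptions.

```python
from itertools import combinations


def is_spanning_tree(nodes, edges):
    """True if the given n-1 edges are acyclic, hence span all n nodes."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
    return True


def spanning_edge_betweenness(nodes, edges):
    """Fraction of minimum spanning trees that contain each edge."""
    trees = [t for t in combinations(edges, len(nodes) - 1)
             if is_spanning_tree(nodes, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    msts = [t for t in trees if sum(w for _, _, w in t) == best]
    return {(a, b): sum((a, b, w) in t for t in msts) / len(msts)
            for a, b, w in edges}


# Made-up graph: an equal-weight triangle A-B-C with a pendant node D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("B", "C", 1), ("A", "C", 1), ("C", "D", 2)]

seb = spanning_edge_betweenness(nodes, edges)
for edge, value in seb.items():
    print(edge, round(value, 3))
```

    In this graph three MSTs tie at total weight 4. The pendant edge C-D appears in all of them, while each triangle edge appears in two out of three, so a single displayed MST would hide the fact that its triangle edges are interchangeable.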

  20. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    PubMed

    Wang, Fen; Ye, Bin

    2016-09-01

    Cystic echinococcosis, caused by the metacestode larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cystic echinococcosis is still difficult since surgery cannot meet the needs of all patients, and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structural properties of E. granulosus aquaporins (EgAQPs), and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and Protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The numbers of transmembrane regions were three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and further research on the biological function of E. granulosus.

  1. Wood nitrogen concentrations in tropical trees: phylogenetic patterns and ecological correlates.

    PubMed

    Martin, Adam R; Erickson, David L; Kress, W John; Thomas, Sean C

    2014-11-01

    In tropical and temperate trees, wood chemical traits are hypothesized to covary with species' life-history strategy along a 'wood economics spectrum' (WES), but evidence supporting these expected patterns remains scarce. Due to its role in nutrient storage, we hypothesize that wood nitrogen (N) concentration will covary along the WES, being higher in slow-growing species with high wood density (WD), and lower in fast-growing species with low WD. In order to test this hypothesis we quantified wood N concentrations in 59 Panamanian hardwood species, and used this dataset to examine ecological correlates and phylogenetic patterns of wood N. Wood N varied > 14-fold among species between 0.04 and 0.59%; closely related species were more similar in wood N than expected by chance. Wood N was positively correlated with WD, and negatively correlated with log-transformed relative growth rates, although these relationships were relatively weak. We found evidence for co-evolution between wood N and both WD and log-transformed mortality rates. Our study provides evidence that wood N covaries with tree life-history parameters, and that these patterns consistently co-evolve in tropical hardwoods. These results provide some support for the hypothesized WES, and suggest that wood is an increasingly important N pool through tropical forest succession.

  2. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    PubMed

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal.

  3. Bears in a Forest of Gene Trees: Phylogenetic Inference Is Complicated by Incomplete Lineage Sorting and Gene Flow

    PubMed Central

    Kutschera, Verena E.; Bidon, Tobias; Hailer, Frank; Rodi, Julia L.; Fain, Steven R.; Janke, Axel

    2014-01-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. PMID:24903145

  4. Rooting the tree of life: the phylogenetic jury is still out

    PubMed Central

    Gouy, Richard; Baurain, Denis; Philippe, Hervé

    2015-01-01

    This article aims to shed light on difficulties in rooting the tree of life (ToL) and to explore the (sociological) reasons underlying the limited interest in accurately addressing this fundamental issue. First, we briefly review the difficulties plaguing phylogenetic inference and the ways to improve the modelling of the substitution process, which is highly heterogeneous, both across sites and over time. We further observe that enriched taxon samplings, better gene samplings and clever data removal strategies have led to numerous revisions of the ToL, and that these improved shallow phylogenies nearly always relocate simple organisms higher in the ToL provided that long-branch attraction artefacts are kept at bay. Then, we note that, despite the flood of genomic data available since 2000, there has been a surprisingly low interest in inferring the root of the ToL. Furthermore, the rare studies dealing with this question were almost always based on methods dating from the 1990s that have been shown to be inaccurate for much more shallow issues! This leads us to argue that the current consensus about a bacterial root for the ToL can be traced back to the prejudice of Aristotle's Great Chain of Beings, in which simple organisms are ancestors of more complex life forms. Finally, we demonstrate that even the best models cannot yet handle the complexity of the evolutionary process encountered both at shallow depth, when the outgroup is too distant, and at the level of the inter-domain relationships. Altogether, we conclude that the commonly accepted bacterial root is still unproven and that the root of the ToL should be revisited using phylogenomic supermatrices to ensure that new evidence for eukaryogenesis, such as the recently described Lokiarchaeota, is interpreted in a sound phylogenetic framework. PMID:26323760

  5. Molecular Phylogenetics: Concepts for a Newcomer.

    PubMed

    Ajawatanawong, Pravech

    2016-10-26

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.

  6. Phylogenetic Analysis of Local-Scale Tree Soil Associations in a Lowland Moist Tropical Forest

    PubMed Central

    Schreeg, Laura A.; Kress, W. John; Erickson, David L.; Swenson, Nathan G.

    2010-01-01

    Background: Local plant-soil associations are commonly studied at the species-level, while associations at the level of nodes within a phylogeny have been less well explored. Understanding associations within a phylogenetic context, however, can improve our ability to make predictions across systems and can advance our understanding of the role of evolutionary history in structuring communities. Methodology/Principal Findings: Here we quantified evolutionary signal in plant-soil associations using a DNA sequence-based community phylogeny and several soil variables (e.g., extractable phosphorus, aluminum and manganese, pH, and slope as a proxy for soil water). We used published plant distributional data from the 50-ha plot on Barro Colorado Island (BCI), Republic of Panamá. Our results suggest some groups of closely related species do share similar soil associations. Most notably, the node shared by Myrtaceae and Vochysiaceae was associated with high levels of aluminum, a potentially toxic element. The node shared by Apocynaceae was associated with high extractable phosphorus, a nutrient that could be limiting on a taxon-specific level. The node shared by the large group of Laurales and Magnoliales was associated with both low extractable phosphorus and with steeper slope. Despite significant node-specific associations, this study detected little to no phylogeny-wide signal. We consider the majority of the ‘traits’ (i.e., soil variables) evaluated to fall within the category of ecological traits. We suggest that, given this category of traits, phylogeny-wide signal might not be expected while node-specific signals can still indicate phylogenetic structure with respect to the variable of interest. Conclusions: Within the BCI forest dynamics plot, distributions of some plant taxa are associated with local-scale differences in soil variables when evaluated at individual nodes within the phylogenetic tree, but they are not detectable by phylogeny-wide signal. Trends

  7. Quantification and functional analysis of modular protein evolution in a dense phylogenetic tree.

    PubMed

    Moore, Andrew D; Grath, Sonja; Schüler, Andreas; Huylmans, Ann K; Bornberg-Bauer, Erich

    2013-05-01

    Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signaling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Similarly, protein domains are the modules of proteins, and modular domain rearrangements can create diversity with seemingly few operations in turn allowing for swift changes to an organism's functional repertoire. Here, we assess the patterns and functional effects of modular rearrangements at high resolution. Using a well resolved and diverse group of pancrustaceans, we illustrate arrangement diversity within closely related organisms, estimate arrangement turnover frequency and establish, for the first time, branch-specific rate estimates for fusion, fission, domain addition and terminal loss. Our results show that roughly 16 new arrangements arise per million years and that between 64% and 81% of these can be explained by simple, single-step modular rearrangement events. We find evidence that the frequencies of fission and terminal deletion events increase over time, and that modular rearrangements impact all levels of the cellular signaling apparatus and thus may have strong adaptive potential. Novel arrangements that cannot be explained by simple modular rearrangements contain a significant amount of repeat domains that occur in complex patterns which we term "supra-repeats". Furthermore, these arrangements are significantly longer than those with a single-step rearrangement solution, suggesting that such arrangements may result from multi-step events. In summary, our analysis provides an integrated view and initial quantification of the patterns and functional impact of modular protein evolution in a well resolved phylogenetic tree. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.

  8. Molecular phylogenetics before sequences

    PubMed Central

    Ragan, Mark A; Bernard, Guillaume; Chan, Cheong Xin

    2014-01-01

    From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today. PMID:24572375
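
    The D2 statistic mentioned in this abstract is, in its basic form, the inner product of the k-mer count vectors of two sequences. The Python sketch below illustrates that idea only. The function names are hypothetical, and the normalisation used in `d2_distance` is an assumption made for illustration; it is not necessarily the distance used by the authors.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """Basic D2 statistic: inner product of the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for k-mers that are absent from the other sequence.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def d2_distance(a, b, k):
    """Turn D2 into a dissimilarity in [0, 1].

    Assumption: a cosine-style normalisation chosen for this sketch.
    Both sequences must contain at least one k-mer (length >= k).
    """
    return 1.0 - d2(a, b, k) / sqrt(d2(a, a, k) * d2(b, b, k))
```

    A matrix of such pairwise values could then be passed to any distance-based tree method; identical sequences give a distance of zero, and sequences sharing no k-mers give a distance of one.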

  9. Comprehensive phylogenetic reconstruction of relationships in Octocorallia (Cnidaria: Anthozoa) from the Atlantic ocean using mtMutS and nad2 genes tree reconstructions

    NASA Astrophysics Data System (ADS)

    Morris, K. J.; Herrera, S.; Gubili, C.; Tyler, P. A.; Rogers, A.; Hauton, C.

    2012-12-01

    Despite being an abundant group of significant ecological importance, the phylogenetic relationships of the Octocorallia remain poorly understood and understudied. We used 1132 bp of two mitochondrial protein-coding genes, nad2 and mtMutS (previously referred to as msh1), to construct a phylogeny for 161 octocoral specimens from the Atlantic, including both Isididae and non-Isididae species. We found that four clades were supported using a concatenated alignment. Two of these (A and B) were in general agreement with the Holaxonia-Alcyoniina and Anthomastus-Corallium clades identified by previous work. The third and fourth clades represent a split of the Calcaxonia-Pennatulacea clade, resulting in one clade containing the Pennatulacea and a small number of Isididae specimens and a second clade containing the remaining Calcaxonia. When individual genes were considered, nad2 largely agreed with previous work, while mtMutS also produced a fourth clade corresponding to a split of Isididae species from the Calcaxonia-Pennatulacea clade. These differences are expected to be a consequence of the inclusion of Isididae species that have undergone an inversion in the mtMutS gene, causing their separation in the mtMutS-only tree. The fourth clade in the concatenated tree is also suspected to result from this gene inversion: very few Isididae species were included in the trees of previous work, so this separation would not have been clearly resolved. A larger phylogeny including both Isididae and non-Isididae species is required to further resolve these clades.

  10. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequence used to generate the 3D structure of proteins and the template accession via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  11. A multi-neighbor-joining approach for phylogenetic tree reconstruction and visualization.

    PubMed

    Silva, Ana Estela A da; Villanueva, Wilfredo J P; Knidel, Helder; Bonato, Vinícius; Reis, Sérgio F dos; Von Zuben, Fernando J

    2005-09-30

    The computationally challenging problem of reconstructing the phylogeny of a set of contemporary data, such as DNA sequences or morphological attributes, was treated by an extended version of the neighbor-joining (NJ) algorithm. The original NJ algorithm provides a single-tree topology, after a cascade of greedy pairing decisions that tries to simultaneously optimize the minimum evolution and the least squares criteria. Given that some sub-trees are more stable than others, and that the minimum evolution tree may not be achieved by the original NJ algorithm, we propose a multi-neighbor-joining (MNJ) algorithm capable of performing multiple pairing decisions at each level of the tree reconstruction, keeping various partial solutions along the recursive execution of the NJ algorithm. The main advantages of the new reconstruction procedure are: 1) as is the case for the original NJ algorithm, the MNJ algorithm is still a low-cost reconstruction method; 2) a further investigation of the alternative topologies may reveal stable and unstable sub-trees; 3) the chance of achieving the minimum evolution tree is greater; 4) tree topologies with very similar performances will be simultaneously presented at the output. When there are multiple unrooted tree topologies to be compared, a visualization tool is also proposed, using a radial layout to uniformly distribute the branches with the help of well-known metaheuristics used in computer science.
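
    The greedy pairing decision of the original NJ algorithm can be sketched in Python as follows. The Q-criterion is the standard neighbor-joining one. Ranking all candidate pairs instead of returning only the best is this sketch's reading of "multiple pairing decisions"; the function names are hypothetical and the authors' actual rule for choosing alternative pairs is not given in the abstract.

```python
def nj_q_matrix(d):
    """Standard NJ criterion: Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


def ranked_pairs(d):
    """Return (Q, i, j) for all pairs i < j, best (lowest Q) first.

    Classic NJ joins only the first pair. Keeping several top-ranked
    pairs is an assumed illustration of exploring alternative joins.
    """
    q = nj_q_matrix(d)
    n = len(d)
    return sorted((q[i][j], i, j) for i in range(n) for j in range(i + 1, n))


# Symmetric distance matrix for five taxa (indices 0..4), chosen for illustration.
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
candidates = ranked_pairs(distances)[:2]  # best join plus one alternative
```

    In this example the two best-scoring joins are close in Q value, which is the kind of situation in which retaining more than one partial solution could lead to different topologies.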

  12. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments

    PubMed Central

    Jansen, M. Andrew; Franz, Nico M.

    2015-01-01

    Abstract This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O’Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O’Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O’Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O’Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O’Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O’Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of

  13. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments.

    PubMed

    Jansen, M Andrew; Franz, Nico M

    2015-01-01

    This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O'Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O'Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O'Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O'Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O'Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O'Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of Minyomerus [JF2015], and

  14. Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

    PubMed Central

    Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka

    2016-01-01

    Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296
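
    To make the term concrete: a chained guide tree is a maximally unbalanced (caterpillar) tree, so progressive alignment adds one sequence at a time to a growing alignment. The Python sketch below builds such a tree in Newick notation next to a balanced tree for comparison. The function names and sequence labels are hypothetical, and the code does not call MAFFT or reproduce how it constructs its guide trees.

```python
def chained_guide_tree(names):
    """Newick string for a fully chained (caterpillar) guide tree.

    The first two sequences are joined first; every later sequence is
    then attached, one at a time, to the whole tree built so far.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


def balanced_guide_tree(names):
    """Newick string for a balanced tree, built by recursive halving."""
    def build(group):
        if len(group) == 1:
            return group[0]
        mid = len(group) // 2
        return f"({build(group[:mid])},{build(group[mid:])})"

    if len(names) < 2:
        raise ValueError("need at least two sequences")
    return build(list(names)) + ";"


labels = ["s1", "s2", "s3", "s4"]  # hypothetical sequence labels
chained = chained_guide_tree(labels)
balanced = balanced_guide_tree(labels)
```

    With four sequences the chained tree nests three levels deep, whereas the balanced tree aligns two pairs independently before merging them.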

  15. Analysis of full-length genomes of porcine teschovirus (PTV) and the effect of purifying selection on phylogenetic trees.

    PubMed

    Villanova, Fabiola; Cui, Shangjin; Ai, Xia; Leal, Élcio

    2016-05-01

    To study the outcome of natural selection using phylogenetic trees, we analyzed full-length genome sequences of porcine teschovirus (PTV). PTV belongs to the family Picornaviridae and has a positive-stranded RNA genome, the replication of which is carried out by the error-prone viral RNA-dependent RNA polymerase. The viral RNA encodes a single polyprotein that is cleaved into structural (i.e., L, VP4, VP2, VP3 and VP1) and nonstructural proteins (i.e., 2A, 2B, 2C, 3A, 3B, and 3C). A high degree of genetic diversity was found based on the pairwise nucleotide distances and on the mean ratio of the number of nonsynonymous (dN) and synonymous (dS) substitutions (dN/dS) in the structural genes. Conversely, the diversity of the nonstructural genes was lower. The differences in genetic diversity between the structural and nonstructural genomic regions were likely due to strong purifying selection; consequently, the estimates of phylogenies were also discordant among these genes. In particular, maximum-likelihood and Bayesian methods generated short-branched trees when loci that are under strong purifying selection were used. These findings indicate that even in an RNA virus with an intrinsically high mutation rate, a strong purifying selection will curb genetic diversity and should be considered an important source of bias in future studies based on phylogenetic methods.

  16. Determining the Position of Storks on the Phylogenetic Tree of Waterbirds by Retroposon Insertion Analysis

    PubMed Central

    Kuramoto, Tae; Nishihara, Hidenori; Watanabe, Maiko; Okada, Norihiro

    2015-01-01

    Despite many studies on avian phylogenetics in recent decades that used morphology, mitochondrial genomes, and/or nuclear genes, the phylogenetic positions of several birds (e.g., storks) remain unsettled. In addition to the aforementioned approaches, analysis of retroposon insertions, which are nearly homoplasy-free phylogenetic markers, has also been used in avian phylogenetics. However, the first step in the analysis of retroposon insertions, that is, isolation of retroposons from genomic libraries, is a costly and time-consuming procedure. Therefore, we developed a high-throughput and cost-effective protocol to collect retroposon insertion information based on next-generation sequencing technology, which we call here the STRONG (Screening of Transposons Obtained by Next Generation Sequencing) method, and applied it to 3 waterbird species, for which we identified 35,470 loci containing chicken repeat 1 retroposons (CR1). Our analysis of the presence/absence of 30 CR1 insertions demonstrated the intra- and interordinal phylogenetic relationships in the waterbird assemblage, namely 1) Loons diverged first among the waterbirds, 2) penguins (Sphenisciformes) and petrels (Procellariiformes) diverged next, and 3) among the remaining families of waterbirds traditionally classified in Ciconiiformes/Pelecaniformes, storks (Ciconiidae) diverged first. Furthermore, our genome-scale, in silico retroposon analysis based on published genome data uncovered a complex divergence history among pelican, heron, and ibis lineages, presumably involving ancient interspecies hybridization between the heron and ibis lineages. Thus, our retroposon-based waterbird phylogeny and the established phylogenetic position of storks will help to understand the evolutionary processes of aquatic adaptation and related morphological convergent evolution. PMID:26527652

  17. Large multi-gene phylogenetic trees of the grasses (Poaceae): progress towards complete tribal and generic level sampling.

    PubMed

    Bouchenak-Khelladi, Yanis; Salamin, Nicolas; Savolainen, Vincent; Forest, Felix; Bank, Michelle van der; Chase, Mark W; Hodkinson, Trevor R

    2008-05-01

    In this paper we included a very broad representation of grass family diversity (84% of tribes and 42% of genera). Phylogenetic inference was based on three plastid DNA regions (rbcL, matK and trnL-F), using maximum parsimony and Bayesian methods. Our results resolved most of the subfamily relationships within the major clades (BEP and PACCMAD), which had previously been unclear, including, among others: (i) the BEP and PACCMAD sister relationship, (ii) the composition of clades and the sister relationship of Ehrhartoideae and Bambusoideae + Pooideae, (iii) the paraphyly of tribe Bambuseae, (iv) the position of Gynerium as sister to Panicoideae, and (v) the phylogenetic position of Micrairoideae. By tolerating a relatively large amount of missing data, we were able to increase taxon sampling substantially in our analyses, from 107 to 295 taxa. However, bootstrap support and, to a lesser extent, Bayesian posterior probabilities were generally lower in analyses involving missing data than in those without. We produced a fully resolved phylogenetic summary tree for the grass family at subfamily level and indicated the most likely relationships of all tribes included in our analysis.

  18. Biological pattern and transcriptomic exploration and phylogenetic analysis in the odd floral architecture tree: Helwingia willd

    PubMed Central

    2014-01-01

    Background: Unusual traits found in only a few plant species often point to potential biological significance in plant evolution. The genus Helwingia Willd., a dioecious medicinal shrub in the order Aquifoliales, has an unusual floral architecture: an epiphyllous inflorescence. The potential significance and possible evolutionary origin of this genus are not well understood because few biological and genetic data are available. In addition, the advent of genomics-based technologies has transformed the study of plant species with unknown genomic information. Results: Morphological and biological patterns were detailed via anatomical and pollination analyses. An RNA-sequencing-based transcriptomic analysis was undertaken, and a high-resolution phylogenetic analysis was conducted based on single-copy genes in more than 80 species of seed plants, including H. japonica. The analyses indicate that a potential fusion of the rachis to the leaf midvein facilitates insect pollination. RNA sequencing yielded a total of 111,450 unigenes; half of them had significant similarity with proteins in the public database, and 20,281 unigenes were mapped to 119 pathways. In the phylogenetic analysis based on single-copy genes, Helwingia groups more closely with Euasterids II rather than Euasterids, congruent with previous reports using plastid sequences. Conclusions: The unusual floral architecture adapts Helwingia to insect pollination by letting the leaf host insects larger than the flower, a feature that has little in common with other insect-pollinated plants. Furthermore, the present transcriptome greatly enriches the genomic information available for Helwingia species, and the phylogenetic analysis based on nuclear genes greatly improves the resolution and robustness of phylogenetic reconstruction for H. japonica. PMID:24969969

  19. Evolutionary history of woodpeckers and allies (Aves: Picidae): placing key taxa on the phylogenetic tree.

    PubMed

    Benz, Brett W; Robbins, Mark B; Peterson, A Townsend

    2006-08-01

    We analyzed 2995 base pairs of nucleotide sequence data (nuclear beta-fibrinogen intron 7 and mitochondrial cytochrome b and ND2 genes), using parsimony and model-based approaches to infer phylogenetic relationships of the woodpeckers and allies, yielding novel hypotheses for several critical gaps in the knowledge of picid phylogeny. We tested the monophyly of sub-families within the Picidae, and sampled from widely distributed and diverse genera (Celeus, Colaptes, Dryocopus, Melanerpes, Picoides, Picumnus, Sasia, Piculus, and Picus). Relationships of three poorly known Southeast Asian genera (Dinopium, Reinwardtipicus, and Blythipicus) were also examined, revealing unexpected sister relationships. All phylogenetic approaches recovered largely congruent topologies, supporting a monophyletic Picinae and paraphyletic Picumninae, with the monotypic piculet, Nesoctites micromegas, as sister to the Picinae. We report paraphyly for Celeus and Piculus, whereas the broadly distributed genera Picumnus and Dryocopus were supported as monophyletic. Our phylogenetic results indicate a complex geographic history for the Picidae, with multiple disjunct sister lineages distributed between the New World and Asia. The relationships and geographic distribution of basal picid lineages indicate an Old World origin of the Picidae; however, the geographic origin of the Picinae remains equivocal, as the sister relationship between the Caribbean N. micromegas and the true woodpeckers presents the possibility of a New World origin for the Picinae.

  20. Extreme convergence in stick insect evolution: phylogenetic placement of the Lord Howe Island tree lobster

    PubMed Central

    Buckley, Thomas R.; Attanayake, Dilini; Bradler, Sven

    2008-01-01

    The ‘tree lobsters’ are an enigmatic group of robust, ground-dwelling stick insects (order Phasmatodea) from the subfamily Eurycanthinae, distributed in New Guinea, New Caledonia and associated islands. Its most famous member is the Lord Howe Island stick insect Dryococelus australis (Montrouzier), which was believed to have become extinct but was rediscovered in 2001 and is considered to be one of the rarest insects in the world. To resolve the evolutionary position of Dryococelus, we constructed a phylogeny from approximately 2.4 kb of mitochondrial and nuclear sequence data from representatives of all major phasmatodean lineages. Our data placed Dryococelus and the New Caledonian tree lobsters outside the New Guinean Eurycanthinae as members of an unrelated Australasian stick insect clade, the Lanceocercata. These results suggest a convergent origin of the ‘tree lobster’ body form. Our reanalysis of tree lobster characters provides additional support for our hypothesis of convergent evolution. We conclude that the phenotypic traits leading to the traditional classification are convergent adaptations to ground-living behaviour. Our molecular dating analyses indicate an ancient divergence (more than 22 Myr ago) between Dryococelus and its Australian relatives. Hence, Dryococelus represents a long-standing separate evolutionary lineage within the stick insects and must be regarded as a key taxon to protect with respect to phasmatodean diversity. PMID:19129110

  1. An Efficient Independence Sampler for Updating Branches in Bayesian Markov chain Monte Carlo Sampling of Phylogenetic Trees.

    PubMed

    Aberer, Andre J; Stamatakis, Alexandros; Ronquist, Fredrik

    2016-01-01

    Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes.
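
    The independence-sampler idea in the abstract above can be illustrated with a minimal sketch. The abstract elides the name of the approximating distribution, so the Gamma-shaped proposal used here is an assumption made purely for illustration, as are the function names and the toy posterior. The sketch is a generic Metropolis-Hastings independence sampler for a single positive parameter such as a branch length, not the authors' implementation.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def independence_sampler(log_target, shape, scale, n_steps, x0=0.1, seed=0):
    """Metropolis-Hastings with proposals drawn independently of the current state.

    log_target is an unnormalized log posterior for one positive parameter.
    The Gamma proposal is an illustrative assumption, not taken from the abstract.
    """
    rng = random.Random(seed)
    x = x0
    # Log importance weight of the current state: log target minus log proposal.
    weight = log_target(x) - gamma_logpdf(x, shape, scale)
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = rng.gammavariate(shape, scale)  # proposal ignores the old value
        weight_y = log_target(y) - gamma_logpdf(y, shape, scale)
        # Accept with probability min(1, w(y) / w(x)).
        if rng.random() < math.exp(min(0.0, weight_y - weight)):
            x, weight = y, weight_y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def toy_log_posterior(t):
    """Hypothetical conditional posterior that happens to be exactly Gamma(2, 0.05)."""
    return gamma_logpdf(t, 2.0, 0.05)


samples, rate = independence_sampler(toy_log_posterior, 2.0, 0.05, 1000)
```

    When the proposal matches the target exactly, as in this toy case, every proposal is accepted; the closer the approximation to the conditional posterior, the higher the acceptance rate.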

  2. [Phylogeny of genus Spermophilus and position of Alashan ground squirrel (Spermophilus alashanicus, Buchner, 1888) on phylogenetic tree of Palearctic short-tailed ground squirrels].

    PubMed

    Kapustina, S Yu; Brandler, O V; Adiya, Ya

    2015-01-01

    Phylogenetic relationships within a group of Palearctic short-tailed ground squirrels (Spermophilus), recently defined as a genus, are not sufficiently clear and need critical revision. Interspecies hybridization, found in Eurasian Spermophilus, can affect the results of molecular phylogeny reconstruction. The position of the Alashan ground squirrel on the phylogenetic tree needs clarification. We analyzed eight nucleotide sequences of the cytb gene of S. alashanicus and 127 sequences of other Spermophilus species from GenBank. A close phylogenetic relationship between S. alashanicus and S. dauricus, and their affinity to ancestral forms of the group, are revealed. Monophyly of the subgenus Colobotis was confirmed. Paraphyly of eastern and western forms of S. relictus was shown.

  3. PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees

    PubMed Central

    Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P.; Vaz, Cátia; Ramirez, Mário; Carriço, João André

    2016-01-01

    High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy accessing and sharing of data and analyses results from any Internet enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net. PMID:27131357
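
    The minimum spanning tree analysis of allelic data described above can be sketched as follows. This is a minimal illustration, assuming allelic profiles are tuples of allele numbers compared by Hamming distance and using plain Kruskal's algorithm; the profile names and values are hypothetical, and the sketch does not reproduce any tie-breaking rules a dedicated tool may apply.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over all pairwise Hamming distances.

    profiles maps a strain name to a tuple of allele numbers.
    Returns a list of (distance, strain_u, strain_v) edges.
    """
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    edges = sorted(
        (hamming(profiles[u], profiles[v]), u, v)
        for u, v in combinations(names, 2)
    )
    tree = []
    for dist, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((dist, u, v))
    return tree


# Hypothetical four-locus profiles for four sequence types.
profiles = {
    "ST1": (1, 1, 1, 1),
    "ST2": (1, 1, 1, 2),
    "ST3": (1, 1, 2, 2),
    "ST4": (3, 3, 3, 3),
}
mst = minimum_spanning_tree(profiles)
```

    Here the tree links ST1 to ST2 and ST2 to ST3 through single-locus differences, and attaches the unrelated ST4 by one four-locus edge.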

  4. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction

    NASA Technical Reports Server (NTRS)

    Weisburg, W. G.; Giovannoni, S. J.; Woese, C. R.

    1989-01-01

    Through comparative analysis of 16S ribosomal RNA sequences, it can be shown that two seemingly dissimilar types of eubacteria, Deinococcus and the ubiquitous hot spring organism Thermus, are distantly but specifically related to one another. This confirms an earlier report based upon 16S rRNA oligonucleotide cataloging studies (Hensel et al., 1986). Their two lineages form a distinctive grouping within the eubacteria that deserves the taxonomic status of a phylum. The (partial) sequence of T. aquaticus rRNA appears relatively close to those of other thermophilic eubacteria, e.g. Thermotoga maritima and Thermomicrobium roseum. However, this closeness does not reflect a true evolutionary closeness; rather it is due to a "thermophilic convergence", the result of unusually high G+C composition in the rRNAs of thermophilic bacteria. Unless such compositional biases are taken into account, the branching order and root of phylogenetic trees can be incorrectly inferred.
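
    The G+C composition mentioned above is simple to compute, which makes it an easy first check for compositional bias before building a tree. The following is a minimal sketch; the function name is hypothetical, and it only counts unambiguous bases.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases in an RNA or DNA sequence."""
    seq = seq.upper().replace("T", "U")  # treat DNA and RNA spellings alike
    counted = [base for base in seq if base in "ACGU"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)
```

    Comparing this value across the sequences in an alignment shows whether some taxa, such as thermophiles, stand out by composition alone.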

  5. Phylogenetic assemblage structure of North American trees is more strongly shaped by glacial-interglacial climate variability in gymnosperms than in angiosperms.

    PubMed

    Ma, Ziyu; Sandel, Brody; Svenning, Jens-Christian

    2016-05-01

    How fast does biodiversity respond to climate change? The relationship of past and current climate with phylogenetic assemblage structure helps us to understand this question. Studies of angiosperm tree diversity in North America have already suggested effects of current water-energy balance and tropical niche conservatism. However, the role of glacial-interglacial climate variability remains to be determined, and little is known about any of these relationships for gymnosperms. Moreover, phylogenetic endemism, the concentration of unique lineages in restricted ranges, may also be related to glacial-interglacial climate variability and needs more attention. We used a refined phylogeny of both angiosperms and gymnosperms to map phylogenetic diversity, clustering and endemism of North American trees in 100-km grid cells, and climate change velocity since the Last Glacial Maximum together with postglacial accessibility to recolonization to quantify glacial-interglacial climate variability. We found: (1) Current climate is the dominant factor explaining the overall patterns, with more clustered angiosperm assemblages toward lower temperature, consistent with tropical niche conservatism. (2) Long-term climate stability is associated with higher angiosperm endemism, while higher postglacial accessibility is linked to more phylogenetic clustering and endemism in gymnosperms. (3) Factors linked to glacial-interglacial climate change have stronger effects on gymnosperms than on angiosperms. These results suggest that paleoclimate legacies supplement current climate in shaping phylogenetic patterns in North American trees, and especially so for gymnosperms.

  6. Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus).

    PubMed

    Moody, Michael L; Rieseberg, Loren H

    2012-07-01

    The annual sunflowers (Helianthus sect. Helianthus) present a formidable challenge for phylogenetic inference because of ancient hybrid speciation, recent introgression, and suspected issues with deep coalescence. Here we analyze sequence data from 11 nuclear DNA (nDNA) genes for multiple genotypes of species within the section to (1) reconstruct the phylogeny of this group, (2) explore the utility of nDNA gene trees for detecting hybrid speciation and introgression, and (3) test an empirical method of hybrid identification based on the phylogenetic congruence of nDNA gene trees from tightly linked genes. We uncovered considerable topological heterogeneity among gene trees with or without three previously identified hybrid species included in the analyses, as well as a general lack of reciprocal monophyly of species. Nonetheless, partitioned Bayesian analyses provided strong support for the reciprocal monophyly of all species except H. annuus (0.89 PP), the most widespread and abundant annual sunflower. Previous hypotheses of relationships among taxa were generally strongly supported (1.0 PP), except among taxa typically associated with H. annuus, apparently due to the paraphyly of the latter in all gene trees. While the individual nDNA gene trees provided a useful means for detecting recent hybridization, identification of ancient hybridization was problematic for all ancient hybrid species, even when linkage was considered. We discuss biological factors that affect the efficacy of phylogenetic methods for hybrid identification.

  7. Predicting MicroRNA Biomarkers for Cancer Using Phylogenetic Tree and Microarray Analysis

    PubMed Central

    Wang, Hsiuying

    2016-01-01

    MicroRNAs (miRNAs) are shown to be involved in the initiation and progression of cancers in the literature, and the expression of miRNAs is used as an important cancer prognostic tool. The aim of this study is to predict high-confidence miRNA biomarkers for cancer. We adopt a method that combines miRNA phylogenetic structure and miRNA microarray data analysis to discover high-confidence miRNA biomarkers for colon, prostate, pancreatic, lung, breast, bladder and kidney cancers. There are 53 miRNAs selected through this method that either have potential to involve a single cancer’s development or to involve several cancers’ development. These miRNAs can be used as high-confidence miRNA biomarkers of these seven investigated cancers for further experiment validation. miR-17, miR-20, miR-106a, miR-106b, miR-92, miR-25, miR-16, miR-195 and miR-143 are selected to involve a single cancer’s development in these seven cancers. They have the potential to be useful miRNA biomarkers when the result can be confirmed by experiments. PMID:27213352

  8. Temporal turnover in the composition of tropical tree communities: functional determinism and phylogenetic stochasticity.

    PubMed

    Swenson, Nathan G; Stegen, James C; Davies, Stuart J; Erickson, David L; Forero-Montaña, Jimena; Hurlbert, Allen H; Kress, W John; Thompson, Jill; Uriarte, María; Wright, S Joseph; Zimmerman, Jess K

    2012-03-01

    The degree to which turnover in biological communities is structured by deterministic or stochastic factors and the identities of influential deterministic factors are fundamental, yet unresolved, questions in ecology. Answers to these questions are particularly important for projecting the fate of forests with diverse disturbance histories worldwide. To uncover the processes governing turnover we use species-level molecular phylogenies and functional trait data sets for two long-term tropical forest plots with contrasting disturbance histories: one forest is older-growth, and one was recently disturbed. Having both phylogenetic and functional information further allows us to parse out the deterministic influences of different ecological filters. With the use of null models we find that compositional turnover was random with respect to phylogeny on average, but highly nonrandom with respect to measured functional traits. Furthermore, as predicted by a deterministic assembly process, the older-growth and disturbed forests were characterized by less than and greater than expected functional turnover, respectively. These results suggest that the abiotic environment, which changes due to succession in the disturbed forest, strongly governs the temporal dynamics of disturbed and undisturbed tropical forests. Predicting future changes in the composition of disturbed and undisturbed forests may therefore be tractable when using a functional-trait-based approach.

  9. Complete mitochondrial genome from South American catfish Pseudoplatystoma reticulatum (Eigenmann & Eigenmann) and its impact in Siluriformes phylogenetic tree.

    PubMed

    Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues

    2017-02-01

    The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rRNA genes, 22 tRNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.

  10. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree

    PubMed Central

    2013-01-01

    Background: The Solanaceae is a plant family of great economic importance. Despite a wealth of phylogenetic work on individual clades and a deep knowledge of particular cultivated species such as tomato and potato, a robust evolutionary framework with a dated molecular phylogeny for the family is still lacking. Here we investigate molecular divergence times for Solanaceae using a densely-sampled species-level phylogeny. We also review the fossil record of the family to derive robust calibration points, and estimate a chronogram using an uncorrelated relaxed molecular clock. Results: Our densely-sampled phylogeny shows strong support for all previously identified clades of Solanaceae and strongly supported relationships between the major clades, particularly within Solanum. The Tomato clade is shown to be sister to section Petota, and the Regmandra clade is the first branching member of the Potato clade. The minimum age estimates for major splits within the family provided here correspond well with results from previous studies, indicating splits between tomato & potato around 8 million years ago (Ma) with a 95% highest posterior density (HPD) 7–10 Ma, Solanum & Capsicum c. 19 Ma (95% HPD 17–21), and Solanum & Nicotiana c. 24 Ma (95% HPD 23–26). Conclusions: Our large time-calibrated phylogeny provides a significant step towards completing a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family. The chronogram now includes 40% of known species and all but two monotypic genera, and is one of the best sampled angiosperm family phylogenies both in terms of taxon sampling and resolution published thus far. The increased resolution in the chronogram combined with the large increase in species sampling will provide much needed data for the examination of many biological questions using Solanaceae as a model system. PMID:24283922

  11. Amino acid sequence of myoglobin from the chiton Liolophura japonica and a phylogenetic tree for molluscan globins.

    PubMed

    Suzuki, T; Furukohri, T; Okamoto, S

    1993-02-01

    Myoglobin was isolated from the radular muscle of the chiton Liolophura japonica, a primitive archigastropodic mollusc. Liolophura contains three monomeric myoglobins (I, II, and III), and the complete amino acid sequence of myoglobin I has been determined. It is composed of 145 amino acid residues, and the molecular mass was calculated to be 16,070 D. The E7 distal histidine, which is replaced by valine or glutamine in several molluscan globins, is conserved in Liolophura myoglobin. The autoxidation rate at physiological conditions indicated that Liolophura oxymyoglobin is fairly stable when compared with other molluscan myoglobins. The amino acid sequence of Liolophura myoglobin shows low homology (11-21%) with molluscan dimeric myoglobins and hemoglobins, but shows higher homology (26-29%) with monomeric myoglobins from the gastropodic molluscs Aplysia, Dolabella, and Bursatella. A phylogenetic tree was constructed from 19 molluscan globin sequences. The tree separated them into two distinct clusters, a cluster for muscle myoglobins and a cluster for erythrocyte or gill hemoglobins. The myoglobin cluster is divided further into two subclusters, corresponding to monomeric and dimeric myoglobins, respectively. Liolophura myoglobin was placed on the branch of monomeric myoglobin lineage, showing that it diverged earlier from other monomeric myoglobins. The hemoglobin cluster is also divided into two subclusters. One cluster contains homodimeric, heterodimeric, tetrameric, and didomain chains of erythrocyte hemoglobins of the blood clams Anadara, Scapharca, and Barbatia. Of special interest is the other subcluster. It consists of three hemoglobin chains derived from the bacterial symbiont-harboring clams Calyptogena and Lucina, in which hemoglobins are supposed to play an important role in maintaining the symbiosis with sulfide bacteria.

  12. Phylogenetic Tree Analysis of the Cold-Hot Nature of Traditional Chinese Marine Medicine for Possible Anticancer Activity

    PubMed Central

    Song, Xuxia; Li, Xuebo; Zhang, Fengcong; Wang, Changyun

    2017-01-01

    Traditional Chinese Marine Medicine (TCMM) represents one of the medicinal resources for research and development of novel anticancer drugs. In this study, to investigate the presence of anticancer activity (AA) displayed by cold or hot nature of TCMM, we analyzed the association relationship and the distribution regularity of TCMMs with different nature (613 TCMMs originated from 1,091 species of marine organisms) via association rules mining and phylogenetic tree analysis. The screened association rules were collected from three taxonomy groups: (1) Bacteria superkingdom, Phaeophyceae class, Fucales order, Sargassaceae family, and Sargassum genus; (2) Viridiplantae kingdom, Streptophyta phylum, Malpighiales class, and Rhizophoraceae family; (3) Holothuroidea class, Aspidochirotida order, and Holothuria genus. Our analyses showed that TCMMs with closer taxonomic relationship were more likely to possess anticancer bioactivity. We found that the cluster pattern of marine organisms with reported AA tended to cluster with cold nature TCMMs. Moreover, TCMMs with salty-cold nature demonstrated properties for softening hard mass and removing stasis to treat cancers, and species within Metazoa or Viridiplantae kingdom of cold nature were more likely to contain AA properties. We propose that TCMMs from these marine groups may enable focused bioprospecting for discovery of novel anticancer drugs derived from marine bioresources. PMID:28191021

  13. Hal: an Automated Pipeline for Phylogenetic Analyses of Genomic Data

    PubMed Central

    Robbertse, Barbara; Yoder, Ryan J.; Boyd, Alex; Reeves, John; Spatafora, Joseph W.

    2011-01-01

    The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed. PMID:21327165

  14. Were the first springtails semi-aquatic? A phylogenetic approach by means of 28S rDNA and optimization alignment.

    PubMed Central

    D'Haese, Cyrille A

    2002-01-01

    Emergence from an aquatic environment to the land is one of the major evolutionary transitions within the arthropods. It is often considered that the first hexapods, and in particular the first springtails, were semi-aquatic and this assumption drives evolutionary models towards particular conclusions. To address the question of the ecological origin of the springtails, phylogenetic analyses by optimization alignment were performed on D1 and D2 regions of the 28S rDNA for 55 collembolan exemplars and eight outgroups. Relationships among the orders Symphypleona, Entomobryomorpha and Poduromorpha are inferred. More specifically, a robust hypothesis is provided for the subfamilial relationships within the order Poduromorpha. Contrary to previous statements, the semi-aquatic species Podura aquatica is not basal or 'primitive', but well nested in the Poduromorpha. The analyses performed for the 24 different weighting schemes yielded the same conclusion: semi-aquatic ecology is not ancestral for the springtails. It is a derived condition that evolved independently several times. The adaptation for semi-aquatic life is better interpreted as a step towards independence from land, rather than indication of an aquatic origin. PMID:12061958

  15. Were the first springtails semi-aquatic? A phylogenetic approach by means of 28S rDNA and optimization alignment.

    PubMed

    D'Haese, Cyrille A

    2002-06-07

    Emergence from an aquatic environment to the land is one of the major evolutionary transitions within the arthropods. It is often considered that the first hexapods, and in particular the first springtails, were semi-aquatic and this assumption drives evolutionary models towards particular conclusions. To address the question of the ecological origin of the springtails, phylogenetic analyses by optimization alignment were performed on D1 and D2 regions of the 28S rDNA for 55 collembolan exemplars and eight outgroups. Relationships among the orders Symphypleona, Entomobryomorpha and Poduromorpha are inferred. More specifically, a robust hypothesis is provided for the subfamilial relationships within the order Poduromorpha. Contrary to previous statements, the semi-aquatic species Podura aquatica is not basal or 'primitive', but well nested in the Poduromorpha. The analyses performed for the 24 different weighting schemes yielded the same conclusion: semi-aquatic ecology is not ancestral for the springtails. It is a derived condition that evolved independently several times. The adaptation for semi-aquatic life is better interpreted as a step towards independence from land, rather than indication of an aquatic origin.

  16. Characterization and phylogenetic analysis of Krüppel-like transcription factor (KLF) gene family in tree shrews (Tupaia belangeri chinensis).

    PubMed

    Shao, Ming; Ge, Guang-Zhe; Liu, Wen-Jing; Xiao, Ji; Xia, Hou-Jun; Fan, Yu; Zhao, Feng; He, Bao-Li; Chen, Ceshi

    2016-12-10

    Krüppel-like factors (KLFs) are a family of zinc finger transcription factors regulating embryonic development and diseases. The phylogenetics of KLFs has not been studied in tree shrews, an animal lineage with a closer relationship to primates than rodents. Here, we identified 17 KLFs from Chinese tree shrew (Tupaia belangeri chinensis). KLF proteins are highly conserved among humans, monkeys, rats, mice and tree shrews compared to zebrafish and chickens. The CtBP binding site, Sin3A binding site and nuclear localization signals are largely conserved between tree shrews and human beings. Tupaia belangeri (Tb) KLF5 contains several conserved post-transcriptional modification motifs. Moreover, the mRNA and protein expression patterns of multiple tbKLFs are tissue-specific. TbKLF5, like hKLF5, significantly promotes NIH3T3 cell proliferation in vitro. These results provide insight for future studies regarding the structure and function of the tbKLF gene family.

  17. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  18. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  19. Multiple protein structure alignment.

    PubMed Central

    Taylor, W. R.; Flores, T. P.; Orengo, C. A.

    1994-01-01

    A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core. PMID:7849601

  20. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    PubMed

    Gardner, Shea N; Hall, Barry G

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
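
    The alignment-free, k-mer based SNP discovery described above can be illustrated with a toy sketch. It assumes that a candidate SNP is a pair of k-mers (odd k) with identical flanks and a different central base; the function name, the tiny sequences and the choice k=5 are hypothetical, and reverse complements are ignored for brevity. It illustrates the general idea only and is not the kSNP implementation.

```python
from collections import defaultdict


def kmer_snps(genomes, k=5):
    """Find candidate SNPs: k-mers identical except for the central base.

    genomes maps a genome name to its sequence. Returns a dict keyed by
    (left_flank, right_flank) whose values map genome name to central base.
    """
    if k % 2 != 1:
        raise ValueError("k must be odd so that the k-mer has a central base")
    mid = k // 2
    # (left flank, right flank) -> genome name -> set of central bases seen
    contexts = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            contexts[(kmer[:mid], kmer[mid + 1:])][name].add(kmer[mid])

    snps = {}
    for context, by_genome in contexts.items():
        # Skip flanks that occur with several central bases inside one genome.
        if any(len(bases) > 1 for bases in by_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in by_genome.items()}
        if len(set(alleles.values())) > 1:  # genomes disagree at the centre
            snps[context] = alleles
    return snps


# Two hypothetical sequences differing at a single position (A versus T).
genomes = {"g1": "ACGTACGGA", "g2": "ACGTTCGGA"}
snps = kmer_snps(genomes)
```

    No alignment or reference genome is needed: the shared flanks GT and CG locate the variable site, and the per-genome central bases form the SNP column from which a tree could later be built.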

  1. Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life

    PubMed Central

    Buchheim, Mark A.; Keller, Alexander; Koetschan, Christian; Förster, Frank; Merget, Benjamin; Wolf, Matthias

    2011-01-01

    Background: Chloroplast-encoded genes (matK and rbcL) have been formally proposed for use in DNA barcoding efforts targeting embryophytes. Extending such a protocol to chlorophytan green algae, though, is fraught with problems including non-homology (matK) and heterogeneity that prevents the creation of a universal PCR toolkit (rbcL). Some have advocated the use of the nuclear-encoded, internal transcribed spacer two (ITS2) as an alternative to the traditional chloroplast markers. However, the ITS2 is broadly perceived to be insufficiently conserved or to be confounded by introgression or biparental inheritance patterns, precluding its broad use in phylogenetic reconstruction or as a DNA barcode. A growing body of evidence has shown that simultaneous analysis of nucleotide data with secondary structure information can overcome at least some of the limitations of ITS2. The goal of this investigation was to assess the feasibility of an automated, sequence-structure approach for analysis of ITS2 data from a large sampling of phylum Chlorophyta. Methodology/Principal Findings: Sequences and secondary structures from 591 chlorophycean, 741 trebouxiophycean and 938 ulvophycean algae, all obtained from the ITS2 Database, were aligned using a sequence structure-specific scoring matrix. Phylogenetic relationships were reconstructed by Profile Neighbor-Joining coupled with a sequence structure-specific, general time reversible substitution model. Results from analyses of the ITS2 data were robust at multiple nodes and showed considerable congruence with results from published phylogenetic analyses. Conclusions/Significance: Our observations on the power of automated, sequence-structure analyses of ITS2 to reconstruct phylum-level phylogenies of the green algae validate this approach to assessing diversity for large sets of chlorophytan taxa. Moreover, our results indicate that objections to the use of ITS2 for DNA barcoding should be weighed against the utility of an automated

  2. A practical guide to phylogenetics for nonexperts.

    PubMed

    O'Halloran, Damien

    2014-02-05

    Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many researchers are new to this topic and so it presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user-guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria and finally describe tools for visualizing phylogenetic trees. While this is not by any means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. The vision for this article would be that it could serve as a practical training tool for researchers embarking on phylogenetic studies and also serve as an educational resource that could be incorporated into a classroom or teaching-lab.

  3. A Practical Guide to Phylogenetics for Nonexperts

    PubMed Central

    O'Halloran, Damien

    2014-01-01

    Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many researchers are new to this topic and so it presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user-guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria and finally describe tools for visualizing phylogenetic trees. While this is not by any means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. The vision for this article would be that it could serve as a practical training tool for researchers embarking on phylogenetic studies and also serve as an educational resource that could be incorporated into a classroom or teaching-lab. PMID:24562012

  4. Evolutionary history of the Afro-Madagascan Ixora species (Rubiaceae): species diversification and distribution of key morphological traits inferred from dated molecular phylogenetic trees

    PubMed Central

    Tosh, J.; Dessein, S.; Buerki, S.; Groeninckx, I.; Mouly, A.; Bremer, B.; Smets, E. F.; De Block, P.

    2013-01-01

    Background and Aims Previous work on the pantropical genus Ixora has revealed an Afro-Madagascan clade, but as yet no study has focused in detail on the evolutionary history and morphological trends in this group. Here the evolutionary history of Afro-Madagascan Ixora spp. (a clade of approx. 80 taxa) is investigated and the phylogenetic trees compared with several key morphological traits in taxa occurring in Madagascar. Methods Phylogenetic relationships of Afro-Madagascan Ixora are assessed using sequence data from four plastid regions (petD, rps16, rpoB-trnC and trnL-trnF) and nuclear ribosomal external transcribed spacer (ETS) and internal transcribed spacer (ITS) regions. The phylogenetic distribution of key morphological characters is assessed. Bayesian inference (implemented in BEAST) is used to estimate the temporal origin of Ixora based on fossil evidence. Key Results Two separate lineages of Madagascan taxa are recovered, one of which is nested in a group of East African taxa. Divergence in Ixora is estimated to have commenced during the mid Miocene, with extensive cladogenesis occurring in the Afro-Madagascan clade during the Pliocene onwards. Conclusions Both lineages of Madagascan Ixora exhibit morphological innovations that are rare throughout the rest of the genus, including a trend towards pauciflorous inflorescences and a trend towards extreme corolla tube length, suggesting that the same ecological and selective pressures are acting upon taxa from both Madagascan lineages. Novel ecological opportunities resulting from climate-induced habitat fragmentation and corolla tube length diversification are likely to have facilitated species radiation on Madagascar. PMID:24142919

  5. Trees

    NASA Astrophysics Data System (ADS)

    Epstein, Henri

    2016-11-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  6. ImOSM: intermittent evolution and robustness of phylogenetic methods.

    PubMed

    Thi Nguyen, Minh Anh; Gesell, Tanja; von Haeseler, Arndt

    2012-02-01

    Among the criteria to evaluate the performance of a phylogenetic method, robustness to model violation is of particular practical importance as complete a priori knowledge of evolutionary processes is typically unavailable. For studies of robustness in phylogenetic inference, a utility to add well-defined model violations to the simulated data would be helpful. We therefore introduce ImOSM, a tool to imbed intermittent evolution as model violation into an alignment. Intermittent evolution refers to extra substitutions occurring randomly on branches of a tree, thus changing alignment site patterns. This means that the extra substitutions are placed on the tree after the typical process of sequence evolution is completed. We then study the robustness of widely used phylogenetic methods: maximum likelihood (ML), maximum parsimony (MP), and a distance-based method (BIONJ) to various scenarios of model violation. Violation of rates across sites (RaS) heterogeneity and simultaneous violation of RaS and the transition/transversion ratio on two nonadjacent external branches hinder all the methods' recovery of the true topology for a four-taxon tree. For an eight-taxon balanced tree, the violations cause each of the three methods to infer a different topology. Both ML and MP fail, whereas BIONJ, which calculates the distances based on the ML estimated parameters, reconstructs the true tree. Finally, we report that a test of model homogeneity and goodness of fit tests have enough power to detect such model violations. The outcome of the tests can help to actually gain confidence in the inferred trees. Therefore, we recommend using these tests in practical phylogenetic analyses.

  7. The space of phylogenetic mixtures for equivariant models

    PubMed Central

    2012-01-01

    Background The selection of an evolutionary model to best fit given molecular data is usually a heuristic choice. In his seminal book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open question, however, whether these equations were sufficient to fully characterize the evolutionary model under consideration. Results Here we prove that, for most equivariant models of evolution, the space of distributions satisfying these linear equations coincides with the space of distributions arising from mixtures of trees. In other words, we prove that the evolution of an observed multiple sequence alignment can be modeled by a mixture of phylogenetic trees under an equivariant evolutionary model if and only if the distribution of patterns at its columns satisfies the linear equations mentioned above. Moreover, we provide a set of linearly independent equations defining this space of phylogenetic mixtures for each equivariant model and for any number of taxa. Lastly, we use these results to perform a study of identifiability of phylogenetic mixtures. Conclusions The space of phylogenetic mixtures under equivariant models is a linear space that fully characterizes the evolutionary model. We provide an explicit algorithm to obtain the equations defining these spaces for a number of models and taxa. Its implementation has proved to be a powerful tool for model selection. PMID:23190710

  8. Comparing alignment methods for inferring the history of the new world lizard genus Mabuya (Squamata: Scincidae).

    PubMed

    Whiting, Alison S; Sites, Jack W; Pellegrino, Katia C M; Rodrigues, Miguel T

    2006-03-01

    The rapid increase in the ability to generate molecular data, and the focus on model-based methods for tree reconstruction have greatly advanced the use of phylogenetics in many fields. The recent flurry of new analytical techniques has focused almost solely on tree reconstruction, whereas alignment issues have received far less attention. In this paper, we use a diverse sampling of gene regions from lizards of the genus Mabuya to compare the impact, on phylogeny estimation, of new maximum likelihood alignment algorithms with more widely used methods. Sequences aligned under different optimality criteria are analyzed using partitioned Bayesian analysis with independent models and parameter settings for each gene region, and the most strongly supported phylogenetic hypothesis is then used to test the hypothesis of two colonizations of the New World by African scincid lizards. Our results show that the consistent use of model-based methods in both alignment and tree reconstruction leads to trees with more optimal likelihood scores than the use of independent criteria in alignment and tree reconstruction. We corroborate and extend earlier evidence for two independent colonizations of South America by scincid lizards. Relationships within South American Mabuya are found to be in need of taxonomic revision, specifically complexes under the names M. heathi, M. agilis, and M. bistriata (sensu, M.T. Rodrigues, Papeis Avulsos de Zoologia 41 (2000) 313).

  9. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies.

    PubMed

    Leaché, Adam D; Banbury, Barbara L; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-11-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the
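    The conditional likelihood correction mentioned in this record is, in its standard form, a renormalisation: each site likelihood is divided by the probability that a site is variable under the model. The Python sketch below shows only that arithmetic. It assumes the per-site log-likelihoods and the model's probability of an invariant pattern were computed elsewhere; the function and parameter names are hypothetical, and the paper's exact formulation may differ.

```python
import math


def conditional_log_likelihood(site_log_likelihoods, prob_invariant):
    """Log-likelihood of a SNP-only alignment, conditioned on variability.

    site_log_likelihoods: unconditional log-likelihood of each variable site.
    prob_invariant: model probability that a site shows an invariant pattern
                    (hypothetical input, computed by the inference program).
    """
    if not 0.0 <= prob_invariant < 1.0:
        raise ValueError("prob_invariant must lie in [0, 1)")
    n_sites = len(site_log_likelihoods)
    # Dividing every site likelihood by (1 - P_inv) subtracts
    # log(1 - P_inv) once per site on the log scale.
    return sum(site_log_likelihoods) - n_sites * math.log1p(-prob_invariant)
```

    With prob_invariant set to 0 the function returns the plain sum of site log-likelihoods, which is the uncorrected case.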

  10. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

    PubMed Central

    Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the

  11. Each flying fox on its own branch: a phylogenetic tree for Pteropus and related genera (Chiroptera: Pteropodidae).

    PubMed

    Almeida, Francisca C; Giannini, Norberto P; Simmons, Nancy B; Helgen, Kristofer M

    2014-08-01

    Pteropodidae is a diverse Old World family of non-echolocating, frugivorous and nectarivorous bats that includes the flying foxes (genus Pteropus) and allied genera. The subfamily Pteropodinae includes the largest living bats and is distributed across an immense geographic range from islands in East Africa to the Cook Islands of Polynesia. These bats are keystone species in their ecosystems and some carry zoonotic diseases that are increasingly a focus of interest in biomedical research. Here we present a comprehensive phylogeny for pteropodines focused on Pteropus. The analyses included 50 of the ∼63 species of Pteropus and 11 species from 7 related genera. We obtained sequences of the cytochrome b and the 12S rRNA mitochondrial genes for all species and sequences of the nuclear RAG1, vWF, and BRCA1 genes for a subsample of taxa. Some of the sequences of Pteropus were obtained from skin biopsies of museum specimens including that of an extinct species, P. tokudae. The resulting trees recovered Pteropus as monophyletic, although further work is needed to determine whether P. personatus belongs in the genus. Monophyly of the majority of traditionally-recognized Pteropus species groups was rejected, but statistical support was strong for several clades on which we based a new classification of the Pteropus species into 13 species groups. Other noteworthy results emerged regarding species status of several problematic taxa, including recognition of P. capistratus and P. ennisae as distinct species, paraphyly of the P. hypomelanus complex, and conspecific status of P. pelewensis pelewensis and P. p. yapensis. Relationships among the pteropodine genera were not completely resolved with the current dataset. Divergence time analysis suggests that Pteropus originated in the Miocene and that two independent bursts of diversification occurred in the Pleistocene in different regions of the Indo-Pacific realm.

  12. The impact of single substitutions on multiple sequence alignments.

    PubMed

    Klaere, Steffen; Gesell, Tanja; von Haeseler, Arndt

    2008-12-27

    We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
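    The two-step process described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the Poisson mean (`mean_substitutions`) is a hypothetical parameter that the abstract does not specify, the branch names are invented, and the Poisson draw uses Knuth's simple multiplication method, which is only practical for small means.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def allocate_substitutions(branch_lengths, mean_substitutions, seed=None):
    """Place a Poisson-distributed number of substitutions on branches.

    Each substitution lands on a branch with probability proportional
    to that branch's share of the total tree length.
    """
    rng = random.Random(seed)
    names = list(branch_lengths)
    weights = [branch_lengths[name] for name in names]
    total = sample_poisson(mean_substitutions, rng)
    counts = dict.fromkeys(names, 0)
    for branch in rng.choices(names, weights=weights, k=total):
        counts[branch] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical four-branch tree with scaled branch lengths.
    tree = {"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}
    print(allocate_substitutions(tree, mean_substitutions=5.0, seed=1))
```

    The sketch stops at counting substitutions per branch; the one-step mutation matrix that acts on an alignment column is not modelled here.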

  13. Vestige: Maximum likelihood phylogenetic footprinting

    PubMed Central

    Wakefield, Matthew J; Maxwell, Peter; Huttley, Gavin A

    2005-01-01

    selection can be evaluated both spatially (along a sequence alignment) and temporally (for each branch of the tree) providing visual indicators to the attributes and functions of DNA sequences. PMID:15921531

  14. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    PubMed

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, consequently taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler; the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but does not require aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that using random trees MY sampler still has a performance similar to the original version.

  15. Aligning Biomolecular Networks Using Modular Graph Kernels

    NASA Astrophysics Data System (ADS)

    Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant

    Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.

  16. Verification of phylogenetic inference programs using metamorphic testing.

    PubMed

    Sadi, Md Shaik; Kuo, Fei-Ching; Ho, Joshua W K; Charleston, Michael A; Chen, T Y

    2011-12-01

    Many phylogenetic inference programs are available to infer evolutionary relationships among taxa using aligned sequences of characters, typically DNA or amino acids. These programs are often used to infer the evolutionary history of species. However, in most cases it is impossible to systematically verify the correctness of the tree returned by these programs, as the correct evolutionary history is generally unknown and unknowable. In addition, it is nearly impossible to verify whether any non-trivial tree is correct in accordance to the specification of the often complicated search and scoring algorithms. This difficulty is known as the oracle problem of software testing: there is no oracle that we can use to verify the correctness of the returned tree. This makes it very challenging to test the correctness of any phylogenetic inference programs. Here, we demonstrate how to apply a simple software testing technique, called Metamorphic Testing, to alleviate the oracle problem in testing phylogenetic inference programs. We have used both real and randomly generated test inputs to evaluate the effectiveness of metamorphic testing, and found that metamorphic testing can detect failures effectively in faulty phylogenetic inference programs with both types of test inputs.
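    A metamorphic test checks a relation between the outputs of related inputs instead of comparing one output with a known correct answer. The Python sketch below illustrates one such relation, chosen here for illustration and not taken from the paper: reordering the input taxa should not change the result. Both "programs" are toy stand-ins with hypothetical names; a real test would call an actual inference program and compare tree topologies.

```python
import itertools
import random


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(alignment):
    """Toy program under test: the pair of taxa with the fewest differences.

    Ties are broken by taxon name so the answer is well defined.
    """
    pairs = itertools.combinations(sorted(alignment), 2)
    best = min(pairs, key=lambda p: (hamming(alignment[p[0]], alignment[p[1]]), p))
    return frozenset(best)


def first_two_taxa(alignment):
    """Deliberately faulty program: its answer depends on input order."""
    return frozenset(list(alignment)[:2])


def order_invariant(program, alignment, trials=20, seed=0):
    """Metamorphic relation: shuffling the taxon order must not change the output."""
    rng = random.Random(seed)
    expected = program(alignment)
    records = list(alignment.items())
    for _ in range(trials):
        rng.shuffle(records)
        if program(dict(records)) != expected:
            return False
    return True
```

    No true tree is needed at any point: the check only compares the program's output with itself across transformed inputs, which is how the approach sidesteps the oracle problem.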

  17. Rooting the eutherian tree: the power and pitfalls of phylogenomics

    PubMed Central

    Nishihara, Hidenori; Okada, Norihiro; Hasegawa, Masami

    2007-01-01

    Background Ongoing genome sequencing projects have led to a phylogenetic approach based on genome-scale data (phylogenomics), which is beginning to shed light on longstanding unresolved phylogenetic issues. The use of large datasets in phylogenomic analysis results in a global increase in resolution due to a decrease in sampling error. However, a fully resolved tree can still be wrong if the phylogenetic inference is biased. Results Here, in an attempt to root the eutherian tree using genome-scale data with the maximum likelihood method, we demonstrate a case in which a concatenate analysis strongly supports a putatively wrong tree, whereas the total evaluation of separate analyses of different genes grossly reduced the bias of the phylogenetic inference. A conventional method of concatenate analysis of nucleotide sequences from our dataset, which includes a more than 1 megabase alignment of 2,789 nuclear genes, suggests a misled monophyly of Afrotheria (for example, elephant) and Xenarthra (for example, armadillo) with 100% bootstrap probability. However, this tree is not supported by our 'separate method', which takes into account the different tempos and modes of evolution among genes, and instead the basal Afrotheria tree is favored. Conclusion Our analysis demonstrates that in cases in which there is great variation in evolutionary features among different genes, the separate model, rather than the concatenate model, should be used for phylogenetic inference, especially in genome-scale data. PMID:17883877

  18. Speciation process of Salvia isensis (Lamiaceae), a species endemic to serpentine areas in the Ise-Tokai district, Japan, from the viewpoint of the contradictory phylogenetic trees generated from chloroplast and nuclear DNA.

    PubMed

    Sudarmono; Okada, Hiroshi

    2007-07-01

    To understand the speciation process of Salvia isensis (Lamiaceae), a species endemic to a special environment (serpentine areas in the Ise-Tokai district, central Honshu, Japan), chloroplast DNA (cpDNA) and nuclear ribosomal DNA (nrDNA) were employed to analyze the phylogenetic relationships of S. isensis with related species in Japan. Allozymic polymorphisms were also used to analyze genetic relationships among Salvia species. A contradiction in the phylogenetic positions of species studied was detected when phylogenetic trees were constructed using cpDNA or nrDNA, i.e., S. isensis was a sister to the other species in phylogenetic trees generated from cpDNA, while S. japonica was a sister to the other species in the case of nrDNA. Genetic relationships between Salvia species estimated from allozymic polymorphisms did not contradict the topology for nrDNA. Using the present results, the speciation process of S. isensis is discussed with regard to introgressive gene exchanges between related species.

  19. Diversity of a ribonucleoprotein family in tobacco chloroplasts: two new chloroplast ribonucleoproteins and a phylogenetic tree of ten chloroplast RNA-binding domains.

    PubMed Central

    Ye, L H; Li, Y Q; Fukami-Kobayashi, K; Go, M; Konishi, T; Watanabe, A; Sugiura, M

    1991-01-01

    Two new ribonucleoproteins (RNPs) have been identified from a tobacco chloroplast lysate. These two proteins (cp29A and cp29B) are nuclear-encoded and have less affinity for single-stranded DNA than three other chloroplast RNPs (cp28, cp31 and cp33) previously isolated. DNA sequencing revealed that both contain two consensus sequence-type homologous RNA-binding domains (CS-RBDs) and a very acidic amino-terminal domain, which is shorter than that of cp28, cp31 and cp33. Comparison of cp29A and cp29B showed a 19 amino acid insertion in the region separating the two CS-RBDs in cp29B. This insertion results in three tandem repeats of a glycine-rich sequence of 10 amino acids, which is a novel feature in RNPs. The two proteins are encoded by different single nuclear genes and no alternatively spliced transcripts could be identified. We constructed a phylogenetic tree for the ten chloroplast CS-RBDs. These results suggest that there is a sizable RNP family in chloroplasts and the diversity was mainly generated through a series of gene duplications rather than through alternative pre-mRNA splicing. The gene for cp29B contains three introns. The first and second introns interrupt the first CS-RBD and the third intron interrupts the second CS-RBD. The position of the first intron site is the same as that in the human hnRNP A1 protein gene. PMID:1721701

  20. k-merSNP discovery: Software for alignment-and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    SciTech Connect

    2014-11-18

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
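    The record says only that the algorithm is based on k-mer analysis. One common way to find SNPs from k-mers without alignment, sketched below in Python, is to look for odd-length k-mers that share both flanking halves across genomes but differ at the central base. This is an assumption about the general technique, not a description of this software's actual method; the function name, the default k and the filtering rules are hypothetical.

```python
from collections import defaultdict


def candidate_snps(genomes, k=5):
    """Find loci whose flanking sequence is shared but whose centre base differs.

    genomes: mapping of genome name to sequence string.
    Returns {(left_flank, right_flank): {genome_name: centre_base}}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a k-mer has a central base")
    mid = k // 2
    alleles = defaultdict(dict)  # flank pair -> {genome: set of centre bases}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            key = (kmer[:mid], kmer[mid + 1:])
            alleles[key].setdefault(name, set()).add(kmer[mid])
    snps = {}
    for key, per_genome in alleles.items():
        # Keep loci seen in every genome with a single allele per genome.
        if len(per_genome) != len(genomes):
            continue
        if any(len(bases) != 1 for bases in per_genome.values()):
            continue
        calls = {name: next(iter(bases)) for name, bases in per_genome.items()}
        if len(set(calls.values())) > 1:
            snps[key] = calls
    return snps
```

    A practical tool would also need to handle reverse complements, much larger k and memory-efficient k-mer storage, none of which this sketch attempts.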

  1. Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer

    PubMed Central

    Grishin, Nick V.; Otwinowski, Zbyszek

    2016-01-01

    Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz. PMID:27336403

  2. Make Your Own Phylogenetic Tree

    ERIC Educational Resources Information Center

    Rau, Gerald

    2012-01-01

    Molecular similarity is one of the strongest lines of evidence for evolution--and one of the most difficult for students to grasp. That is because the underlying observations--that identical mutations are found in closely related species and the degree of similarity decreases with evolutionary distance--are not visible to the human eye. And it's…

  3. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    PubMed Central

    Pollard, Daniel A; Moses, Alan M; Iyer, Venky N; Eisen, Michael B

    2006-01-01

    Background Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusion Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how

  4. Classifying and counting linear phylogenetic invariants for the Jukes-Cantor model.

    PubMed

    Steel, M A; Fu, Y X

    1995-01-01

    Linear invariants are useful tools for testing phylogenetic hypotheses from aligned DNA/RNA sequences, particularly when the sites evolve at different rates. Here we give a simple, graph theoretic classification for each phylogenetic tree T, of its associated vector space I(T) of linear invariants under the Jukes-Cantor one-parameter model of nucleotide substitution. We also provide an easily described basis for I(T), and show that if T is a binary (fully resolved) phylogenetic tree with n sequences at its leaves then $\dim[I(T)] = 4^n - F_{2n-2}$, where $F_n$ is the nth Fibonacci number. Our method applies a recently developed Hadamard matrix-based technique to describe elements of I(T) in terms of edge-disjoint packings of subtrees in T, and thereby complements earlier more algebraic treatments.
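
    The dimension count in this abstract, read as 4^n minus the Fibonacci number with index 2n-2, can be evaluated for small trees with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the Fibonacci indexing (F_0 = F_1 = 1) is an assumption that the abstract does not state. It is chosen here because it agrees with the two-sequence case, where the Jukes-Cantor pattern probabilities take just two values (matching and mismatching patterns) and therefore span a 2-dimensional space inside the 16-dimensional pattern space.

```python
def fibonacci(k: int) -> int:
    """Return F_k under the ASSUMED convention F_0 = F_1 = 1 (so F_2 = 2, F_3 = 3)."""
    a, b = 1, 1  # F_0, F_1
    for _ in range(k):
        a, b = b, a + b
    return a


def jc_linear_invariant_dim(n: int) -> int:
    """Evaluate 4^n - F_{2n-2} for a binary tree with n >= 2 leaves."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return 4 ** n - fibonacci(2 * n - 2)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, jc_linear_invariant_dim(n))
```

    Under a different Fibonacci indexing the values shift, so the convention should be checked against the paper before the numbers are relied on.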

  5. STBase: one million species trees for comparative biology.

    PubMed

    McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed

  6. Phylogenetically resolving epidemiologic linkage

    PubMed Central

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-01-01

    Although the use of phylogenetic trees in epidemiological investigations has become commonplace, their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. We confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results. PMID:26903617

  7. Phylogenetically resolving epidemiologic linkage

    SciTech Connect

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-02-22

    The use of phylogenetic trees in epidemiological investigations has become commonplace, but their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. Moreover, we confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results.

  8. Phylogenetically resolving epidemiologic linkage

    DOE PAGES

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-02-22

    The use of phylogenetic trees in epidemiological investigations has become commonplace, but their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. Moreover, we confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results.

  9. Biochemical and structural characterizations of two Dictyostelium cellobiohydrolases from the amoebozoa kingdom reveal a high level of conservation between distant phylogenetic trees of life

    SciTech Connect

    Hobdey, Sarah E.; Knott, Brandon C.; Momeni, Majid Haddad; Taylor, II, Larry E.; Borisova, Anna S.; Podkaminer, Kara K.; VanderWall, Todd A.; Himmel, Michael E.; Decker, Stephen R.; Beckham, Gregg T.; Stahlberg, Jerry

    2016-04-01

    Glycoside hydrolase family 7 (GH7) cellobiohydrolases (CBHs) are enzymes often employed in plant cell wall degradation across eukaryotic kingdoms of life, as they provide significant hydrolytic potential in cellulose turnover. To date, many fungal GH7 CBHs have been examined, yet many questions regarding structure-activity relationships in these important natural and commercial enzymes remain. Here, we present the crystal structures and a biochemical analysis of two GH7 CBHs from social amoeba: Dictyostelium discoideum Cel7A (DdiCel7A) and Dictyostelium purpureum Cel7A (DpuCel7A). DdiCel7A and DpuCel7A natively consist of a catalytic domain and do not exhibit a carbohydrate-binding module (CBM). The structures of DdiCel7A and DpuCel7A, resolved to 2.1 Å and 2.7 Å, respectively, are homologous to those of other GH7 CBHs with an enclosed active-site tunnel. Two primary differences between the Dictyostelium CBHs and the archetypal model GH7 CBH, Trichoderma reesei Cel7A (TreCel7A), occur near the hydrolytic active site and the product-binding sites. To compare the activities of these enzymes with the activity of TreCel7A, the family 1 TreCel7A CBM and linker were added to the C terminus of each of the Dictyostelium enzymes, creating DdiCel7ACBM and DpuCel7ACBM, which were recombinantly expressed in T. reesei. DdiCel7ACBM and DpuCel7ACBM hydrolyzed Avicel, pretreated corn stover, and phosphoric acid-swollen cellulose as efficiently as TreCel7A when hydrolysis was compared at their temperature optima. The Ki of cellobiose was significantly higher for DdiCel7ACBM and DpuCel7ACBM than for TreCel7A: 205, 130, and 29 μM, respectively. Finally, taken together, the present study highlights the remarkable degree of conservation of the activity of these key natural and industrial enzymes across quite distant phylogenetic trees of life.

  10. Biochemical and structural characterizations of two Dictyostelium cellobiohydrolases from the amoebozoa kingdom reveal a high level of conservation between distant phylogenetic trees of life

    DOE PAGES

    Hobdey, Sarah E.; Knott, Brandon C.; Momeni, Majid Haddad; ...

    2016-04-01

    Glycoside hydrolase family 7 (GH7) cellobiohydrolases (CBHs) are enzymes often employed in plant cell wall degradation across eukaryotic kingdoms of life, as they provide significant hydrolytic potential in cellulose turnover. To date, many fungal GH7 CBHs have been examined, yet many questions regarding structure-activity relationships in these important natural and commercial enzymes remain. Here, we present the crystal structures and a biochemical analysis of two GH7 CBHs from social amoeba: Dictyostelium discoideum Cel7A (DdiCel7A) and Dictyostelium purpureum Cel7A (DpuCel7A). DdiCel7A and DpuCel7A natively consist of a catalytic domain and do not exhibit a carbohydrate-binding module (CBM). The structures of DdiCel7A and DpuCel7A, resolved to 2.1 Å and 2.7 Å, respectively, are homologous to those of other GH7 CBHs with an enclosed active-site tunnel. Two primary differences between the Dictyostelium CBHs and the archetypal model GH7 CBH, Trichoderma reesei Cel7A (TreCel7A), occur near the hydrolytic active site and the product-binding sites. To compare the activities of these enzymes with the activity of TreCel7A, the family 1 TreCel7A CBM and linker were added to the C terminus of each of the Dictyostelium enzymes, creating DdiCel7ACBM and DpuCel7ACBM, which were recombinantly expressed in T. reesei. DdiCel7ACBM and DpuCel7ACBM hydrolyzed Avicel, pretreated corn stover, and phosphoric acid-swollen cellulose as efficiently as TreCel7A when hydrolysis was compared at their temperature optima. The Ki of cellobiose was significantly higher for DdiCel7ACBM and DpuCel7ACBM than for TreCel7A: 205, 130, and 29 μM, respectively. Finally, taken together, the present study highlights the remarkable degree of conservation of the activity of these key natural and industrial enzymes across quite distant phylogenetic trees of life.

  11. Efficient exploration of the space of reconciled gene trees.

    PubMed

    Szöllősi, Gergely J; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-11-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with, respectively, 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family. The open source

  12. Evolutionary Phylogenetic Networks: Models and Issues

    NASA Astrophysics Data System (ADS)

    Nakhleh, Luay

    Phylogenetic networks are special graphs that generalize phylogenetic trees to allow for modeling of non-treelike evolutionary histories. The ability to sequence multiple genetic markers from a set of organisms, and the conflicting evolutionary signals that these markers provide in many cases, have propelled research and interest in phylogenetic networks to the forefront in computational phylogenetics. Nonetheless, the term 'phylogenetic network' has been generically used to refer to a class of models whose core shared property is tree generalization. Several excellent surveys of the different flavors of phylogenetic networks and methods for their reconstruction have been written recently. However, unlike these surveys, this chapter focuses specifically on one type of phylogenetic network, namely evolutionary phylogenetic networks, which explicitly model reticulate evolutionary events. Further, this chapter focuses less on surveying existing tools, and addresses in more detail issues that are central to the accurate reconstruction of phylogenetic networks.

  13. Biochemical and Structural Characterizations of Two Dictyostelium Cellobiohydrolases from the Amoebozoa Kingdom Reveal a High Level of Conservation between Distant Phylogenetic Trees of Life

    PubMed Central

    Hobdey, Sarah E.; Knott, Brandon C.; Haddad Momeni, Majid; Taylor, Larry E.; Borisova, Anna S.; Podkaminer, Kara K.; VanderWall, Todd A.; Himmel, Michael E.; Decker, Stephen R.

    2016-01-01

    ABSTRACT Glycoside hydrolase family 7 (GH7) cellobiohydrolases (CBHs) are enzymes commonly employed in plant cell wall degradation across eukaryotic kingdoms of life, as they provide significant hydrolytic potential in cellulose turnover. To date, many fungal GH7 CBHs have been examined, yet many questions regarding structure-activity relationships in these important natural and commercial enzymes remain. Here, we present the crystal structures and a biochemical analysis of two GH7 CBHs from social amoeba: Dictyostelium discoideum Cel7A (DdiCel7A) and Dictyostelium purpureum Cel7A (DpuCel7A). DdiCel7A and DpuCel7A natively consist of a catalytic domain and do not exhibit a carbohydrate-binding module (CBM). The structures of DdiCel7A and DpuCel7A, resolved to 2.1 Å and 2.7 Å, respectively, are homologous to those of other GH7 CBHs with an enclosed active-site tunnel. Two primary differences between the Dictyostelium CBHs and the archetypal model GH7 CBH, Trichoderma reesei Cel7A (TreCel7A), occur near the hydrolytic active site and the product-binding sites. To compare the activities of these enzymes with the activity of TreCel7A, the family 1 TreCel7A CBM and linker were added to the C terminus of each of the Dictyostelium enzymes, creating DdiCel7ACBM and DpuCel7ACBM, which were recombinantly expressed in T. reesei. DdiCel7ACBM and DpuCel7ACBM hydrolyzed Avicel, pretreated corn stover, and phosphoric acid-swollen cellulose as efficiently as TreCel7A when hydrolysis was compared at their temperature optima. The Ki of cellobiose was significantly higher for DdiCel7ACBM and DpuCel7ACBM than for TreCel7A: 205, 130, and 29 μM, respectively. Taken together, the present study highlights the remarkable degree of conservation of the activity of these key natural and industrial enzymes across quite distant phylogenetic trees of life. IMPORTANCE GH7 CBHs are among the most important cellulolytic enzymes both in nature and for emerging industrial applications for

  14. ALFRED: A Practical Method for Alignment-Free Distance Computation.

    PubMed

    Thankachan, Sharma V; Chockalingam, Sriram P; Liu, Yongchao; Apostolico, Alberto; Aluru, Srinivas

    2016-06-01

    Alignment-free approaches are gaining persistent interest in many sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged as one of the well-known alignment-free approaches. Two recent works further generalize this ACS approach by allowing a bounded number k of mismatches in the common substrings, relying on approximation (linear time) and exact computation, respectively. Albeit having a good worst-case time complexity [Formula: see text], the exact approach is complex and unlikely to be efficient in practice. Herein, we present ALFRED, an alignment-free distance computation method, which solves the generalized common substring search problem via exact computation. Compared to the theoretical approach, our algorithm is easier to implement and more practical to use, while still providing highly competitive theoretical performance with an expected run-time of [Formula: see text]. By applying our program to phylogenetic inference as a case study, we find that it exactly reconstructs the topology of the reference phylogenetic tree for a set of 27 primate mitochondrial genomes, at reasonably acceptable speed. ALFRED is implemented in the C++ programming language and the source code is freely available online.
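
    For orientation, the plain average common substring idea (no mismatches, i.e. k = 0) can be sketched in Python as below. This is not ALFRED's algorithm and makes no attempt at its efficiency: it is a naive illustration, the function names are hypothetical, and the normalisation used to turn match lengths into a distance follows a commonly cited form of the ACS measure, which is an assumption here rather than something stated in the abstract.

```python
import math


def longest_match_lengths(a: str, b: str) -> list:
    """For each start i in a, length of the longest prefix of a[i:] found anywhere in b."""
    lengths = []
    for i in range(len(a)):
        length = 0
        # Naively extend the match one character at a time.
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        lengths.append(length)
    return lengths


def acs_directed(a: str, b: str) -> float:
    """Directed ACS-style dissimilarity of a against b (assumed normalisation)."""
    mean_len = sum(longest_match_lengths(a, b)) / len(a)
    if mean_len == 0:
        return math.inf  # the sequences share no characters at all
    return math.log(len(b)) / mean_len - 2 * math.log(len(a)) / len(a)


def acs_distance(a: str, b: str) -> float:
    """Symmetrised version: average of the two directed values."""
    return 0.5 * (acs_directed(a, b) + acs_directed(b, a))


if __name__ == "__main__":
    print(acs_distance("ACGTACGT", "ACGTACGT"))
    print(acs_distance("ACGTACGT", "AGGTCCGA"))
```

    Longer shared substrings give a larger average match length and hence a smaller distance; the generalisation discussed in the abstract additionally tolerates up to k mismatches inside each match.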

  15. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts

    NASA Technical Reports Server (NTRS)

    Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.

    1991-01-01

    A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.

  16. An exploration of how to define and measure the evolution of behavior, learning, memory and mind across the full phylogenetic tree of life

    PubMed Central

    Eisenstein, E. M.; Eisenstein, D. L.; Sarma, J. S. M.

    2016-01-01

    ABSTRACT There are probably few terms in evolutionary studies regarding neuroscience issues that are used more frequently than ‘behavior’, ‘learning’, ‘memory’, and ‘mind’. Yet there are probably as many different meanings of these terms as there are users of them. Further, investigators in such studies, while recognizing the full phylogenetic spectrum of life and the evolution of these phenomena, rarely go beyond mammals and other vertebrates in their investigations; invertebrates are sometimes included. What is rarely taken into consideration, though, is that to fully understand the evolution and significance for survival of these phenomena across phylogeny, it is essential that they be measured and compared in the same units of measurement across the full phylogenetic spectrum from aneural bacteria and protozoa to humans. This paper explores how these terms are generally used as well as how they might be operationally defined and measured to facilitate uniform examination and comparisons across the full phylogenetic spectrum of life. This paper has 2 goals: (1) to provide models for measuring the evolution of ‘behavior’ and its changes across the full phylogenetic spectrum, and (2) to explain why ‘mind phenomena’ cannot be measured scientifically at the present time. PMID:27489578

  17. Molecular phylogenetics before sequences: oligonucleotide catalogs as k-mer spectra.

    PubMed

    Ragan, Mark A; Bernard, Guillaume; Chan, Cheong Xin

    2014-01-01

    From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
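
    As a rough illustration of a k-mer based comparison of the kind mentioned in this abstract, the basic D2 statistic (the sum, over all k-mers, of the product of their counts in the two sequences) can be sketched in Python as follows. The function names are hypothetical, and the conversion of D2 into a distance shown here (one minus a cosine-style normalisation) is an assumption chosen for simplicity; the abstract does not say which D2-derived distance the authors used.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """Basic D2 statistic: sum over k-mers of count_in_a * count_in_b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_distance(a: str, b: str, k: int) -> float:
    """Illustrative distance in [0, 1]: 1 - D2(a,b) / sqrt(D2(a,a) * D2(b,b))."""
    norm = math.sqrt(d2(a, a, k) * d2(b, b, k))
    if norm == 0:
        raise ValueError("both sequences must contain at least one k-mer")
    return 1.0 - d2(a, b, k) / norm


if __name__ == "__main__":
    print(d2("ACGTACGT", "ACGTTTGT", 3))
    print(d2_distance("ACGTACGT", "ACGTTTGT", 3))
```

    A matrix of such pairwise values could then be passed to any standard distance-based tree method, such as neighbor joining.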

  18. Y-chromosome Short Tandem Repeat Intermediate Variant Alleles DYS392.2, DYS449.2, and DYS385.2 Delineate New Phylogenetic Substructure in Human Y-chromosome Haplogroup Tree

    PubMed Central

    Myres, Natalie M.; Ritchie, Kathleen H.; Lin, Alice A.; Hughes, Robert H.; Woodward, Scott R.; Underhill, Peter A.

    2009-01-01

    Aim: To determine the human Y-chromosome haplogroup backgrounds of intermediate-sized variant alleles displayed by short tandem repeat (STR) loci DYS392, DYS449, and DYS385, and to evaluate the potential of each intermediate variant to elucidate new phylogenetic substructure within the human Y-chromosome haplogroup tree. Methods: Molecular characterization of lineages was achieved using a combination of Y-chromosome haplogroup defining binary polymorphisms and up to 37 short tandem repeat loci. DNA sequencing and median-joining network analyses were used to evaluate Y-chromosome lineages displaying intermediate variant alleles. Results: We show that DYS392.2 occurs on a single haplogroup background, specifically I1*-M253, and likely represents a new phylogenetic subdivision in this European haplogroup. Intermediate variants DYS449.2 and DYS385.2 both occur on multiple haplogroup backgrounds, and when evaluated within specific haplogroup contexts, delineate new phylogenetic substructure, with DYS449.2 being informative within haplogroup A-P97 and DYS385.2 in haplogroups D-M145, E1b1a-M2, and R1b*-M343. Sequence analysis of variant alleles observed within the various haplogroup backgrounds showed that the nature of the intermediate variant differed, confirming the mutations arose independently. Conclusions: Y-chromosome short tandem repeat intermediate variant alleles, while relatively rare, typically occur on multiple haplogroup backgrounds. This distribution indicates that such mutations arise at a rate generally intermediate to those of binary markers and Y-STR loci. As a result, intermediate-sized Y-STR variants can reveal phylogenetic substructure within the Y-chromosome phylogeny not currently detected by either binary or Y-STR markers alone, but only when such variants are evaluated within a haplogroup context. PMID:19480020

  19. Bayesian modelling of compositional heterogeneity in molecular phylogenetics.

    PubMed

    Heaps, Sarah E; Nye, Tom M W; Boys, Richard J; Williams, Tom A; Embley, T Martin

    2014-10-01

    In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree is associated with its own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.

  20. Treeness triangles: visualizing the loss of phylogenetic signal.

    PubMed

    White, W T; Hills, S F; Gaddam, R; Holland, B R; Penny, David

    2007-09-01

    It is well known that molecular data "saturates" with increasing sequence divergence (thereby losing phylogenetic information) and that in addition the accumulation of misleading information due to chance similarities or to systematic bias may accompany saturation as well. Exploratory data analysis methods that can quantify the extent of signal loss or convergence for a given data set are scarce. Such methods are needed because genomics delivers very long sequence alignments spanning substantial phylogenetic depth, where site saturation may be compounded by systematic biases or other alternative signals. Here we introduce the Treeness Triangle (TT) graph, in which signals detectable by Hadamard (spectral) analysis are summed into 3 categories--those supporting 1) external and 2) internal branches in the optimal tree, in addition to 3) the residuals (potential internal branches not present in the optimal tree). These 3 values are plotted in a standard ternary coordinate system. The approach is illustrated with simulated and real data sets, the latter from complete chloroplast genomes, where potential problems of paralogy or lateral gene acquisition can be excluded. The TT uncovers the divergence-dependent loss of phylogenetic signal as subsets of chloroplast genomes are investigated that span increasingly deeper evolutionary timescales. The rate of signal loss (or signal retention) varies with the gene and/or the method of analysis.

  1. Phylogenetic analysis of uroporphyrinogen III synthase (UROS) gene.

    PubMed

    Shaik, Abjal Pasha; Alsaeed, Abbas H; Sultana, Asma

    2012-01-01

    The uroporphyrinogen III synthase (UROS) enzyme (also known as hydroxymethylbilane hydrolyase) catalyzes the cyclization of hydroxymethylbilane to uroporphyrinogen III during heme biosynthesis. A deficiency of this enzyme is associated with the very rare Gunther's disease or congenital erythropoietic porphyria, an autosomal recessive inborn error of metabolism. The current study investigated the possible role of UROS (Homo sapiens [EC: 4.2.1.75; 265 aa; 1371 bp mRNA; Entrez Pubmed ref NP_000366.1, NM_000375.2]) in evolution by studying the phylogenetic relationship and divergence of this gene using computational methods. The UROS protein sequences from various taxa were retrieved from the GenBank database and compared using Clustal-W (multiple sequence alignment) with default settings, and a first-pass phylogenetic tree was built using the neighbor-joining method as in DELTA BLAST version 2.2.27+. A total of 163 BLAST hits were found for the uroporphyrinogen III synthase query sequence, and these hits showed a putative conserved domain, the HemD superfamily (as of 14 November 2012). We then narrowed down the search by manually deleting proteins that were not UROS sequences and sequences belonging to phyla other than Chordata. A repeat phylogenetic analysis of 39 taxa was performed using PhyML and TreeDyn software to confirm that UROS is a highly conserved protein with approximately 85% conserved sequences in almost all chordate taxa, emphasizing its importance in heme synthesis.

  2. Sequence and phylogenetic analysis of M-class genome segments of novel duck reovirus NP03

    PubMed Central

    Wang, Shao; Chen, Shilong; Cheng, Xiaoxia; Chen, Shaoying; Lin, FengQiang; Jiang, Bing; Zhu, Xiaoli; Li, Zhaolong; Wang, Jinxiang

    2015-01-01

    We report the sequence and phylogenetic analysis of the entire M1, M2, and M3 genome segments of the novel duck reovirus (NDRV) NP03. Alignment between the newly determined nucleotide sequences as well as their deduced amino acid sequences and the published sequences of avian reovirus (ARV) was carried out with DNASTAR software. Sequence comparison showed that the M2 gene had the most variability among the M-class genes of DRV. Phylogenetic analysis of the M-class genes of ARV strains revealed different lineages and clusters within DRVs. The 5 NDRV strains used in this study fall into a well-supported lineage that includes chicken ARV strains, whereas Muscovy DRV (MDRV) strains are separate from NDRV strains and form a distinct genetic lineage in the M2 gene tree. However, the MDRV and NDRV strains are closely related and located in a common lineage in the M1 and M3 gene trees, respectively. PMID:25852231

  3. Phylogenetic relationships among Agamid lizards of the Laudakia caucasia species group: testing hypotheses of biogeographic fragmentation and an area cladogram for the Iranian Plateau.

    PubMed

    Macey, J R; Schulte, J A; Ananjeva, N B; Larson, A; Rastegar-Pouyani, N; Shammakov, S M; Papenfuss, T J

    1998-08-01

    Phylogenetic relationships within the Laudakia caucasia species group on the Iranian Plateau were investigated using 1708 aligned bases of mitochondrial DNA sequence from the genes encoding ND1 (subunit one of NADH dehydrogenase), tRNAGln, tRNAIle, tRNAMet, ND2, tRNATrp, tRNAAla, tRNAAsn, tRNACys, tRNATyr, and COI (subunit I of cytochrome c oxidase). The aligned sequences contain 207 phylogenetically informative characters. Three hypotheses for historical fragmentation of Laudakia populations on the Iranian Plateau were tested. In two hypotheses, fragmentation of populations is suggested to have proceeded along continuous mountain belts that surround the Iranian Plateau. In another hypothesis, fragmentation is suggested to have resulted from a north-south split caused by uplifting of the Zagros Mountains in the late Miocene or early Pliocene [5-10 MYBP (million years before present)]. The shortest tree supports the latter hypothesis, and statistical tests reject the other two hypotheses. The phylogenetic tree is exceptional in that every branch is well supported. Geologic history provides dates for most branches of the tree. A plot of DNA substitutions against dates from geologic history refines the date for the north-south split across the Iranian Plateau to 9 MYBP (late Miocene). The rate of evolution for this segment of mtDNA is 0.65% (0.61-0.70%) change per lineage per million years. A hypothesis of area relationships for the biota of the Iranian Plateau is generated from the phylogenetic tree.
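
    To make the per-lineage rate concrete, the sketch below converts it into an expected pairwise divergence between two lineages. It assumes strictly linear accumulation of change with no saturation correction, which is a simplification for illustration and not the authors' calibration procedure; the function names are hypothetical.

```python
def pairwise_divergence_percent(rate_pct_per_lineage_per_my, split_age_my):
    """Expected pairwise divergence (%) under a linear clock.

    Both lineages accumulate change after the split, hence the factor 2.
    """
    return 2.0 * rate_pct_per_lineage_per_my * split_age_my

def split_age_my(pairwise_divergence_pct, rate_pct_per_lineage_per_my):
    """Inverse relation: split age (million years) from pairwise divergence (%)."""
    return pairwise_divergence_pct / (2.0 * rate_pct_per_lineage_per_my)

# Values quoted in the abstract: 0.65% per lineage per million years, split at 9 MYBP.
expected = pairwise_divergence_percent(0.65, 9.0)
```

    With the quoted rate and date, two lineages that split 9 million years ago would differ by roughly 11.7% under this linear assumption.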

  4. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    SciTech Connect

    Pollard, Daniel A.; Moses, Alan M.; Iyer, Venky N.; Eisen, Michael B.

    2006-08-14

    Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and

  5. Molecular identification and phylogenetic study of Demodex caprae.

    PubMed

    Zhao, Ya-E; Cheng, Juan; Hu, Li; Ma, Jun-Xian

    2014-10-01

    The DNA barcode has been widely used in species identification and phylogenetic analysis since 2003, but there have been no reports in Demodex. In this study, to obtain an appropriate DNA barcode for Demodex, molecular identification of Demodex caprae based on mitochondrial cox1 was conducted. First, individual adults and eggs of D. caprae were obtained for genomic DNA (gDNA) extraction; second, a mitochondrial cox1 fragment was amplified, cloned, and sequenced; third, cox1 fragments of D. caprae were aligned with those of other Demodex retrieved from GenBank; finally, the intra- and inter-specific divergences were computed and the phylogenetic trees were reconstructed to analyze phylogenetic relationships in Demodex. Results obtained from seven 429-bp fragments of D. caprae showed that sequence identities were above 99.1% among three adults and four eggs. The intraspecific divergences in D. caprae, Demodex folliculorum, Demodex brevis, and Demodex canis were 0.0-0.9, 0.5-0.9, 0.0-0.2, and 0.0-0.5%, respectively, while the interspecific divergences between D. caprae and D. folliculorum, D. canis, and D. brevis were 20.3-20.9, 21.8-23.0, and 25.0-25.3%, respectively. The interspecific divergences were 10 times higher than intraspecific ones, indicating a considerable barcoding gap. Furthermore, the phylogenetic trees showed that the four Demodex species gathered separately, representing independent species; and Demodex folliculorum gathered with canine Demodex, D. caprae, and D. brevis in sequence. In conclusion, the selected 429-bp mitochondrial cox1 gene is an appropriate DNA barcode for molecular classification, identification, and phylogenetic analysis of Demodex. D. caprae is an independent species and D. folliculorum is closer to D. canis than to D. caprae or D. brevis.
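
    The barcoding-gap comparison above can be sketched as follows. The example assumes uncorrected p-distances on pre-aligned, equal-length sequences; the abstract does not say which distance model was used, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def barcoding_gap(seqs_by_species):
    """Return (max intraspecific, min interspecific) p-distance."""
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    # An empty list means no comparison of that kind was possible.
    return (max(intra) if intra else None, min(inter) if inter else None)
```

    A barcoding gap exists when the smallest interspecific distance clearly exceeds the largest intraspecific one.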

  6. The phylogenetic likelihood library.

    PubMed

    Flouri, T; Izquierdo-Carrasco, F; Darriba, D; Aberer, A J; Nguyen, L-T; Minh, B Q; Von Haeseler, A; Stamatakis, A

    2015-03-01

    We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and a thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. By example of two likelihood-based phylogenetic codes we show that the PLL improves the sequential performance of current software by a factor of 2-10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL).

  7. arb_tree_32

    SciTech Connect

    Bavykin, Sergey; Alferov, Oleg

    2006-08-01

    The purpose of this program is to generate probes specific for the group of sequences that belong to a given phylogenetic node. For each node of the input tree, the program selects probes that are positive for all sequences that belong to this node and negative for all sequences that do not. The program uses a condensed tree for probe representation to save computer memory. As a result of the calculation, the program prints probe lists for each node of the tree. Input file formats: FASTA for the sequence database and ARB tree for the phylogenetic organization of nodes. Output file format: text file.
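
    The selection rule described above (positive for every sequence in a node, negative for every sequence outside it) can be illustrated with exact-match k-mers. This is a simplified sketch under that assumption; the record does not describe the program's matching criteria or its condensed-tree representation, and the names below are hypothetical.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def node_specific_probes(sequences, node_members, k):
    """k-mers present in every member sequence and absent from all others.

    sequences: dict mapping sequence name -> sequence string
    node_members: set of names that belong to the node
    """
    inside = [kmers(s, k) for name, s in sequences.items() if name in node_members]
    outside = [kmers(s, k) for name, s in sequences.items() if name not in node_members]
    if not inside:
        return set()
    shared = set.intersection(*inside)   # positive for all members
    excluded = set().union(*outside)     # anything seen outside the node
    return shared - excluded
```

    Running this once per node of a tree yields one candidate probe list per node, mirroring the per-node output described in the record.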

  8. SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

    PubMed Central

    2013-01-01

    Background Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. Results In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. Conclusion We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets. PMID:24229408
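
    The distance-matrix step mentioned above can be sketched in a few lines. The example computes pairwise patristic (path-length) distances from a rooted tree stored as a child-to-parent map; this representation and the function names are assumptions for illustration and do not reflect the SUNPLIN C++ implementation. The random expansion step is not shown.

```python
def root_path(tip, parent):
    """Map `tip` and each of its ancestors to the path length separating it from `tip`.

    parent: dict mapping node -> (parent_node, branch_length); the root has no entry.
    """
    path, node, dist = {}, tip, 0.0
    while True:
        path[node] = dist
        if node not in parent:  # reached the root
            return path
        node, length = parent[node]
        dist += length

def patristic(a, b, parent):
    """Path length between two nodes; the minimum over shared ancestors is at the MRCA."""
    pa, pb = root_path(a, parent), root_path(b, parent)
    return min(pa[n] + pb[n] for n in pa if n in pb)

def distance_matrix(tips, parent):
    """Symmetric matrix of patristic distances, rows and columns in `tips` order."""
    return [[patristic(a, b, parent) for b in tips] for a in tips]
```

    Repeating this calculation over many randomly expanded trees would give one matrix per simulated tree, which is the kind of input the uncertainty analysis described above relies on.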

  9. Point estimates in phylogenetic reconstructions

    PubMed Central

    Benner, Philipp; Bačák, Miroslav; Bourguignon, Pierre-Yves

    2014-01-01

    Motivation: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absence of a sound concept of variance. Yielding satisfactory results with sufficiently concentrated posterior distributions, such methods fall short of providing a faithful summary of posterior distributions if the data do not offer compelling evidence for a single topology. Results: Building upon previous work of Billera et al., summary statistics such as sample mean, median and variance are defined as the Fréchet mean, geometric median and variance, respectively. Their computation is enabled by recently published works, and embeds an algorithm for computing shortest paths in the space of trees. Studying the phylogeny of a set of plants, where several tree topologies occur in the posterior sample, the posterior mean balances correctly the contributions from the different topologies, where a consensus tree would be biased. Comparisons of the posterior mean, median and consensus trees with the ground truth using simulated data also reveal the benefits of a sound averaging method when reconstructing phylogenetic trees. Availability and implementation: We provide two independent implementations of the algorithm for computing Fréchet means, geometric medians and variances in the space of phylogenetic trees. TFBayes: https://github.com/pbenner/tfbayes, TrAP: https://github.com/bacak/TrAP. Contact: philipp.benner@mis.mpg.de PMID:25161244

  10. Phylogenics & Tree-Thinking

    ERIC Educational Resources Information Center

    Baum, David A.; Offner, Susan

    2008-01-01

    Phylogenetic trees, which are depictions of the inferred evolutionary relationships among a set of species, now permeate almost all branches of biology and are appearing in increasing numbers in biology textbooks. While few state standards explicitly require knowledge of phylogenetics, most require some knowledge of evolutionary biology, and many…

  11. pHMM-tree: phylogeny of profile hidden Markov models.

    PubMed

    Huo, Luyang; Zhang, Han; Huo, Xueting; Yang, Yasong; Li, Xueqiong; Yin, Yanbin

    2017-01-05

    Protein families are often represented by profile hidden Markov models (pHMMs). Homology between two distant protein families can be determined by comparing the pHMMs. Here we explored the idea of building a phylogeny of protein families using the distance matrix of their pHMMs. We developed a new software and web server (pHMM-tree) to allow four major types of inputs: (i) multiple pHMM files, (ii) multiple aligned protein sequence files, (iii) mixture of pHMM and aligned sequence files and (iv) unaligned protein sequences in a single file. The output will be a pHMM phylogeny of different protein families delineating their relationships. We have applied pHMM-tree to build phylogenies for CAZyme (carbohydrate active enzyme) classes and Pfam clans, which attested its usefulness in the phylogenetic representation of the evolutionary relationship among distant protein families.

  12. Graphics processing unit-based alignment of protein interaction networks.

    PubMed

    Xie, Jiang; Zhou, Zhonghua; Ma, Jin; Xiang, Chaojuan; Nie, Qing; Zhang, Wu

    2015-08-01

    Network alignment is an important bridge to understanding human protein-protein interactions (PPIs) and functions through model organisms. However, the underlying subgraph isomorphism problem complicates and increases the time required to align protein interaction networks (PINs). Parallel computing technology is an effective solution to the challenge of aligning large-scale networks via sequential computing. In this study, the typical Hungarian-Greedy Algorithm (HGA) is used as an example for PIN alignment. The authors propose a HGA with 2-nearest neighbours (HGA-2N) and implement its graphics processing unit (GPU) acceleration. Numerical experiments demonstrate that HGA-2N can find alignments that are close to those found by HGA while dramatically reducing computing time. The GPU implementation of HGA-2N optimises the parallel pattern, computing mode and storage mode and it improves the computing time ratio between the CPU and GPU compared with HGA when large-scale networks are considered. By using HGA-2N in GPUs, conserved PPIs can be observed, and potential PPIs can be predicted. Among the predictions based on 25 common Gene Ontology terms, 42.8% can be found in the Human Protein Reference Database. Furthermore, a new method of reconstructing phylogenetic trees is introduced, which shows the same relationships among five herpes viruses that are obtained using other methods.

  13. Building phylogenomic tree with N-gram contrast value vector

    NASA Astrophysics Data System (ADS)

    Kheng, Goh Yong; Weng, Lim Foo; Ling, Leo Yean

    2013-09-01

    Traditional phylogenetic trees are built from alignments of fragments of annotated sequences from the genomes of interest. In this paper, we present an alternative algorithm for building phylogenetic trees based on the characteristic "signature" word usage in genomes, analogous to natural language processing. Here, we apply statistical N-gram analysis to whole genome sequences of several organisms. We calculated the occurrences of different N-grams and the contrast and departure values, which are the deviations of N-gram occurrences from their expectations, for every chromosome of 28 organisms. It could be shown that a few particular genome N-grams are found in abundance in one organism but occur very rarely in other organisms, thereby serving as genome signatures. We then consolidate the signature information for each organism into a feature vector that consists of the averages of the contrast and departure values for 2-grams, 3-grams, 4-grams and 5-grams. From the feature vectors, we build the phylogenetic tree using correlations as the similarity measure, and we could reproduce the taxonomy tree of the 28 organisms.
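
    A minimal sketch of the alignment-free idea above: count N-grams in each genome, turn the counts into a fixed-length profile, and compare profiles by correlation. The abstract does not define its contrast and departure values, so this example substitutes plain relative frequencies; that substitution, the DNA alphabet, and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import product

def ngram_profile(seq, n):
    """Relative frequency of every possible DNA n-gram, in a fixed lexicographic order."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than n")
    return [counts["".join(p)] / total for p in product("ACGT", repeat=n)]

def pearson(u, v):
    """Pearson correlation between two equal-length profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (su * sv)
```

    A matrix of `1 - pearson(...)` values between organisms could then be passed to any distance-based tree-building method.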

  14. Insights into the phylogenetic positions of photosynthetic bacteria obtained from 5S rRNA and 16S rRNA sequence data

    NASA Technical Reports Server (NTRS)

    Fox, G. E.

    1985-01-01

    Comparisons of complete 16S ribosomal ribonucleic acid (rRNA) sequences established that the secondary structure of these molecules is highly conserved. Earlier work with 5S rRNA secondary structure revealed that when structural conservation exists the alignment of sequences is straightforward. The constancy of structure implies minimal functional change. Under these conditions a uniform evolutionary rate can be expected so that conditions are favorable for phylogenetic tree construction.

  15. Diversity Measures in Environmental Sequences Are Highly Dependent on Alignment Quality—Data from ITS and New LSU Primers Targeting Basidiomycetes

    PubMed Central

    Fischer, Christiane; Daniel, Rolf; Wubet, Tesfaye

    2012-01-01

    The ribosomal DNA comprised of the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers give complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to align stretches. The LSU primers also yielded greater numbers of singletons but also had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments. PMID:22363808
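
    One common way to quantify alignment heterogeneity with Shannon entropy is to score each alignment column separately. The sketch below does that, ignoring gap characters; it is a generic illustration and not the specific parameters introduced by the authors, which the abstract does not define.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, gaps excluded."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def alignment_entropy(alignment):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]
```

    Fully conserved columns score 0 bits, and columns with many competing residues score higher, so long runs of high-entropy columns flag regions of doubtful positional homology.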

  16. Diversity measures in environmental sequences are highly dependent on alignment quality--data from ITS and new LSU primers targeting basidiomycetes.

    PubMed

    Krüger, Dirk; Kapturska, Danuta; Fischer, Christiane; Daniel, Rolf; Wubet, Tesfaye

    2012-01-01

    The ribosomal DNA comprised of the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers give complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to align stretches. The LSU primers also yielded greater numbers of singletons but also had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments.

  17. Phylogenetic relationships among Phytophthora species inferred from sequence analysis of mitochondrially encoded cytochrome oxidase I and II genes.

    PubMed

    Martin, Frank N; Tooley, Paul W

    2003-01-01

    The phylogenetic relationships of 51 isolates representing 27 species of Phytophthora were assessed by sequence alignment of 568 bp of the mitochondrially encoded cytochrome oxidase II gene. A total of 1299 bp of the cytochrome oxidase I gene also were examined for a subset of 13 species. The cox II gene trees constructed by a heuristic search, based on maximum parsimony for a bootstrap 50% majority-rule consensus tree, revealed 18 species grouping into seven clades and nine species unaffiliated with a specific clade. The phylogenetic relationships among species observed on cox II gene trees did not exhibit consistent similarities in groupings for morphology, pathogenicity, host range or temperature optima. The topology of cox I gene trees, constructed by a heuristic search based on maximum parsimony for a bootstrap 50% majority-rule consensus tree for 13 species of Phytophthora, revealed 10 species grouping into three clades and three species unaffiliated with a specific clade. The groupings in general agreed with what was observed in the cox II tree. Species relationships observed for the cox II gene tree were in agreement with those based on ITS regions, with several notable exceptions. Some of these differences were noted in species in which the same isolates were used for both ITS and cox II analysis, suggesting either a differential rate of evolutionary divergence for these two regions or incorrect assumptions about alignment of ITS sequences. Analysis of combined data sets of ITS and cox II sequences generated a tree that did not differ substantially from analysis of ITS data alone; however, the results of a partition homogeneity test suggest that combining data sets may not be valid.
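
    The 50% majority-rule consensus mentioned above keeps only those clades that appear in more than half of the bootstrap trees. The sketch below shows that counting step, assuming each tree has already been reduced to a set of clades (each clade a frozenset of taxon names); parsing trees and assembling the consensus topology are omitted, and the function name is hypothetical.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: support} for clades found in more than `threshold` of the trees.

    trees: list of collections of frozensets, one collection per bootstrap tree.
    """
    if not trees:
        return {}
    # set(tree) guards against counting a clade twice within one tree.
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

    Taxa that never fall into a clade passing the threshold are the ones reported as unaffiliated with a specific clade.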

  18. Phylogenetic analysis of ALAD and MGP genes related to lead toxicity.

    PubMed

    Shaik, A P; Khan, M; Jamil, K

    2009-07-01

    Experimental studies in our laboratory have established the role of delta-aminolevulinic acid dehydratase (ALAD) and matrix gamma-carboxyglutamic acid (MGP) gene polymorphisms in the etiology of lead toxicity. Polymorphisms in these genes influenced the levels of lead in subjects exposed to this metal. In extension to our studies, we aimed to investigate the possible role of these proteins in evolution by studying the phylogenetic relationship and divergence of ALAD and MGP genes using computational phylogenetic methods. The human ALAD and MGP protein sequences from various species were retrieved from Swiss-Prot database and were compared using Basic Local Alignment Search Tool. Multiple sequence alignment was carried out using ClustalW with defaults, and phylogenetic trees for both the genes were built using neighbor-joining method as in Mega software. Our study indicated that ALAD is a highly conserved protein with the same metal binding site distributed in all the phyla (from archaea to chordates). Phylogenetic analysis of MGP gene revealed that it had an important role in the evolution of endogenous skeleton in contrast to exoskeleton of insects. Occurrence of these genes in evolution with conserved metal binding sites strengthens the role of ALAD and MGP genes in regulating heme biosynthesis and mineralization, respectively, in evolution and helps in better understanding of lead poisoning.

  19. Phylogenetic analyses of phylum Actinobacteria based on whole genome sequences.

    PubMed

    Verma, Mansi; Lal, Devi; Kaur, Jaspreet; Saxena, Anjali; Kaur, Jasvinder; Anand, Shailly; Lal, Rup

    2013-09-01

    Actinobacteria constitute one of the largest and most ancient taxonomic phyla within the domain Bacteria and are well known for their secondary metabolites. Considerable variation in the metabolic properties, genome size and GC content of the members of this phylum has been observed. Therefore, the placement of new or existing species based on the 16S rRNA gene sometimes becomes problematic due to the low congruence level. In the present study, the phylogeny of ninety actinobacterial genomes was reconstructed using single gene and whole genome based data. Whereas the alignment-free phylogenetic method was found to be more robust, the concatenation of 94 proteins improved the resolution that all single gene based phylogenies failed to achieve. The comprehensive analysis of 94 conserved proteins resulted in a total of 42,447 informative sites, which is so far the largest meta-alignment obtained for this phylum. But the ultimate resolved phylogeny was obtained by generating a consensus tree by combining the information from single gene and genome based phylogenies. The present investigation clearly revealed that the consensus approach is a useful tool for phylogenetic inference and that taxonomic affiliations must be based on this approach. The consensus approach suggested that there is a need for taxonomic amendments of the orders Frankiales and Micrococcales.

  20. Update of phylogenetic and genetic diversity of Sporothrix schenckii sensu lato.

    PubMed

    Rangel-Gamboa, Lucía; Martínez-Hernandez, Fernando; Maravilla, Pablo; Arenas-Guzmán, Roberto; Flisser, Ana

    2016-03-01

    Sporothrix schenckii sensu lato causes subcutaneous mycosis. In this article we analysed its phylogeny and genetic diversity using calmodulin DNA sequences deposited in GenBank database. Population genetics indices were calculated, plus phylogenetic and haplotype network trees were built. Five clades with high values of posterior probability, 47 haplotypes and high diversity in the complex were found. Analysis of partial calmodulin sequences alignment revealed conserved and polymorphic regions that could be used as reference for taxonomic identification. The use of population genetics analysis allowed understanding the phylogenetic proximity of S. schenckii s. str. and S. brasiliensis; scarce genetic flow among them with low migration index and high ancestry coefficient was found. Similarly, S. globosa, S. mexicana and S. pallida sequences showed highly differentiated species with no genetic exchange. The phylogenetic tree suggests that S. mexicana shared a common ancestor with S. pallida; while S. globosa and S. brasiliensis are more related to S. schenckii s. str. and showed less haplotype diversity and restrictions in geographic distribution. In the haplotype network tree S. schenckii s. str. species displayed worldwide distribution without dispersion centres; while S. brasiliensis and S. globosa, exhibited Brazil and Euro-Asia as dispersion centres, respectively. Our data suggest that S. schenckii complex has been submitted to a divergent evolution process, probably due to the pressure of the environment and of the host. In contrast, S. brasiliensis could have been submitted to purifying selection or expansion process.

  1. The dawn of open access to phylogenetic data.

    PubMed

    Magee, Andrew F; May, Michael R; Moore, Brian R

    2014-01-01

    The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation--extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for [Formula: see text] of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor, and; (3) the data are requested from faculty rather than students. Importantly, our survey spans recent policy initiatives and infrastructural changes; our analyses indicate that the positive impact of these community initiatives has been both dramatic and immediate. Although the results of our study indicate that the situation is dire, our findings also reveal tremendous recent progress in the sharing and preservation of phylogenetic data.

  2. One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees.

    PubMed

    Alderson, Rosanna G; Barker, Daniel; Mitchell, John B O

    2014-10-01

    Bacteria use metallo-β-lactamase enzymes to hydrolyse lactam rings found in many antibiotics, rendering them ineffective. Metallo-β-lactamase activity is thought to be polyphyletic, having arisen on more than one occasion within a single functionally diverse homologous superfamily. Since discovery of multiple origins of enzymatic activity conferring antibiotic resistance has broad implications for the continued clinical use of antibiotics, we test the hypothesis of polyphyly further; if lactamase function has arisen twice independently, the most recent common ancestor (MRCA) is not expected to possess lactam-hydrolysing activity. Two major problems present themselves. Firstly, even with a perfectly known phylogeny, ancestral sequence reconstruction is error prone. Secondly, the phylogeny is not known, and in fact reconstructing a single, unambiguous phylogeny for the superfamily has proven impossible. To obtain a more statistical view of the strength of evidence for or against MRCA lactamase function, we reconstructed a sample of 98 MRCAs of the metallo-β-lactamases, each based on a different tree in a bootstrap sample of reconstructed phylogenies. InterPro sequence signatures and homology modelling were then used to assess our sample of MRCAs for lactamase functionality. Only 5 % of these models conform to our criteria for metallo-β-lactamase functionality, suggesting that the ancestor was unlikely to have been a metallo-β-lactamase. On the other hand, given that ancestral proteins may have had metallo-β-lactamase functionality with variation in sequence and structural properties compared with extant enzymes, our criteria are conservative, estimating a lower bound of evidence for metallo-β-lactamase functionality but not an upper bound.

  3. Dual phylogenetic origins of Nigerian lions (Panthera leo)

    PubMed Central

    Tende, Talatu; Bensch, Staffan; Ottosson, Ulf; Hansson, Bengt

    2014-01-01

    Lion fecal DNA extracts from four individuals each from Yankari Game Reserve and Kainji-Lake National Park (central northeast and west Nigeria, respectively) were Sanger-sequenced for the mitochondrial cytochrome b gene. The sequences were aligned against 61 lion reference sequences from other parts of Africa and India. The sequence data were analyzed further for the construction of phylogenetic trees using the maximum-likelihood approach to depict phylogenetic patterns of distribution among sequences. Our results show that Nigerian lions grouped together with lions from West and Central Africa. At the smaller geographical scale, lions from Kainji-Lake National Park in western Nigeria grouped with lions from Benin (located west of Nigeria), whereas lions from Yankari Game Reserve in central northeastern Nigeria grouped with the lion populations in Cameroon (located east of Nigeria). The finding that the two remaining lion populations in Nigeria have different phylogenetic origins is an important aspect to consider in future decisions regarding management and conservation of rapidly shrinking lion populations in West Africa. PMID:25077018

  4. Dual phylogenetic origins of Nigerian lions (Panthera leo).

    PubMed

    Tende, Talatu; Bensch, Staffan; Ottosson, Ulf; Hansson, Bengt

    2014-07-01

    Lion fecal DNA extracts from four individuals each from Yankari Game Reserve and Kainji-Lake National Park (central northeast and west Nigeria, respectively) were Sanger-sequenced for the mitochondrial cytochrome b gene. The sequences were aligned against 61 lion reference sequences from other parts of Africa and India. The sequence data were analyzed further for the construction of phylogenetic trees using the maximum-likelihood approach to depict phylogenetic patterns of distribution among sequences. Our results show that Nigerian lions grouped together with lions from West and Central Africa. At the smaller geographical scale, lions from Kainji-Lake National Park in western Nigeria grouped with lions from Benin (located west of Nigeria), whereas lions from Yankari Game Reserve in central northeastern Nigeria grouped with the lion populations in Cameroon (located east of Nigeria). The finding that the two remaining lion populations in Nigeria have different phylogenetic origins is an important aspect to consider in future decisions regarding management and conservation of rapidly shrinking lion populations in West Africa.

  5. Phylogenetic Analyses of Meloidogyne Small Subunit rDNA

    PubMed Central

    De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques

    2002-01-01

    Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus aberrans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species. PMID:19265950

  6. Entanglement, Invariants, and Phylogenetics

    NASA Astrophysics Data System (ADS)

    Sumner, J. G.

    2007-10-01

    This thesis develops and expands upon known techniques of mathematical physics relevant to the analysis of the popular Markov model of phylogenetic trees required in biology to reconstruct the evolutionary relationships of taxonomic units from biomolecular sequence data. The techniques of mathematical physics are plentiful and have been developed for some time. The Markov model of phylogenetics and its analysis is a relatively new technique where most progress to date has been achieved by using discrete mathematics. This thesis takes a group theoretical approach to the problem by beginning with a remarkable mathematical parallel to the process of scattering in particle physics. This is shown to equate to branching events in the evolutionary history of molecular units. The major technical result of this thesis is the derivation of existence proofs and computational techniques for calculating polynomial group invariant functions on a multi-linear space where the group action is that relevant to a Markovian time evolution. The practical results of this thesis are an extended analysis of the use of invariant functions in distance based methods and the presentation of a new reconstruction technique for quartet trees which is consistent with the most general Markov model of sequence evolution.

  7. [Analysis of the phylogenetic relationships of Gynostemma (Cucurbitaceae)].

    PubMed

    Qin, Shuang-shuang; Li, Hai-tao; Wang, Zhou-yong; Cui, Zhan-hu; Yu, Li-ying

    2015-05-01

    The ITS, matK, rbcL and psbA-trnH sequences of 38 samples representing 9 Gynostemma species or varieties were compared and analyzed using molecular phylogenetic methods. Hemsleya macrosperma was designated as the outgroup. MP and NJ phylogenetic trees of Gynostemma were built from the ITS sequences, and the PAUP phylogenetic analysis showed the following: (1) The eight individuals of G. pentaphyllum var. pentaphyllum were not supported as monophyletic in the strict consensus trees and NJ trees. (2) It is doubtful whether G. longipes and G. laxum should be classified as independent species. (3) The classification of the subgeneric units of Gynostemma is supported.

  8. Refuting phylogenetic relationships

    PubMed Central

    Bucknam, James; Boucher, Yan; Bapteste, Eric

    2006-01-01

    Background Phylogenetic methods are philosophically grounded, and so can be philosophically biased in ways that limit explanatory power. This constitutes an important methodologic dimension not often taken into account. Here we address this dimension in the context of concatenation approaches to phylogeny. Results We discuss some of the limits of a methodology restricted to verificationism, the philosophy on which gene concatenation practices generally rely. As an alternative, we describe software that identifies and focuses on impossible or refuted relationships, through a simple analysis of bootstrap bipartitions, followed by multivariate statistical analyses. We show how refuting phylogenetic relationships could in principle facilitate systematics. We also apply our method to the study of two complex phylogenies: the phylogeny of the archaea and the phylogeny of the core of genes shared by all life forms. While many groups are rejected, our results left open a possible proximity of N. equitans and the Methanopyrales, and of the Archaea and the Cyanobacteria, as well as the possible grouping of the Methanobacteriales/Methanococcales and Thermoplasmatales, of the Spirochaetes and the Actinobacteria, and of the Proteobacteria and Firmicutes. Conclusion It is sometimes easier (and preferable) to decide which species do not group together than which ones do. When possible topologies are limited, identifying local relationships that are rejected may be a useful alternative to classical concatenation approaches aiming to find a globally resolved tree on the basis of weak phylogenetic markers. Reviewers This article was reviewed by Mark Ragan, Eugene V Koonin and J Peter Gogarten. PMID:16956399

  9. Efficient Exploration of the Space of Reconciled Gene Trees

    PubMed Central

    Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-01-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family, respectively. The open source

  10. The Complete Mitochondrial Genome of Aix galericulata and Tadorna ferruginea: Bearings on Their Phylogenetic Position in the Anseriformes

    PubMed Central

    Liu, Gang; Zhou, Lizhi; Li, Bo; Zhang, Lili

    2014-01-01

    Aix galericulata and Tadorna ferruginea are two Anatidae species representing different taxonomic groups of Anseriformes. We used a PCR-based method to determine the complete mtDNAs of both species, and estimated phylogenetic trees based on the complete mtDNA alignment of these and 14 other Anseriform species, to clarify Anseriform phylogenetics. Phylogenetic trees were also estimated using a multiple sequence alignment of three mitochondrial genes (Cyt b, ND2, and COI) from 68 typical species in GenBank, to further clarify the phylogenetic relationships of several groups among the Anseriformes. The new mtDNAs are circular molecules, 16,651 bp (Aix galericulata) and 16,639 bp (Tadorna ferruginea) in length, containing the 37 typical genes, with the same gene order and arrangement as those of other Anseriformes. Comparing the protein-coding genes among the mtDNAs of 16 Anseriform species, ATG is generally the start codon, and TAA is the most frequent of the three commonly observed stop codons (TAA, TAG, and T-). All tRNAs could be folded into canonical cloverleaf secondary structures except for tRNASer (AGY) and tRNALeu (CUN), which are missing the "DHU" arm. Phylogenetic relationships demonstrate that Aix galericulata and Tadorna ferruginea are in the same group, the Tadorninae lineage, based on our analyses of complete mtDNAs and combined gene data. Molecular phylogenetic analysis suggests the 68 species of Anseriform birds be divided into three families: Anhimidae, Anatidae, and Anseranatidae. The results suggest Anatidae birds be divided into five subfamilies: Anatinae, Tadorninae, Anserinae, Oxyurinae, and Dendrocygninae. Oxyurinae and Dendrocygninae should not belong to Anserinae, but rather represent independent subfamilies. The Anatinae includes species from the tribes Mergini, Somaterini, Anatini, and Aythyini. The Anserinae includes species from the tribes Anserini and Cygnini. PMID:25375111

  11. Inference of phylogenetic distances from DNA-walk divergences

    NASA Astrophysics Data System (ADS)

    Licinio, P.; Caligiorne, R. B.

    2004-10-01

    A formalism for the analysis of DNA-sequences is presented. It develops the concept of a composition vector potential which incorporates intrinsic double-strand and four-base symmetries. The vector potential allows for straightforward coarse graining and graphical representation of whole genomes. Its projections are mapped onto DNA-walks. It is shown that distances due to mutation between sequences can be estimated from mean square differences between walks. A computer program for global alignment of sequences (DNAWD) is thus developed and applied to a set of toy sequences representing evolution under mutation. The distance matrix output of DNAWD is shown to provide a good estimate of the associated phylogenetic tree.
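
    The idea that mutational distances can be estimated from mean square differences between walks can be illustrated with a small sketch. It is not the DNAWD program: it assumes a simplified one-dimensional purine/pyrimidine walk in place of the composition vector potential described in the abstract, it only handles sequences of equal length, and the names dna_walk, walk_divergence and distance_matrix are hypothetical.

```python
from itertools import accumulate

# Assumed step rule (not taken from the abstract): purines +1, pyrimidines -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative sum of steps along the sequence."""
    return list(accumulate(STEP[base] for base in seq.upper()))


def walk_divergence(seq_a, seq_b):
    """Mean square difference between the walks of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    walk_a, walk_b = dna_walk(seq_a), dna_walk(seq_b)
    return sum((x - y) ** 2 for x, y in zip(walk_a, walk_b)) / len(walk_a)


def distance_matrix(seqs):
    """Pairwise walk divergences for a list of sequences, as a nested list."""
    return [[walk_divergence(a, b) for b in seqs] for a in seqs]


# Toy sequences invented for illustration only.
toy = ["AAAA", "ACAA", "ACCA"]
matrix = distance_matrix(toy)
```

    A matrix of this kind could then be passed to any distance-based tree-building method; the choice of method is outside the scope of this sketch.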

  12. Molecular systematics of terraranas (Anura: Brachycephaloidea) with an assessment of the effects of alignment and optimality criteria.

    PubMed

    Padial, José M; Grant, Taran; Frost, Darrel R

    2014-06-26

    Brachycephaloidea is a monophyletic group of frogs with more than 1000 species distributed throughout the New World tropics, subtropics, and Andean regions. Recently, the group has been the target of multiple molecular phylogenetic analyses, resulting in extensive changes in its taxonomy. Here, we test previous hypotheses of phylogenetic relationships for the group by combining available molecular evidence (sequences of 22 genes representing 431 ingroup and 25 outgroup terminals) and performing a tree-alignment analysis under the parsimony optimality criterion using the program POY. To elucidate the effects of alignment and optimality criterion on phylogenetic inferences, we also used the program MAFFT to obtain a similarity-alignment for analysis under both parsimony and maximum likelihood using the programs TNT and GARLI, respectively. Although all three analytical approaches agreed on numerous points, there was also extensive disagreement. Tree-alignment under parsimony supported the monophyly of the ingroup and the sister group relationship of the monophyletic marsupial frogs (Hemiphractidae), while maximum likelihood and parsimony analyses of the MAFFT similarity-alignment did not. All three methods differed with respect to the position of Ceuthomantis smaragdinus (Ceuthomantidae), with tree-alignment using parsimony recovering this species as the sister of Pristimantis + Yunganastes. All analyses rejected the monophyly of Strabomantidae and Strabomantinae as originally defined, and the tree-alignment analysis under parsimony further rejected the recently redefined Craugastoridae and Pristimantinae. Despite the greater emphasis in the systematics literature placed on the choice of optimality criterion for evaluating trees than on the choice of method for aligning DNA sequences, we found that the topological differences attributable to the alignment method were as great as those caused by the optimality criterion. Further, the optimal tree-alignment indicates

  13. Primate molecular phylogenetics in a genomic era.

    PubMed

    Ting, Nelson; Sterner, Kirstin N

    2013-02-01

    A primary objective of molecular phylogenetics is to use molecular data to elucidate the evolutionary history of living organisms. Dr. Morris Goodman founded the journal Molecular Phylogenetics and Evolution as a forum where scientists could further our knowledge about the tree of life, and he recognized that the inference of species trees is a first and fundamental step to addressing many important evolutionary questions. In particular, Dr. Goodman was interested in obtaining a complete picture of the primate species tree in order to provide an evolutionary context for the study of human adaptations. A number of recent studies use multi-locus datasets to infer well-resolved and well-supported primate phylogenetic trees using consensus approaches (e.g., supermatrices). It is therefore tempting to assume that we have a complete picture of the primate tree, especially above the species level. However, recent theoretical and empirical work in the field of molecular phylogenetics demonstrates that consensus methods might provide a false sense of support at certain nodes. In this brief review we discuss the current state of primate molecular phylogenetics and highlight the importance of exploring the use of coalescent-based analyses that have the potential to better utilize information contained in multi-locus data.

  14. Directional biases in phylogenetic structure quantification: a Mediterranean case study

    PubMed Central

    Molina-Venegas, Rafael; Roquet, Cristina

    2014-01-01

    Recent years have seen an increasing effort to incorporate phylogenetic hypotheses into the study of community assembly processes. The incorporation of such evolutionary information has been eased by the emergence of specialized software for the automatic estimation of partially resolved supertrees based on published phylogenies. Despite this growing interest in the use of phylogenies in ecological research, very few studies have attempted to quantify the potential biases related to the use of partially resolved phylogenies and to branch length accuracy, and no work has examined how tree shape may affect inference of community phylogenetic metrics. In this study, using a large plant community and elevational dataset, we tested the influence of phylogenetic resolution and branch length information on the quantification of phylogenetic structure; and also explored the impact of tree shape (stemminess) on the loss of accuracy in phylogenetic structure quantification due to phylogenetic resolution. For this purpose, we used 9 sets of phylogenetic hypotheses of varying resolution and branch lengths to calculate three indices of phylogenetic structure: the mean phylogenetic distance (NRI), the mean nearest taxon distance (NTI) and phylogenetic diversity (stdPD) metrics. The NRI metric was the least sensitive to phylogenetic resolution, stdPD showed an intermediate sensitivity, and NTI was the most sensitive one; NRI was also less sensitive to branch length accuracy than NTI and stdPD, the degree of sensitivity being strongly dependent on the dating method and the sample size. Directional biases were generally towards type II errors. Interestingly, we detected that tree shape influenced the accuracy loss derived from the lack of phylogenetic resolution, particularly for NRI and stdPD. We conclude that well-resolved molecular phylogenies with accurate branch length information are needed to identify the underlying phylogenetic structure of communities, and also that
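
    The two distance-based quantities named above can be sketched as follows. This is only an illustration of the raw mean pairwise distance and mean nearest taxon distance for one community; the standardized indices (NRI, NTI) additionally compare such values against a null model, which is not shown. The function names and the distance values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def mean_pairwise_distance(dist, community):
    """Average phylogenetic distance over all pairs of taxa in the community."""
    pairs = list(combinations(community, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mean_nearest_taxon_distance(dist, community):
    """Average, over taxa, of the distance to the closest other community member."""
    nearest = [min(dist[a][b] for b in community if b != a) for a in community]
    return sum(nearest) / len(nearest)


# Hypothetical symmetric distances between four taxa (illustration only).
dist = {
    "sp1": {"sp1": 0, "sp2": 2, "sp3": 6, "sp4": 6},
    "sp2": {"sp1": 2, "sp2": 0, "sp3": 6, "sp4": 6},
    "sp3": {"sp1": 6, "sp2": 6, "sp3": 0, "sp4": 4},
    "sp4": {"sp1": 6, "sp2": 6, "sp3": 4, "sp4": 0},
}

community = ["sp1", "sp2", "sp3"]
mpd = mean_pairwise_distance(dist, community)
mntd = mean_nearest_taxon_distance(dist, community)
```

    Because the nearest-taxon measure depends only on the shortest distances, it is plausible that it reacts more strongly to unresolved terminal relationships, which is in line with the sensitivity ranking reported in the abstract.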

  15. Complete genome sequences and phylogenetic analysis of two West Nile virus strains isolated from equines in Argentina in 2006 could indicate an early introduction of the virus in the Southern Cone.

    PubMed

    Fabbri, Cintia M; García, Jorge B; Morales, María Alejandra; Enría, Delia A; Levis, Silvana; Lanciotti, Robert S

    2014-11-01

    The complete nucleotide sequences of two West Nile virus (WNV) strains isolated in Argentina were determined. Phylogenetic trees were constructed from the aligned nucleic acid sequences of these two strains along with other previously published complete WNV genome sequences. Phylogenetic data showed that both strains belonged to clade 1a of lineage 1 and clustered in a subclade with American strains isolated during 1999-2002. These results suggest two independent routes of introduction of WNV in Argentina and that the virus could have been circulating in Argentina for some time before being isolated.

  16. dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees

    PubMed Central

    2016-01-01

    Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa

  17. Applications of next-generation sequencing to phylogeography and phylogenetics.

    PubMed

    McCormack, John E; Hird, Sarah M; Zellmer, Amanda J; Carstens, Bryan C; Brumfield, Robb T

    2013-02-01

    This is a time of unprecedented transition in DNA sequencing technologies. Next-generation sequencing (NGS) clearly holds promise for fast and cost-effective generation of multilocus sequence data for phylogeography and phylogenetics. However, the focus on non-model organisms, in addition to uncertainty about which sample preparation methods and analyses are appropriate for different research questions and evolutionary timescales, has contributed to a lag in the application of NGS to these fields. Here, we outline some of the major obstacles specific to the application of NGS to phylogeography and phylogenetics, including the focus on non-model organisms, the necessity of obtaining orthologous loci in a cost-effective manner, and the predominant use of gene trees in these fields. We describe the most promising methods of sample preparation that address these challenges. Methods that reduce the genome by restriction digest and manual size selection are most appropriate for studies at the intraspecific level, whereas methods that target specific genomic regions (i.e., target enrichment or sequence capture) have wider applicability from the population level to deep-level phylogenomics. Additionally, we give an overview of how to analyze NGS data to arrive at data sets applicable to the standard toolkit of phylogeography and phylogenetics, including initial data processing to alignment and genotype calling (both SNPs and loci involving many SNPs). Even though whole-genome sequencing is likely to become affordable rather soon, because phylogeography and phylogenetics rely on analysis of hundreds of individuals in many cases, methods that reduce the genome to a subset of loci should remain more cost-effective for some time to come.

  18. Progressive multiple sequence alignments from triplets

    PubMed Central

    Kruspe, Matthias; Stadler, Peter F

    2007-01-01

    Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by replacing triples with pairs step by step instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores. PMID:17631683
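
    The step in which a three-way alignment is subdivided and all-gap columns disappear can be sketched as below. This is not code from aln3nn; it only illustrates, under the assumption that alignments are lists of equal-length strings with "-" as the gap character, how projecting onto a subset of rows lets gaps introduced earlier vanish. The function name project_alignment is hypothetical.

```python
def project_alignment(rows, keep):
    """Keep the rows whose indices are in `keep`, then drop all-gap columns."""
    selected = [rows[i] for i in keep]
    # A column survives if at least one selected row has a residue there.
    columns = [col for col in zip(*selected) if any(ch != "-" for ch in col)]
    if not columns:
        return ["" for _ in selected]
    # Transpose the surviving columns back into row strings.
    return ["".join(chars) for chars in zip(*columns)]


# Toy three-way alignment invented for illustration.
three_way = ["AC-GT", "A--GT", "ACCG-"]

pair = project_alignment(three_way, [0, 1])   # third column is all gaps here
single = project_alignment(three_way, [2])    # trailing gap column is dropped
```

    In the toy example the gap in the third column exists only because of the third sequence, so it is removed once that sequence is split off.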

  19. Alignment validation

    SciTech Connect

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments, ALICE, ATLAS, CMS and LHCb, are currently under construction at CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector and the muon system requires an accurate alignment of all detector elements. Alignment information is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate, which means the data are insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of validation methods is the subject of this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  20. The augmentation algorithm and molecular phylogenetic trees

    NASA Technical Reports Server (NTRS)

    Holmquist, R.

    1978-01-01

    Moore's (1977) augmentation procedure is discussed, and it is concluded that the procedure is valid for obtaining estimates of the total number of fixed nucleotide substitutions both theoretically and in practice, for both simulated and real data, and in agreement, for experimentally dense data sets, with stochastic estimates of the divergence, provided the restrictions on codon mutability resulting from natural selection are explicitly allowed for. Tateno and Nei's (1978) critique that the augmentation procedure has a systematic bias toward overestimation of the total number of nucleotide replacements is disputed, and a data analysis suggests that ancestral sequences inferred by the method of parsimony contain a large number of incorrectly assigned nucleotides.

  1. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user-friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  2. Coelomata and Not Ecdysozoa: Evidence From Genome-Wide Phylogenetic Analysis

    PubMed Central

    Wolf, Yuri I.; Rogozin, Igor B.; Koonin, Eugene V.

    2004-01-01

    Relative positions of nematodes, arthropods, and chordates in animal phylogeny remain uncertain. The traditional tree topology joins arthropods with chordates in a coelomate clade, whereas nematodes, which lack a coelom, occupy a basal position. However, the current leading hypothesis, based on phylogenetic trees for 18S ribosomal RNA and several proteins, joins nematodes with arthropods in a clade of molting animals, Ecdysozoa. We performed a phylogenetic analysis of over 500 sets of orthologous proteins, which are represented in plants, animals, and fungi, using maximum likelihood, maximum parsimony, and distance methods. Additionally, to increase the statistical power of topology tests, the same methods were applied to concatenated alignments of subunits of eight conserved macromolecular complexes. The majority of the methods, when applied to most of the orthologous clusters, both concatenated and individual, grouped the fly with humans to the exclusion of the nematode, in support of the coelomate phylogeny. Trees were also constructed using information on insertions and deletions in orthologous proteins, combinations of domains in multidomain proteins, and presence-absence of species in clusters of orthologs. All of these approaches supported the coelomate clade and showed concordance between evolution of protein sequences and higher-level evolutionary events, such as domain fusion or gene loss. PMID:14707168

  3. LSHPlace: fast phylogenetic placement using locality-sensitive hashing.

    PubMed

    Brown, Daniel G; Truszkowski, Jakub

    2013-01-01

    We consider the problem of phylogenetic placement, in which large numbers of sequences (often next-generation sequencing reads) are placed onto an existing phylogenetic tree. We adapt our recent work on phylogenetic tree inference, which uses ancestral sequence reconstruction and locality-sensitive hashing, to this domain. With these ideas, new sequences can be placed onto trees with high fidelity in strikingly fast runtimes. Our results are two orders of magnitude faster than existing programs for this domain, and show a modest accuracy tradeoff. Our results offer the possibility of analyzing many more reads in a next-generation sequencing project than is currently possible.

  4. Trees of trees: an approach to comparing multiple alternative phylogenies.

    PubMed

    Nye, Tom M W

    2008-10-01

    Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree." A meta-tree clusters phylogenies with similar topologies together in the same way that a phylogeny clusters species with similar DNA sequences. Leaf nodes on a meta-tree correspond to the original set of phylogenies given by some analysis, whereas interior nodes correspond to certain consensus topologies. The construction of meta-trees is motivated by analogy with construction of a most parsimonious tree for DNA data, but instead of using DNA letters, in a meta-tree the characters are partitions or splits of the set of taxa. An efficient algorithm for meta-tree construction is described that makes use of a known relationship between the majority consensus and parsimony in terms of gain and loss of splits. To illustrate these ideas meta-trees are constructed for two datasets: a set of gene trees for species of yeast and trees from a bootstrap analysis of a set of gene trees in ray-finned fish. A software tool for constructing meta-trees and comparing alternative phylogenies is available online, and the source code can be obtained from the author.
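
    The abstract above describes meta-tree characters as splits of the taxon set rather than DNA letters. The sketch below illustrates only that encoding step: it extracts the non-trivial splits of each input tree and builds a presence/absence matrix with one row per tree. The nested-tuple tree representation and the function names `splits` and `split_characters` are assumptions made for illustration; this is not the author's meta-tree algorithm or software.

```python
def splits(tree):
    """Return the non-trivial splits of a tree given as nested tuples of taxon names.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so the same bipartition always receives the same label.
    """
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    taxa = leaf_set(tree)
    reference = min(taxa)  # arbitrary but deterministic reference taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if reference in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def split_characters(trees):
    """Encode each tree as a 0/1 row over all splits seen in any tree."""
    per_tree = [splits(t) for t in trees]
    columns = sorted(set().union(*per_tree), key=sorted)
    matrix = [[int(s in tree_splits) for s in columns] for tree_splits in per_tree]
    return columns, matrix


# Hypothetical input: two alternative trees on the same five taxa.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
columns, matrix = split_characters([tree_1, tree_2])
for row in matrix:
    print(row)
```

    Rows of such a matrix could then be treated like character data for a parsimony-style analysis, which is the analogy the abstract draws; that step is not shown here.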

  5. Phylogenetic Analysis of Selected Menthol-Producing Species Belonging to the Lamiaceae Family.

    PubMed

    Mirzaei, Motahareh; Mirzaei, Hamed; Sahebkar, Amirhossein; Bagherian, Ali; Masoud Khoi, Mohammad Jaber; Reza Mirzaei, Hamid; Salehi, Rasoul; Reza Jaafari, Mahmoud; Kazemi Oskuee, Reza

    2015-01-01

    Menthol is an organic compound with diverse medicinal and commercial applications, and is made either synthetically or through extraction from mint oils. The aim of the present study was to investigate menthol levels in selected menthol-producing species belonging to the Lamiaceae family, and to determine the phylogenetic relationships of menthol dehydrogenase gene sequences among these species. Three genera of Lamiaceae, namely Mentha, Salvia, and Micromeria, were selected for phytochemical and phylogenetic analyses. After identification of each species based on the menthol dehydrogenase gene in NCBI, BLAST software was used for sequence alignment. MEGA4 software was used to draw a phylogenetic tree for the various species. Phytochemical analysis revealed that the highest and lowest amounts of both essential oil and menthol belonged to Mentha spicata and Micromeria hyssopifolia, respectively. The species Mentha spicata and Mentha piperita, which were assigned to one cluster in the dendrogram, contained the highest amounts of essential oil and menthol, while the Micromeria species, which fell in a distinct cluster at a greater evolutionary distance, contained the lowest amounts. Phylogenetic and phytochemical analyses showed that essential oil and menthol contents of menthol-producing species are associated with menthol dehydrogenase gene sequence.

  6. Characterization and evolution of the mitochondrial DNA control region in Ranidae and their phylogenetic relationship.

    PubMed

    Huang, Z H; Tu, F Y

    2016-08-29

    The control region is considered to be one of the most variable parts of animal mitochondrial DNA (mtDNA). We compared the mtDNA control region from 37 species representing 14 genera and 4 subfamilies of Ranidae, to analyze the evolution of the control region and to determine their phylogenetic relationship. All the Ranidae species had a single control region, except four species that had two repeat regions. The control region spanned the region between the Cyt b and tRNAleu genes in most of the Ranidae species. The length of the control region sequences ranged from 1186 bp (Limnonectes bannaensis) to 6746 bp (Rana kunyuensis). The average genetic distances among the species varied from 1.94% (between R. chosenica and R. plancyi) to 113.25% (between Amolops ricketti and Euphlyctis hexadactylus). Three conserved sequence blocks were identified in the alignment. However, conserved sequence boxes F to A were not found in Ranidae. A maximum likelihood method was used to reconstruct the phylogenetic relationship based on a general time reversible + gamma distribution model. The amount of A+T was higher than G+C across the whole control region. The phylogenetic tree grouped members of the respective subfamilies into separate clades, with the exception of Raninae. Our analysis supported that some genera, including Rana and Amolops, may be polyphyletic. The control region sequence is an effective molecular marker for Ranidae phylogenetic inference.

  7. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.

    PubMed

    Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E; Thomas, Paul D

    2011-09-01

    The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods.

  8. A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships.

    PubMed

    Amrine-Madsen, Heather; Koepfli, Klaus-Peter; Wayne, Robert K; Springer, Mark S

    2003-08-01

    Higher-level relationships within, and the root of Placentalia, remain contentious issues. Resolution of the placental tree is important to the choice of mammalian genome projects and model organisms, as well as for understanding the biogeography of the eutherian radiation. We present phylogenetic analyses of 63 species representing all extant eutherian mammal orders for a new molecular phylogenetic marker, a 1.3kb portion of exon 26 of the apolipoprotein B (APOB) gene. In addition, we analyzed a multigene concatenation that included APOB sequences and a previously published data set (Murphy et al., 2001b) of three mitochondrial and 19 nuclear genes, resulting in an alignment of over 17kb for 42 placentals and two marsupials. Due to computational difficulties, previous maximum likelihood analyses of large, multigene concatenations for placental mammals have used quartet puzzling, less complex models of sequence evolution, or phylogenetic constraints to approximate a full maximum likelihood bootstrap. Here, we utilize a Unix load sharing facility to perform maximum likelihood bootstrap analyses for both the APOB and concatenated data sets with a GTR+Gamma+I model of sequence evolution, tree-bisection and reconnection branch-swapping, and no phylogenetic constraints. Maximum likelihood and Bayesian analyses of both data sets provide support for the superordinal clades Boreoeutheria, Euarchontoglires, Laurasiatheria, Xenarthra, Afrotheria, and Ostentoria (pangolins+carnivores), as well as for the monophyly of the orders Eulipotyphla, Primates, and Rodentia, all of which have recently been questioned. Both data sets recovered an association of Hippopotamidae and Cetacea within Cetartiodactyla, as well as hedgehog and shrew within Eulipotyphla. APOB showed strong support for an association of tarsier and Anthropoidea within Primates. Parsimony, maximum likelihood and Bayesian analyses with both data sets placed Afrotheria at the base of the placental radiation

  9. Insect phylogenetics in the digital age.

    PubMed

    Dietrich, Christopher H; Dmitriev, Dmitry A

    2016-12-01

    Insect systematists have long used digital data management tools to facilitate phylogenetic research. Web-based platforms developed over the past several years support creation of comprehensive, openly accessible data repositories and analytical tools that support large-scale collaboration, accelerating efforts to document Earth's biota and reconstruct the Tree of Life. New digital tools have the potential to further enhance insect phylogenetics by providing efficient workflows for capturing and analyzing phylogenetically relevant data. Recent initiatives streamline various steps in phylogenetic studies and provide community access to supercomputing resources. In the near future, automated, web-based systems will enable researchers to complete a phylogenetic study from start to finish using resources linked together within a single portal and incorporate results into a global synthesis.

  10. RibAlign: a software tool and database for eubacterial phylogeny based on concatenated ribosomal protein subunits

    PubMed Central

    Teeling, Hanno; Gloeckner, Frank Oliver

    2006-01-01

    Background: Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de-facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well-resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap. Results: RibAlign serves two purposes: first, it provides a fast and scalable database that has been specifically adapted to eubacterial ribosomal protein sequences, and second, it provides sophisticated import and export capabilities. This includes semi-automatic extraction of ribosomal protein sequences from whole-genome GenBank and FASTA files as well as exporting aligned, concatenated and filtered sequence files that can directly be used in conjunction with the PHYLIP and MrBayes phylogenetic reconstruction programs. Conclusion: Up to now, phylogeny based on concatenated ribosomal protein sequences has been hampered by the limited set of sequenced genomes and high computational requirements. However, hundreds of full and draft genome sequencing projects are under way, and advances in cluster-computing and algorithms make phylogenetic reconstructions feasible even with large alignments of concatenated marker genes.

  11. [Foundations of the new phylogenetics].

    PubMed

    Pavlinov, I Ia

    2004-01-01

    phylistics (Rasnitsyn's term; close to Simpsonian evolutionary taxonomy) belonging rather to the classical realm, and Hennigian cladistics, which attends exclusively to the origin of monophyletic taxa. In the early 20th century, the microevolutionary doctrine came to predominate in evolutionary studies. Its core is population thinking, accompanied by phenetic thinking based on equating kinship with overall similarity. Both were connected to positivist philosophy and hence were characterized by reductionism at both the ontological and epistemological levels. This led to the fall of classical phylogenetics but created the prerequisites for the new phylogenetics, which also turned out to be full of reductionism. The new rise of phylogenetic (rather than tree) thinking during the last third of the 20th century was caused by the loss of explanatory power of population thinking and by the development of a new worldview and new epistemological premises. That new worldview is based on the synergetic (Prigoginian) model of the development of non-equilibrium systems: evolution of the biota, of which phylogeny is a part, is considered to be such a development. At the epistemological level, the principal premise was the fall of positivism, which was replaced by post-positivist argumentation schemes. The input of cladistics into the new phylogenetics is twofold. On the one hand, it reduced phylogeny to cladistic history lacking any adaptivist interpretation and presuming a minimal evolution model. From this followed the reduction of the kinship relation to the sister-group relation, lacking any reference to a real time scale or to the ancestor-descendant relation. On the other hand, cladistics elaborated a methodology of phylogenetic reconstruction based on the synapomorphy principle, of which the outgroup concept became a part. Both inputs served as premises for the incorporation of both numerical techniques and molecular data into phylogenetic reconstruction.
Numerical phyletics provided the new phylogenetics with easily manipulated algorithms

  12. Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

    PubMed

    McTavish, Emily Jane; Steel, Mike; Holder, Mark T

    2015-12-01

    Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data.
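
    To make "corrected distance" and "gaps as missing data" concrete, the sketch below computes a pairwise p-distance that skips any alignment column containing a gap, then applies the Jukes-Cantor (JC69) correction d = -(3/4) ln(1 - 4p/3). The choice of JC69 and the function names are assumptions for illustration only; the abstract does not say which correction the authors analysed, and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned pair; the fifth column is gapped and is treated as missing.
a = "ACGT-A"
b = "ACCTTA"
p = p_distance(a, b)
print(round(p, 3), round(jc69(p), 3))
```

    Because gapped columns are simply dropped, the observed p comes only from sites that escaped insertion/deletion, which is the situation in which the abstract reports that the corrected distance can become a non-linear function of the true distance.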

  13. Phylemon 2.0: a suite of web-tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing.

    PubMed

    Sánchez, Rubén; Serra, François; Tárraga, Joaquín; Medina, Ignacio; Carbonell, José; Pulido, Luis; de María, Alejandro; Capella-Gutíerrez, Salvador; Huerta-Cepas, Jaime; Gabaldón, Toni; Dopazo, Joaquín; Dopazo, Hernán

    2011-07-01

    Phylemon 2.0 is a new release of the suite of web tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing. It has been designed as a response to the increasing demand for molecular sequence analyses from expert and non-expert users. Phylemon 2.0 has several unique features that differentiate it from other similar web resources: (i) it offers an integrated environment that enables evolutionary analyses, format conversion, file storage and editing of results; (ii) it suggests further analyses, thereby guiding the users through the web server; and (iii) it allows users to design and save phylogenetic pipelines to be used over multiple genes (phylogenomics). Altogether, Phylemon 2.0 integrates a suite of 30 tools covering sequence alignment reconstruction and trimming; tree reconstruction, visualization and manipulation; and evolutionary hypotheses testing.

  14. Estimating Bayesian Phylogenetic Information Content

    PubMed Central

    Lewis, Paul O.; Chen, Ming-Hui; Kuo, Lynn; Lewis, Louise A.; Fučíková, Karolina; Neupane, Suman; Wang, Yu-Bo; Shi, Daoyuan

    2016-01-01

    Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data. [Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.] PMID:27155008
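
    The entropy comparison described above can be sketched for a small discrete set of tree topologies. The code below takes information content to be the prior entropy minus the posterior entropy, which is zero when the two distributions are identical. The function names, the uniform prior and the toy posterior are assumptions for illustration; this is not the authors' estimator and it does not use the conditional clade distribution.

```python
import math


def entropy(probs):
    """Shannon entropy, in nats, of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def information_content(prior, posterior):
    """Information gained from the data: prior entropy minus posterior entropy."""
    return entropy(prior) - entropy(posterior)


# Hypothetical example: the three unrooted topologies for four taxa, uniform prior.
prior = [1 / 3, 1 / 3, 1 / 3]
posterior = [0.90, 0.05, 0.05]

info = information_content(prior, posterior)
max_info = entropy(prior)  # reached when the posterior puts all mass on one topology
print(f"{info:.3f} nats ({100 * info / max_info:.1f}% of the maximum)")
```

    Expressing the result as a fraction of the prior entropy gives a scale from 0 (posterior equals prior) to 1 (posterior concentrated on a single topology).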

  15. QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

    PubMed Central

    Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

    2015-01-01

    Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082

  16. Recapitulating phylogenies using k-mers: from trees to networks

    PubMed Central

    Bernard, Guillaume; Ragan, Mark A.; Chan, Cheong Xin

    2016-01-01

    Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner. PMID:28105314
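
    As a minimal sketch of relatedness based on shared k-mers, the code below builds the set of k-mers for each sequence and records an edge between every pair that shares at least `min_shared` distinct k-mers. Counting distinct shared k-mers, the `min_shared` threshold, the function names and the toy sequences are all assumptions for illustration; the abstract does not specify how the authors score or filter shared k-mers.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_edges(genomes, k, min_shared=1):
    """Map each pair of genome names to the number of distinct k-mers they share."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        shared = len(profiles[a] & profiles[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy "genomes"; real inputs would be whole-genome sequences.
genomes = {
    "A": "ACGTACGTGA",
    "B": "ACGTACGTCC",
    "C": "TTTTGGGGCC",
}
edges = shared_kmer_edges(genomes, k=4)
print(edges)
```

    Because no alignment is needed, the cost is driven by k-mer extraction and pairwise set intersection, which is one reason alignment-free approaches of this kind scale to many genomes.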

  17. Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses.

    PubMed

    Jacox, Edwin; Weller, Mathias; Tannier, Eric; Scornavacca, Celine

    2017-01-10

    Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal.

  18. Molecular phylogenetics: testing evolutionary hypotheses.

    PubMed

    Walsh, David A; Sharma, Adrian K

    2009-01-01

    A common approach for investigating evolutionary relationships between genes and organisms is to compare extant DNA or protein sequences and infer an evolutionary tree. This methodology is known as molecular phylogenetics and may be the most informative means for exploring phage evolution, since there are few morphological features that can be used to differentiate between these tiny biological entities. In addition, phage genomes can be mosaic, meaning different genes or genomic regions can exhibit conflicting evolutionary histories due to lateral gene transfer or homologous recombination between different phage genomes. Molecular phylogenetics can be used to identify and study such genome mosaicism. This chapter provides a general introduction to the theory and methodology used to reconstruct phylogenetic relationships from molecular data. Also included is a discussion on how the evolutionary history of different genes within the same set of genomes can be compared, using a collection of T4-type phage genomes as an example. A compilation of programs and packages that are available for conducting phylogenetic analyses is supplied as an accompanying appendix.

  19. Phylogenetic informativeness reconciles ray-finned fish molecular divergence times

    PubMed Central

    2014-01-01

    Background Discordance among individual molecular age estimates, or between molecular age estimates and the fossil record, is observed in many clades across the Tree of Life. This discordance is attributed to a variety of variables including calibration age uncertainty, calibration placement, nucleotide substitution rate heterogeneity, or the specified molecular clock model. However, the impact of changes in phylogenetic informativeness of individual genes over time on phylogenetic inferences is rarely analyzed. Using nuclear and mitochondrial sequence data for ray-finned fishes (Actinopterygii) as an example, we extend the utility of phylogenetic informativeness profiles to predict the time intervals when nucleotide substitution saturation results in discordance among molecular ages estimated. Results We demonstrate that even with identical calibration regimes and molecular clock methods, mitochondrial based molecular age estimates are systematically older than those estimated from nuclear sequences. This discordance is most severe for highly nested nodes corresponding to more recent (i.e., Jurassic-Recent) divergences. By removing data deemed saturated, we reconcile the competing age estimates and highlight that the older mtDNA based ages were driven by nucleotide saturation. Conclusions Homoplasious site patterns in a DNA sequence alignment can systematically bias molecular divergence time estimates. Our study demonstrates that PI profiles can provide a non-arbitrary criterion for data exclusion to mitigate the influence of homoplasy on time calibrated branch length estimates. Analyses of actinopterygian molecular clocks demonstrate that scrutiny of the time scale on which sequence data is informative is a fundamental, but generally overlooked, step in molecular divergence time estimation. PMID:25103329

  20. Phylogenetic diversity of nonmarine picocyanobacteria.

    PubMed

    Callieri, Cristiana; Coci, Manuela; Corno, Gianluca; Macek, Miroslav; Modenutti, Beatriz; Balseiro, Esteban; Bertoni, Roberto

    2013-08-01

    We studied the phylogenetic diversity of nonmarine picocyanobacteria broadening the sequence data set with 43 new sequences of the 16S rRNA gene. The sequences were derived from monoclonal strains isolated from four volcanic high-altitude athalassohaline lakes in Mexico, five glacial ultraoligotrophic North Patagonian lakes and six Italian lakes of glacial, volcanic and morenic origin. The new sequences fall into a number of both novel and previously described clades within the phylogenetic tree of 16S rRNA gene. The new cluster of Lake Nahuel Huapi (North Patagonia) forms a sister clade to the subalpine cluster II and the marine Synechococcus subcluster 5.2. Our finding of the novel clade of 'halotolerants' close to the marine subcluster 5.3 (Synechococcus RCC307) constitutes an important demonstration that euryhaline and marine strains affiliate closely. The intriguing results obtained shed new light on the importance of the nonmarine halotolerants in the phylogenesis of picocyanobacteria.

  1. ALIGNING JIG

    DOEpatents

    Culver, J.S.; Tunnell, W.C.

    1958-08-01

    A jig or device is described for setting or aligning an opening in one member relative to another member or structure, with a predetermined offset, or it may be used for measuring the amount of offset with which the parts have previously been set. This jig comprises two blocks rabbeted to each other, with means for securing the upper block to the lower block. The upper block has fingers for contacting one of the members to be aligned, the lower block is designed to ride in grooves within the reference member, and calibration marks are provided to determine the amount of offset. This jig is specially designed to align the collimating slits of a mass spectrometer.

  2. Image alignment

    DOEpatents

    Dowell, Larry Jonathan

    2014-04-22

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. An information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.

  3. Visualizing Phylogenetic Treespace Using Cartographic Projections

    NASA Astrophysics Data System (ADS)

    Sundberg, Kenneth; Clement, Mark; Snell, Quinn

    Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger datasets.

  4. Morphological and molecular convergences in mammalian phylogenetics.

    PubMed

    Zou, Zhengting; Zhang, Jianzhi

    2016-09-02

    Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor its underlying cause is known. Here comparing thousands of characters of each type that have been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience much more convergences than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference.

  5. Morphological and molecular convergences in mammalian phylogenetics

    PubMed Central

    Zou, Zhengting; Zhang, Jianzhi

    2016-01-01

    Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor its underlying cause is known. Here comparing thousands of characters of each type that have been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience much more convergences than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference. PMID:27585543

  6. Comparative analysis of mt LSU rRNA secondary structures of Odonates: structural variability and phylogenetic signal.

    PubMed

    Misof, B; Fleck, G

    2003-12-01

    Secondary structures of the most conserved part of the mt 16S rRNA gene, domains IV and V, have been recently analysed in a comparative study. However, full secondary structures of the mt LSU rRNA molecule are published for only a few insect species. The present study presents full secondary structures of domains I, II, IV and V of Odonates and one representative of mayflies, Ephemera sp. The reconstructions are based on a comparative approach and minimal consensus structures derived from sequence alignments. The inferred structures exhibit remarkable similarities to the published Drosophila melanogaster model, which increases confidence in these structures. Structural variance within Odonates is homoplastic, and neighbour-joining trees based on tree edit distances do not correspond to any of the phylogenetically expected patterns. However, despite homoplastic quantitative structural variation, many similarities between Odonates and Ephemera sp. suggest promising character sets for higher order insect systematics that merit further investigations.

  7. A Consistent Phylogenetic Backbone for the Fungi

    PubMed Central

    Ebersberger, Ingo; de Matos Simoes, Ricardo; Kupczok, Anne; Gube, Matthias; Kothe, Erika; Voigt, Kerstin; von Haeseler, Arndt

    2012-01-01

    The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and life sciences in general. Only when their phylogenetic relationships are stably resolved, can individual results from fungal research be integrated into a holistic picture of biology. However, and despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets analyzed with four different tree reconstruction methods shed light from different angles on the fungal tree of life. Eleven additional data sets address specifically the phylogenetic position of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes, respectively. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST encoded data—a common practice in phylogenomic analyses—introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, the generalization of phylogenomic data sets as collections of randomly selected genes cannot be taken for granted. A thorough characterization of the data to assess possible influences on the tree reconstruction should therefore become a standard in phylogenomic analyses. PMID:22114356

  8. Spatial predictions of phylogenetic diversity in conservation decision making.

    PubMed

    Pio, Dorothea V; Broennimann, Olivier; Barraclough, Timothy G; Reeves, Gail; Rebelo, Anthony G; Thuiller, Wilfried; Guisan, Antoine; Salamin, Nicolas

    2011-12-01

    Considering genetic relatedness among species has long been argued as an important step toward measuring biological diversity more accurately, rather than relying solely on species richness. Some researchers have correlated measures of phylogenetic diversity and species richness across a series of sites and suggest that values of phylogenetic diversity do not differ enough from those of species richness to justify their inclusion in conservation planning. We compared predictions of species richness and 10 measures of phylogenetic diversity by creating distribution models for 168 individual species of a species-rich plant family, the Cape Proteaceae. When we used average amounts of land set aside for conservation to compare areas selected on the basis of species richness with areas selected on the basis of phylogenetic diversity, correlations between species richness and different measures of phylogenetic diversity varied considerably. Correlations between species richness and measures that were based on the length of phylogenetic tree branches and tree shape were weaker than those that were based on tree shape alone. Elevation explained up to 31% of the segregation of species rich versus phylogenetically rich areas. Given these results, the increased availability of molecular data, and the known ecological effect of phylogenetically rich communities, consideration of phylogenetic diversity in conservation decision making may be feasible and informative.

  9. Morphological Phylogenetics in the Genomic Age.

    PubMed

    Lee, Michael S Y; Palci, Alessandro

    2015-10-05

    Evolutionary trees underpin virtually all of biology, and the wealth of new genomic data has enabled us to reconstruct them with increasing detail and confidence. While phenotypic (typically morphological) traits are becoming less important in reconstructing evolutionary trees, they still serve vital and unique roles in phylogenetics, even for living taxa for which vast amounts of genetic information are available. Morphology remains a powerful independent source of evidence for testing molecular clades, and - through fossil phenotypes - the primary means for time-scaling phylogenies. Morphological phylogenetics is therefore vital for transforming undated molecular topologies into dated evolutionary trees. However, if morphology is to be employed to its full potential, biologists need to start scrutinising phenotypes in a more objective fashion, models of phenotypic evolution need to be improved, and approaches for analysing phenotypic traits and fossils together with genomic data need to be refined.

  10. A Phylogenetic Analysis of Heterorhabditis (Nemata: Rhabditidae) Based on Internal Transcribed Spacer 1 DNA Sequence Data

    PubMed Central

    Adams, B. J.; Burnell, A. M.; Powers, T. O.

    1998-01-01

    Internal transcribed spacer 1 sequences were used to infer phylogenetic relationships among 8 of the 9 described species and one putative species of the entomopathogenic nematode genus Heterorhabditis. Sequences were aligned and optimized based on pairwise genetic distance and parsimony criteria and subjected to a variety of sequence alignment parameters. Phylogenetic trees were constructed with maximum parsimony, cladistic, distance, and maximum likelihood algorithms. Our results gave strong support for four pairs of sister species, while relationships between these pairs also were resolved but less well supported. The ITS1 region of the nuclear ribosomal repeat was a reliable source of homologous characters for resolving relationships between closely related taxa but provided more tenuous resolution among more divergent lineages. A high degree of sequence identity and lack of autapomorphic characters suggest that sister species pairs within three distinct lineages may be mutually conspecific. Application of these molecular data and current morphological knowledge to the delimitation of species is hindered by an incomplete understanding of their variability in natural populations. PMID:19274196

  11. Understanding phylogenetic incongruence: lessons from phyllostomid bats

    PubMed Central

    Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B

    2012-01-01

    All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive

  12. Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo.

    PubMed

    Pagel, Mark; Meade, Andrew

    2008-12-27

    The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
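
    The per-site sum over branch-length sets described in this abstract can be sketched as follows. This is a minimal illustration, assuming the likelihood of each site under each branch-length set has already been computed by some phylogenetic likelihood routine (not shown) and that each set carries a mixture weight; the function name, arguments, and weights are hypothetical. It is not the authors' implementation and omits the reversible-jump step entirely.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a branch-length mixture.

    site_likelihoods[i][j] is the (assumed precomputed) likelihood of site i
    under branch-length set j; weights[j] is the assumed mixture weight of set j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_set in site_likelihoods:
        if len(per_set) != len(weights):
            raise ValueError("one likelihood per branch-length set is required")
        # Sum over branch-length sets at this site, then take the log.
        site_likelihood = sum(w * lik for w, lik in zip(weights, per_set))
        total += math.log(site_likelihood)
    return total


if __name__ == "__main__":
    # Two sites, two branch-length sets, equal weights (made-up numbers).
    print(mixture_log_likelihood([[0.1, 0.3], [0.2, 0.2]], [0.5, 0.5]))
```

    Because the sum is taken inside the logarithm at every site, a site is free to be explained mostly by whichever branch-length set fits it best.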

  13. Increased taxon sampling greatly reduces phylogenetic error.

    PubMed

    Zwickl, Derrick J; Hillis, David M

    2002-08-01

    Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models, data generated under more complex models of evolution produce higher overall levels of error and show greater positive effects of increased taxon sampling. Fifth, we asked if different phylogenetic optimality criteria show different effects of taxon sampling. Although we found strong differences in effectiveness of different optimality criteria as a function of taxon sample size, increased taxon sampling improved the results from all the common optimality criteria. Nonetheless, the method that showed the lowest overall

  14. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  15. The phylogenetic utility and functional constraint of microRNA flanking sequences

    PubMed Central

    Kenny, Nathan J.; Sin, Yung Wa; Hayward, Alexander; Paps, Jordi; Chu, Ka Hou; Hui, Jerome H. L.

    2015-01-01

    MicroRNAs (miRNAs) have recently risen to prominence as novel factors responsible for post-transcriptional regulation of gene expression. miRNA genes have been posited as highly conserved in the clades in which they exist. Consequently, miRNAs have been used as rare genome change characters to estimate phylogeny by tracking their gain and loss. However, their short length (21–23 bp) has limited their perceived utility in sequence-based phylogenetic inference. Here, using reference taxa with established phylogenetic relationships, we demonstrate that miRNA sequences are of high utility in quantitative, rather than in qualitative, phylogenetic analysis. The clear orthology among miRNA genes from different species makes it straightforward to identify and align these sequences from even fragmentary datasets. We also identify significant sequence conservation in the regions directly flanking miRNA genes, and show that this too is of utility in phylogenetic analysis, as well as highlighting conserved regions that will be of interest to other fields. Employing miRNA sequences from 12 sequenced drosophilid genomes, together with a Tribolium castaneum outgroup, we demonstrate that this approach is robust using Bayesian and maximum-likelihood methods. The utility of these characters is further demonstrated in the rhabditid nematodes and primates. As next-generation sequencing makes it more cost-effective to sequence genomes and small RNA libraries, this methodology provides an alternative data source for phylogenetic analysis. The approach allows rapid resolution of relationships between both closely related and rapidly evolving species, and provides an additional tool for investigation of relationships within the tree of life. PMID:25694624

  16. Phylogenetic relationships of Phytophthora ramorum, P. nemorosa, and P. pseudosyringae, three species recovered from areas in California with sudden oak death.

    PubMed

    Martin, Frank N; Tooley, Paul W

    2003-12-01

    Sudden oak death has been an emerging disease problem in coastal California and has caused significant losses in forest ecosystems in some regions of the state. The causal agent of this disease has been described as Phytophthora ramorum with two other less aggressive species, P. nemorosa and P. pseudosyringae, recovered from some symptomatic plants. The phylogenetic relationship of these species with other members of the genus was examined by sequence alignment of 667 bp of the mitochondrially-encoded cytochrome oxidase II gene and the nuclear encoded rDNA internal transcribed spacer region. P. ramorum was most closely related to P. hibernalis and P. lateralis in trees from both regions, although the specific relationship among species differed depending on the tree. In the cox II tree these species were on a single clade with P. lateralis basal to a group containing P. ramorum and P. hibernalis. On the maximum parsimony ITS tree P. ramorum was most closely affiliated with P. lateralis and in the same clade as P. hibernalis, but with maximum likelihood analysis P. ramorum was basal to a grouping of P. hibernalis and P. lateralis. While bootstrap support was strong for the grouping of these species together, it was not for determining the relationship among them. In contrast to the cox II tree, the clade containing these three species grouped with P. cryptogea, P. drechsleri, P. erythroseptica, and P. syringae in the ITS tree. Since the same isolates of these species were used for both the cox II and ITS sequence analysis, this difference in species grouping suggests either a differential rate of evolutionary divergence for these two regions, incorrect assumptions about alignment of ITS sequences, or different evolutionary histories of the regions under study. Analysis of combined cox II and ITS data sets gave trees where the relationships among these species were the same as for the ITS tree alone, although the results of the partition homogeneity test (P=0

  17. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided.

  18. Prioritizing Populations for Conservation Using Phylogenetic Networks

    PubMed Central

    Volkmann, Logan; Martyn, Iain; Moulton, Vincent; Spillner, Andreas; Mooers, Arne O.

    2014-01-01

    In the face of inevitable future losses to biodiversity, ranking species by conservation priority seems more than prudent. Setting conservation priorities within species (i.e., at the population level) may be critical as species ranges become fragmented and connectivity declines. However, existing approaches to prioritization (e.g., scoring organisms by their expected genetic contribution) are based on phylogenetic trees, which may be poor representations of differentiation below the species level. In this paper we extend evolutionary isolation indices used in conservation planning from phylogenetic trees to phylogenetic networks. Such networks better represent population differentiation, and our extension allows populations to be ranked in order of their expected contribution to the set. We illustrate the approach using data from two imperiled species: the spotted owl Strix occidentalis in North America and the mountain pygmy-possum Burramys parvus in Australia. Using previously published mitochondrial and microsatellite data, we construct phylogenetic networks and score each population by its relative genetic distinctiveness. In both cases, our phylogenetic networks capture the geographic structure of each species: geographically peripheral populations harbor less-redundant genetic information, increasing their conservation rankings. We note that our approach can be used with all conservation-relevant distances (e.g., those based on whole-genome, ecological, or adaptive variation) and suggest it be added to the assortment of tools available to wildlife managers for allocating effort among threatened populations. PMID:24586451

  19. Phylogenetic relationships between chlorophytes, chrysophytes, and oomycetes.

    PubMed Central

    Gunderson, J H; Elwood, H; Ingold, A; Kindle, K; Sogin, M L

    1987-01-01

    The phylogenetic relationships among the chlorophyte Chlamydomonas reinhardtii, the chrysophyte Ochromonas danica, and the oomycete Achlya bisexualis were explored by comparing the sequences of their small-subunit ribosomal RNA coding regions. Comparisons of similarity values or inspection of phylogenetic trees constructed by distance matrix methods reveal a very close relationship between oomycetes and chrysophytes. The separation of chrysophytes from chlorophytes is comparable to that of plants from animals, and both separations are far antedated by the divergence of a number of other protist groups. PMID:3475703

  20. Reasoning over Taxonomic Change: Exploring Alignments for the Perelleschus Use Case

    PubMed Central

    Franz, Nico M.; Chen, Mingmin; Yu, Shizhuo; Kianmajd, Parisa; Bowers, Shawn; Ludäscher, Bertram

    2015-01-01

    Classifications and phylogenetic inferences of organismal groups change in light of new insights. Over time these changes can result in an imperfect tracking of taxonomic perspectives through the re-/use of Code-compliant or informal names. To mitigate these limitations, we introduce a novel approach for aligning taxonomies through the interaction of human experts and logic reasoners. We explore the performance of this approach with the Perelleschus use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of names individuated according to their respective source publications), and 75 expert-asserted Region Connection Calculus articulations (e.g., congruence, proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous constraints and interpretations. The reasoning workflow optimizes the logical consistency and expressiveness of the input and infers the set of maximally informative relations among the entailed taxonomic concepts. The latter are then used to produce merge visualizations that represent all congruent and non-congruent taxonomic elements among the aligned input trees. In this small use case with 6-53 input concepts per alignment, the information gained through the reasoning process is on average one order of magnitude greater than in the input. The approach offers scalable solutions for tracking provenance among succeeding taxonomic perspectives that may have differential biases in naming conventions, phylogenetic resolution, ingroup and outgroup sampling, or ostensive (member-referencing) versus intensional (property-referencing) concepts and articulations. PMID:25700173

  1. Application of 16S rRNA, cytochrome b and control region sequences for understanding the phylogenetic relationships in Oryx species.

    PubMed

    Khan, H A; Arif, I A; Al Homaidan, A A; Al Farhan, A H

    2008-12-16

    The present study reports the application of mitochondrial markers for the molecular phylogeny of Oryx species, including the Arabian oryx (AO), scimitar-horned oryx (SHO) and plains oryx (PO), using the Addax as an outgroup. Sequences of three molecular markers, 16S rRNA, cytochrome b and a control region, for the above four taxa were aligned and the topologies of respective phylogenetic trees were compared. All these markers clearly differentiated the genus Addax from Oryx. However, for species-level grouping, while 16S rRNA and cytochrome b produced similar phylogeny (SHO grouped with PO), the control region grouped SHO with AO. Further studies are warranted to generate more sequencing data, apply multiple bioinformatics tools and to include relevant nuclear markers for phylogenetic analysis of Oryx species.

  2. Cryptosporidium is more closely related to the gregarines than to coccidia as shown by phylogenetic analysis of apicomplexan parasites inferred using small-subunit ribosomal RNA gene sequences.

    PubMed

    Carreno, R A; Martin, D S; Barta, J R

    1999-11-01

    The phylogenetic placement of gregarine parasites (Apicomplexa: Gregarinasina) within the Apicomplexa was derived by comparison of small-subunit ribosomal RNA gene sequences. Gregarine sequences were obtained from Gregarina niphandrodes Clopton, Percival, and Janovy, 1991, and Monocystis agilis Stein, 1848 (Eugregarinorida Léger 1900), as well as from Ophriocystis elektroscirrha McLaughlin and Myers, 1970 (Neogregarinorida Grassé 1953). The sequences were aligned with several other gregarine and apicomplexan sequences from GenBank and the resulting data matrix analyzed by parsimony and maximum-likelihood methods. The gregarines form a monophyletic clade that is a sister group to Cryptosporidium spp. The gregarine/Cryptosporidium clade is separate from the other major apicomplexan clade containing the coccidia, adeleids, piroplasms, and haemosporinids. The trees indicate that the genus Cryptosporidium has a closer phylogenetic affinity with the gregarines than with the coccidia. These results do not support the present classification of the Cryptosporidiidae in the suborder Eimeriorina Léger, 1911.

  3. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals.

    PubMed

    Romiguier, Jonathan; Ranwez, Vincent; Delsuc, Frédéric; Galtier, Nicolas; Douzery, Emmanuel J P

    2013-09-01

    Despite the rapid increase of size in phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and a strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
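
    The GC-content criterion described in this abstract amounts to scoring each gene alignment by its base composition and keeping the AT-richest fraction. A minimal sketch follows, assuming GC content is measured over all unambiguous A/C/G/T characters of an alignment (the study may have used a different measure, such as third codon positions); the function names, the dictionary layout, and the example data are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases; gaps and Ns are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def at_richest(alignments: dict, fraction: float) -> list:
    """Return names of the AT-richest alignments (lowest GC content first).

    alignments maps a gene name to a list of aligned sequences;
    fraction is the share of genes to keep (at least one gene is kept).
    """
    ranked = sorted(alignments, key=lambda name: gc_content("".join(alignments[name])))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


if __name__ == "__main__":
    toy = {
        "geneA": ["GGCCGC", "GGCC-C"],
        "geneB": ["ATATTA", "ATAT-A"],
        "geneC": ["ATGCAT", "ATGCGT"],
    }
    print(at_richest(toy, 0.34))
```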

  4. Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

    PubMed

    Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

    2013-11-01

    Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which seriously impairs goat farming and causes substantial economic losses. However, there are few DNA-level reports on D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted a phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of a goat suffering from demodicidosis. The mt16S rDNA sequences of individual mites were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Partial sequences of 339 bp were obtained from six D. caprae isolates, and the sequence identity was 100% among isolates. The pairwise divergences between D. caprae and Demodex canis, Demodex folliculorum, and Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached the interspecies level. All five phylogenetic trees showed that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
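
    As a rough illustration of the pairwise quantities reported here, the sketch below computes the uncorrected proportion of differing sites and the transition and transversion counts for two aligned sequences. The abstract does not state which distance model or software was used, so this is an assumption-laden example rather than the study's method; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq1, seq2):
    """Return (p_distance, transitions, transversions) for two aligned sequences.

    Columns containing a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p_distance = (transitions + transversions) / sites if sites else 0.0
    return p_distance, transitions, transversions


p, ts, tv = compare_aligned("AAGCT", "AGGTA")
ts_tv_ratio = ts / tv if tv else float("inf")
```

    Dividing the transition count by the transversion count gives a simple transition/transversion ratio for the pair; corrected genetic distances would require an explicit substitution model.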

  5. High-resolution SAR11 ecotype dynamics at the Bermuda Atlantic Time-series Study site by phylogenetic placement of pyrosequences.

    PubMed

    Vergin, Kevin L; Beszteri, Bánk; Monier, Adam; Thrash, J Cameron; Temperton, Ben; Treusch, Alexander H; Kilpert, Fabian; Worden, Alexandra Z; Giovannoni, Stephen J

    2013-07-01

    Advances in next-generation sequencing technologies are providing longer nucleotide sequence reads that contain more information about phylogenetic relationships. We sought to use this information to understand the evolution and ecology of bacterioplankton at our long-term study site in the Western Sargasso Sea. A bioinformatics pipeline called PhyloAssigner was developed to align pyrosequencing reads to a reference multiple sequence alignment of 16S ribosomal RNA (rRNA) genes and assign them phylogenetic positions in a reference tree using a maximum likelihood algorithm. Here, we used this pipeline to investigate the ecologically important SAR11 clade of Alphaproteobacteria. A combined set of 2.7 million pyrosequencing reads from the 16S rRNA V1-V2 regions, representing 9 years at the Bermuda Atlantic Time-series Study (BATS) site, was quality checked and parsed into a comprehensive bacterial tree, yielding 929 036 Alphaproteobacteria reads. Phylogenetic structure within the SAR11 clade was linked to seasonally recurring spatiotemporal patterns. This analysis resolved four new SAR11 ecotypes in addition to five others that had been described previously at BATS. The data support a conclusion reached previously that the SAR11 clade diversified by subdivision of niche space in the ocean water column, but the new data reveal a more complex pattern in which deep branches of the clade diversified repeatedly across depth strata and seasonal regimes. The new data also revealed the presence of an unrecognized clade of Alphaproteobacteria, here named SMA-1 (Sargasso Mesopelagic Alphaproteobacteria, group 1), in the upper mesopelagic zone. The high-resolution phylogenetic analyses performed herein highlight significant, previously unknown patterns of evolutionary diversification within perhaps the most widely distributed heterotrophic marine bacterial clade, and strongly link these patterns to ecosystem regimes.

  6. Phylogenetic signal in bone microstructure of sauropsids.

    PubMed

    Cubo, J; Ponton, F; Laurin, M; de Margerie, E; Castanet, J

    2005-08-01

    In spite of the fact that the potential usefulness of bone histology in systematics has been discussed for over one and a half centuries, the presence of a phylogenetic signal in the variation of histological characters has rarely been assessed. A quantitative assessment of phylogenetic signal in bone histological characters could provide a justification for performing optimizations of these traits onto independently generated phylogenetic trees (as has been done in recent years). Here we present an investigation on the quantification of the phylogenetic signal in the following bone histological, microanatomical, and morphological traits in a sample of femora of 35 species of sauropsids: vascular density, vascular orientation, index of Haversian remodeling, cortical thickness, and cross-sectional area (bone size). For this purpose, we use two methods, regressions on distance matrices tested for significance using permutations (a Mantel test) and random tree length distribution. Within sauropsids, these bone microstructural traits have an optimal systematic value in archosaurs. In this taxon, a Mantel test shows that the phylogeny explains 81.8% of the variation of bone size and 86.2% of the variation of cortical thickness. In contrast, a Mantel test suggests that the phylogenetic signal in histological traits is weak: although the phylogeny explains 18.7% of the variation of vascular density in archosaurs, the phylogenetic signal is not significant either for vascular orientation or for the index of Haversian remodeling. However, Mantel tests seem to underestimate the proportion of variance of the dependent character explained by the phylogeny, as suggested by a PVR (phylogenetic eigenvector) analysis. We also deal with some complementary questions. First, we evaluate the functional dependence of bone vascular density on bone size by using phylogenetically independent contrasts. Second, we perform a variation partitioning analysis and show that the phylogenetic

  7. Ribosomal ITS sequences allow resolution of freshwater sponge phylogeny with alignments guided by secondary structure prediction.

    PubMed

    Itskovich, Valeria; Gontcharov, Andrey; Masuda, Yoshiki; Nohno, Tsutomu; Belikov, Sergey; Efremova, Sofia; Meixner, Martin; Janussen, Dorte

    2008-12-01

    Freshwater sponges include six extant families which belong to the suborder Spongillina (Porifera). The taxonomy of freshwater sponges is problematic and their phylogeny and evolution are not well understood. Sequences of the ribosomal internal transcribed spacers (ITS1 and ITS2) of 11 species from the family Lubomirskiidae, 13 species from the family Spongillidae, and 1 species from the family Potamolepidae were obtained to study the phylogenetic relationships between endemic and cosmopolitan freshwater sponges and the evolution of sponges in Lake Baikal. The present study is the first one where ITS1 sequences were successfully aligned using verified secondary structure models and, in combination with ITS2, used to infer relationships between the freshwater sponges. Phylogenetic trees inferred using maximum likelihood, neighbor-joining, and parsimony methods and Bayesian inference revealed that the endemic family Lubomirskiidae was monophyletic. Our results do not support the monophyly of Spongillidae because Lubomirskiidae formed a robust clade with Ephydatia muelleri, and Trochospongilla latouchiana formed a robust clade with the outgroup Echinospongilla brichardi (Potamolepidae). Within the cosmopolitan family Spongillidae the genera Radiospongilla and Eunapius were found to be monophyletic, while E. muelleri was basal to the family Lubomirskiidae. The genetic distances between Lubomirskiidae species are much lower than those between Spongillidae species, which indicates their relatively recent radiation from a common ancestor. These results indicate that rDNA spacer sequences can be useful for studying the phylogenetic relationships of freshwater sponges and for identifying their species.

  8. The Inference of Gene Trees with Species Trees

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970

  9. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  10. Alignment-free sequence comparison based on next-generation sequencing reads.

    PubMed

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei; Deng, Minghua; Sun, Fengzhu

    2013-02-01

    Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, D2, D2* and D2S, both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both D2* and D2S outperform D2 for detecting the relationship between two sequences based on NGS data. We then study the effects of length of the tuple, read length, coverage, and sequencing error on the power of D2* and D2S. Finally, variations of these statistics, d2, d2* and d2S, respectively, are used to first cluster five mammalian species with known phylogenetic relationships, and then cluster 13 tree species whose complete genome sequences are not available using NGS shotgun reads. The clustering results using d2S are consistent with biological knowledge for the 5 mammalian and 13 tree species, respectively. Thus, the statistic d2S provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
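
    The first and simplest of the three statistics named above, the uncentered D2, is the inner product of the k-tuple count vectors of two data sets. The sketch below is a minimal illustration under simplifying assumptions: counts are pooled over all reads of a sample, and reverse complements, sequencing error, and the centering used by the other two statistics are ignored. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-tuple across a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def d2(reads_a, reads_b, k):
    """Uncentered D2: sum over words w of count_a(w) * count_b(w)."""
    x = kmer_counts(reads_a, k)
    y = kmer_counts(reads_b, k)
    # A Counter returns 0 for words that never occur, so unshared words add nothing.
    return sum(count * y[word] for word, count in x.items())


score = d2(["ACGT"], ["ACGA"], k=2)
```

    Larger values indicate more shared k-tuples; because raw counts grow with read depth, comparing samples of different sizes calls for the normalized variants described in the abstract.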

  11. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-05

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).

  12. Metabarcoding of marine nematodes – evaluation of reference datasets used in tree-based taxonomy assignment approach

    PubMed Central

    2016-01-01

    Abstract Background Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919

  13. 11. GAS STATION AND OLD ROAD ALIGNMENT, FACING S. VISITOR ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. GAS STATION AND OLD ROAD ALIGNMENT, FACING S. VISITOR CENTER BEHIND TREES. SAME CAMERA POSITION AS AZ-45-10. - South Entrance Road, Between South park boundary & Village Loop Road, Grand Canyon, Coconino County, AZ

  14. BranchClust: a phylogenetic algorithm for selecting gene families

    PubMed Central

    Poptsova, Maria S; Gogarten, J Peter

    2007-01-01

    Background Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The first are easy to automate but usually they do not distinguish between paralogs and orthologs or have restriction on the number of taxa. Phylogenetic methods often are based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in case of prokaryotes, is that the species tree is often unknown, and that from the analyses of single gene families the branching order between related organisms frequently is unresolved. Results Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at . Conclusion BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees. PMID:17425803

  15. A Deliberate Practice Approach to Teaching Phylogenetic Analysis

    PubMed Central

    Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

    2013-01-01

    One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294

  16. A deliberate practice approach to teaching phylogenetic analysis.

    PubMed

    Hobbs, F Collin; Johnson, Daniel J; Kearns, Katherine D

    2013-01-01

    One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or "one-shot," in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts.

  17. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
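
    A genomic signature as defined in this abstract (the set of frequencies of short oligonucleotides in a DNA sequence) can be sketched as follows. The word length of 4 and the Euclidean distance between signatures are illustrative assumptions; the abstract does not state which word length or distance the authors used.

```python
from itertools import product
from math import sqrt


def signature(seq, k=4):
    """Return the frequency of every DNA word of length k, in a fixed word order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing gaps or ambiguous bases
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def signature_distance(sig1, sig2):
    """Euclidean distance between two signatures of the same word length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig1, sig2)))


dist = signature_distance(signature("AAAA", k=1), signature("CCCC", k=1))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, which is what makes the approach usable on sequences that cannot be aligned.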

  18. Probabilistic graphical model representation in phylogenetics.

    PubMed

    Höhna, Sebastian; Heath, Tracy A; Boussau, Bastien; Landis, Michael J; Ronquist, Fredrik; Huelsenbeck, John P

    2014-09-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.

  19. Exact solutions for species tree inference from discordant gene trees.

    PubMed

    Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

    2013-10-01

    Phylogenetic analysis has to overcome the grand challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well-studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better assess the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
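
    The RF supertree problem mentioned here scores a candidate species tree by its Robinson-Foulds (RF) distance to the input gene trees. The sketch below shows only that underlying distance for two rooted trees on the same leaf set, written as nested tuples; it is not the authors' dynamic programming algorithm, and the tree encoding and function names are assumptions made for illustration.

```python
def clusters(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def rf_distance(tree1, tree2):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clusters(tree1) ^ clusters(tree2))


gene_tree = (("a", "b"), ("c", "d"))
species_tree = (("a", "c"), ("b", "d"))
distance = rf_distance(gene_tree, species_tree)
```

    Summing this distance over all gene trees gives the score that a supertree search would try to minimize.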

  20. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

    PubMed Central

    2012-01-01

    Background Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers. Results At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database. Conclusions Classification by exact matching against a precomputed list of signature peptides provides comparable

  1. LETTER TO THE EDITOR: Quantum field theory and phylogenetic branching

    NASA Astrophysics Data System (ADS)

    Jarvis, P. D.; Bashford, J. D.

    2001-12-01

    A calculational framework is proposed for phylogenetics, using nonlocal quantum field theories in hypercubic geometry. Quadratic terms in the Hamiltonian give the underlying Markov dynamics, while higher degree terms represent branching events. The spatial dimension L is the number of leaves of the evolutionary tree under consideration. Momentum conservation modulo Z_2^{×L} in L ← 1 scattering corresponds to tree edge labelling using binary L-vectors. The bilocal quadratic term allows for momentum-dependent rate constants: only the tree or trees compatible with selected nonzero edge rates contribute to the branching probability distribution. Applications to models of evolutionary branching processes are discussed.

  2. Gene tree discordance of wild and cultivated Asian rice deciphered by genome-wide sequence comparison.

    PubMed

    Yang, Ching-chia; Sakai, Hiroaki; Numa, Hisataka; Itoh, Takeshi

    2011-05-15

    Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.

  3. Application of proteomics in phylogenetic and evolutionary studies.

    PubMed

    Navas, Alfonso; Albar, Juan Pablo

    2004-02-01

    There are few papers that deal specifically with evolutionary studies and proteomics. However, applying proteomics to these studies promises to open new perspectives apropos the construction of phylogenetic trees and the detection of evolutionary changes. Principles and methods of phylogenetic systematics could be used to compare and evaluate proteomes. This would permit the detection and characterization of specific proteins that have evolutionary value in defining monophyly, paraphyly, and polyphyly.

  4. Profiling phylogenetic informativeness.

    PubMed

    Townsend, Jeffrey P

    2007-04-01

    The resolution of four controversial topics in phylogenetic experimental design hinges upon the informativeness of characters about the historical relationships among taxa. These controversies regard the power of different classes of phylogenetic character, the relative utility of increased taxonomic versus character sampling, the differentiation between lack of phylogenetic signal and a historical rapid radiation, and the design of taxonomically broad phylogenetic studies optimized by taxonomically sparse genome-scale data. Quantification of the informativeness of characters for resolution of phylogenetic hypotheses during specified historical epochs is key to the resolution of these controversies. Here, such a measure of phylogenetic informativeness is formulated. The optimal rate of evolution of a character to resolve a dated four-taxon polytomy is derived. By scaling the asymptotic informativeness of a character evolving at a nonoptimal rate by the derived asymptotic optimum, and by normalizing so that net phylogenetic informativeness is equivalent for all rates when integrated across all of history, an informativeness profile across history is derived. Calculation of the informativeness per base pair allows estimation of the cost-effectiveness of character sampling. Calculation of the informativeness per million years allows comparison across historical radiations of the utility of a gene for the inference of rapid adaptive radiation. The theory is applied to profile the phylogenetic informativeness of the genes BRCA1, RAG1, GHR, and c-myc from a muroid rodent sequence data set. Bounded integrations of the phylogenetic profile of these genes over four epochs comprising the diversifications of the muroid rodents, the mammals, the lobe-limbed vertebrates, and the early metazoans demonstrate the differential power of these genes to resolve the branching order among ancestral lineages. This measure of phylogenetic informativeness yields a new kind of information

  5. Genomic Repeat Abundances Contain Phylogenetic Signal

    PubMed Central

    Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464

  6. Tanglegrams: a Reduction Tool for Mathematical Phylogenetics.

    PubMed

    Matsen, Frederick; Billey, Sara; Kas, Arnold; Konvalinka, Matjaz

    2016-10-03

    Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they have not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees "factor" through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.

  7. Mitochondrial phylogeny of the Chrysis ignita (Hymenoptera: Chrysididae) species group based on simultaneous Bayesian alignment and phylogeny reconstruction.

    PubMed

    Soon, Villu; Saarma, Urmas

    2011-07-01

    The ignita species group within the genus Chrysis includes over 100 cuckoo wasp species, which all lead a parasitic lifestyle and exhibit very similar morphology. The lack of robust, diagnostic morphological characters has hindered phylogenetic reconstructions and contributed to frequent misidentification and inconsistent interpretations of species in this group. Therefore, molecular phylogenetic analysis is the most suitable approach for resolving the phylogeny and taxonomy of this group. We present a well-resolved phylogeny of the Chrysis ignita species group based on mitochondrial sequence data from 41 ingroup and six outgroup taxa. Although our emphasis was on European taxa, we included samples from most of the distribution range of the C. ignita species group to test for monophyly. We used a continuous mitochondrial DNA sequence consisting of 16S rRNA, tRNA(Val), 12S rRNA and ND4. The location of the ND4 gene at the 3' end of this continuous sequence, following 12S rRNA, represents a novel mitochondrial gene arrangement for insects. Due to difficulties in aligning rRNA genes, two different Bayesian approaches were employed to reconstruct phylogeny: (1) using a reduced data matrix including only those positions that could be aligned with confidence; or (2) using the full sequence dataset while estimating alignment and phylogeny simultaneously. In addition maximum-parsimony and maximum-likelihood analyses were performed to test the robustness of the Bayesian approaches. Although all approaches yielded trees with similar topology, considerably more nodes were resolved with analyses using the full data matrix. Phylogenetic analysis supported the monophyly of the C. ignita species group and divided its species into well-supported clades. The resultant phylogeny was only partly in accordance with published subgroupings based on morphology. Our results suggest that several taxa currently treated as subspecies or names treated as synonyms may in fact constitute

  8. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    To design life-saving drugs, such as cancer drugs, the design of protein or DNA structures has to be accurate. These structures depend on Multiple Sequence Alignment (MSA). MSA is used to find the accurate structure of protein and DNA sequences from existing approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we have proposed two different algorithms in this paper: one uses an iterative approach with a progressive alignment method (PAMIM) and the second uses a genetic algorithm with a progressive alignment method (PAMGA). Both of our methods start with a "kmer" distance table to generate a single guide tree. In the iterative approach, we have introduced two new techniques: the first generates guide trees with randomly selected sequences and the second shuffles the sequences inside that tree. The output of the tree is a multiple sequence alignment, which has been evaluated by the Sum of Pairs Method (SPM) considering the real value data from PAM250. In our second, GA-based approach, these two techniques are used to generate an initial population, and two different approaches of genetic operators are implemented in crossover and mutation. To test the performance of our two algorithms, we have compared them with the existing well-known methods T-Coffee, MUSCLE, MAFFT and ProbCons, using BAliBASE benchmarks. The experimental results show that the first algorithm works well for some situations, where other existing methods face difficulties in obtaining better solutions. The proposed second method works well compared to the existing methods for all situations and it shows better performance than the first one.
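
    The sum-of-pairs evaluation mentioned in this abstract can be illustrated with a minimal sketch. The function name and the match/mismatch/gap values below are hypothetical choices for illustration; the paper scores residue pairs with PAM250, which is not reproduced here.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment under a toy scoring scheme.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    The match/mismatch/gap values are illustrative assumptions, not PAM250.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap aligned with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-GT", "ACGGT", "A--GT"]))
```

    Swapping the three constants for a lookup into a substitution matrix would bring the sketch closer to the scoring described in the abstract.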

  9. Sequence alignment of 18S ribosomal RNA and the basal relationships of Adephagan beetles: evidence for monophyly of aquatic families and the placement of Trachypachidae.

    PubMed

    Shull, V L; Vogler, A P; Baker, M D; Maddison, D R; Hammond, P M

    2001-01-01

    Current hypotheses regarding family relationships in the suborder Adephaga (Coleoptera) are conflicting. Here we report full-length 18S ribosomal RNA sequences of 39 adephagans and 13 outgroup taxa. Data analysis focused on the impact of sequence alignment on tree topology, using two principally different approaches. Tree alignments, which seek to minimize indels and substitutions on the tree in a single step, as implemented in an approximate procedure by the computer program POY, were contrasted with a more traditional procedure based on alignments followed by phylogenetic inference based on parsimony, likelihood, and distance analyses. Despite substantial differences between the procedures, phylogenetic conclusions regarding basal relationships within Adephaga and relationships between the four suborders of Coleoptera were broadly similar. The analysis weakly supports monophyly of Adephaga, with Polyphaga usually as its sister, and the two small suborders Myxophaga and Archostemata basal to them. In some analyses, however, Polyphaga was reconstructed as having arisen from within Hydradephaga. Adephaga generally split into two monophyletic groups, corresponding to the terrestrial Geadephaga and the aquatic Hydradephaga, as initially proposed by Crowson in 1955, consistent with a single colonization of the aquatic environment by adephagan ancestors and contradicting the recent proposition of three independent invasions. A monophyletic Hydradephaga is consistently, though not strongly, supported under most analyses, and a parametric bootstrapping test significantly rejects an hypothesis of nonmonophyly. The enigmatic Trachypachidae, which exhibit many similarities to aquatic forms but whose species are entirely terrestrial, were usually recovered as a basal lineage within Geadephaga. Strong evidence opposes the view that terrestrial trachypachids are related to the dytiscoid water beetles.

  10. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

    PubMed

    Allman, Elizabeth S; Degnan, James H; Rhodes, John A

    2011-06-01

    Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
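
    The four-species case can be made concrete with a small sketch. It uses the standard quartet formula under the multispecies coalescent, taken here from general knowledge rather than from the abstract: the unrooted gene tree matching the species tree has probability 1 - (2/3)exp(-T), and each of the two alternatives has probability (1/3)exp(-T), where T is the internal branch length of the unrooted species tree in coalescent units. The function name is hypothetical.

```python
import math


def quartet_probabilities(internal_length):
    """Unrooted gene-tree topology probabilities for four species.

    `internal_length` is the internal edge of the unrooted species tree in
    coalescent units (an assumed parameterization; on a rooted species tree
    this edge is the sum of the two rooted internal branch lengths).
    """
    if internal_length < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-internal_length) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant,
        "discordant_1": discordant,
        "discordant_2": discordant,
    }


if __name__ == "__main__":
    for length in (0.0, 0.5, 2.0):
        print(length, quartet_probabilities(length))
```

    Because the probabilities depend on the rooted branch lengths only through their sum, the sketch is consistent with the statement that the topology distribution recovers some, but not all, of the branch-length information, and none about the root position.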

  11. Evolutionary relationships of the Critically Endangered frog Ericabatrachus baleensis Largen, 1991 with notes on incorporating previously unsampled taxa into large-scale phylogenetic analyses

    PubMed Central

    2014-01-01

    Background: The phylogenetic relationships of many taxa remain poorly known because of a lack of appropriate data and/or analyses. Despite substantial recent advances, amphibian phylogeny remains poorly resolved in many instances. The phylogenetic relationships of the Ethiopian endemic monotypic genus Ericabatrachus have been addressed thus far only with phenotypic data and remain contentious. Results: We obtained fresh samples of the now rare and Critically Endangered Ericabatrachus baleensis and generated DNA sequences for two mitochondrial and four nuclear genes. Analyses of these new data using de novo and constrained-tree phylogenetic reconstructions strongly support a close relationship between Ericabatrachus and Petropedetes, and allow us to reject previously proposed alternative hypotheses of a close relationship with cacosternines or Phrynobatrachus. Conclusions: We discuss the implications of our results for the taxonomy, biogeography and conservation of E. baleensis, and suggest a two-tiered approach to the inclusion and analyses of new data in order to assess the phylogenetic relationships of previously unsampled taxa. Such approaches will be important in the future given the increasing availability of relevant mega-alignments and potential framework phylogenies. PMID:24612655

  12. Phylogenetic mixtures and linear invariants for equal input models.

    PubMed

    Casanellas, Marta; Steel, Mike

    2016-09-07

    The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-) linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).
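
    As a small illustration of the equal input model on a single edge, the sketch below assumes the usual parameterization in which the rate from state i to a different state j is mu * pi_j, so that P_ij(t) = exp(-mu t) [i = j] + (1 - exp(-mu t)) pi_j. The function name and the parameter mu are hypothetical, and the sketch does not touch the mixtures or invariants studied in the paper.

```python
import math


def equal_input_transition_matrix(pi, t, mu=1.0):
    """Transition probabilities of an equal input model after time t.

    Assumes off-diagonal rates q_ij = mu * pi[j]; `pi` is the stationary
    distribution over any finite number of states.
    """
    if abs(sum(pi) - 1.0) > 1e-9 or min(pi) < 0:
        raise ValueError("pi must be a probability distribution")
    decay = math.exp(-mu * t)  # probability that no resampling event occurred
    n = len(pi)
    return [
        [(decay if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Uniform pi on four states gives the Jukes-Cantor special case.
    for row in equal_input_transition_matrix([0.25] * 4, t=0.5, mu=0.4):
        print([round(x, 4) for x in row])
```

    With four states and an arbitrary pi this is the Felsenstein 1981 form; with uniform pi it reduces to Jukes-Cantor, matching the generalization chain described in the abstract.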

  13. Phylogenetic Relationships and Species Delimitation in Pinus Section Trifoliae Inferred from Plastid DNA

    PubMed Central

    Hernández-León, Sergio; Gernandt, David S.; Pérez de la Rosa, Jorge A.; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities. PMID:23936218

  14. Phylogenetic relationships and species delimitation in Pinus section Trifoliae inferred from plastid DNA.

    PubMed

    Hernández-León, Sergio; Gernandt, David S; Pérez de la Rosa, Jorge A; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities.

  15. Identification, phylogenetic evolutionary analysis of GDQY orf virus isolated from Qingyuan City, Guangdong Province, southern China.

    PubMed

    Duan, Chaohui; Liao, Meiying; Wang, Han; Luo, Xiaohong; Shao, Jing; Xu, Ying; Li, Wei; Hao, Wenbo; Luo, Shuhong

    2015-01-25

    Infection with the orf virus (ORFV) leads to contagious ecthyma, also called contagious pustular dermatitis, which usually affects sheep, goats and other small ruminants. It has a great distribution throughout the world and has also been reported to infect humans. Though many strains have been isolated from differing parts of mainland China, rarely has any strain been reported from the southern provinces of China. We studied a case of orf virus infection that occurred at Qingyuan City, Guangdong Province in southern China. An orf virus strain, GDQY, was successfully isolated and identified through cell culture techniques and transmission electron microscopy. Complete genes of ORFV011, ORFV059, ORFV106 and ORFV107 were amplified for the sequence analysis based on their nucleotide or amino acid level. In order to discuss the genetic variation, precise sequences were used to compare to other reference strains isolated from different districts or countries. Phylogenetic trees based on those strains were built up and evolutionary distances were calculated based on the alignment of their complete sequences. The typical structure of the orf virus was observed in cell-culture suspensions inoculated with GDQY, and the full-length of four genes was amplified and sequenced. Phylogenetic analysis indicated that GDQY is homologous to FJ-DS and CQ/WZ on ORFV011 nucleotides. ORFV059 may be more variable than ORFV011 based on the comparison between GDQY and other isolates. Genetic studies of ORFV106 and 107 are reported for the first time in the presented study.

  16. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics

    PubMed Central

    Jobb, Gangolf; von Haeseler, Arndt; Strimmer, Korbinian

    2004-01-01

    Background: Most analysis programs for inferring molecular phylogenies are difficult to use, in particular for researchers with little programming experience. Results: TREEFINDER is an easy-to-use integrative platform-independent analysis environment for molecular phylogenetics. In this paper the main features of TREEFINDER (version of April 2004) are described. TREEFINDER is written in ANSI C and Java and implements powerful statistical approaches for inferring gene trees and related analyses. In addition, it provides a user-friendly graphical interface and a phylogenetic programming language. Conclusions: TREEFINDER is a versatile framework for analyzing phylogenetic data across different platforms that is suited both for exploratory as well as advanced studies. PMID:15222900

  17. Identification of a Bacteria Using Phylogenetic Relationships Revealed by MS/MS Sequencing of Tryptic Peptides Derived From Cellular Proteins

    DTIC Science & Technology

    2004-12-01

    phylogenetic relationships between bacterial species as a part of a hierarchical decision tree process. 1. INTRODUCTION The detection and...1 IDENTIFICATION OF BACTERIA USING PHYLOGENETIC RELATIONSHIPS REVEALED BY MS/MS SEQUENCING OF TRYPTIC PEPTIDES DERIVED FROM CELLULAR PROTEINS...based on analysis of an electrospray ionization (ESI)-MS/MS data for the fast classification of analyzed bacteria, using phylogenetic relationships

  18. Phylogenetic lineages in Entomophthoromycota

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Entomophthoromycota Humber is one of five major phylogenetic lineages among the former phylum Zygomycota. These early terrestrial fungi share evolutionarily ancestral characters such as coenocytic mycelium and gametangiogamy as a sexual process resulting in zygospore formation. Previous molecular st...

  19. 6. Aerial view of turnpike alignment running from lower left ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Aerial view of turnpike alignment running from lower left diagonally up to right along row of trees. Migel Estate and Farm buildings (HABS No. NY-6356) located at lower right of photograph. W.K. Smith house (HABS No. NY-6356-A) located within clump of trees at lower center, with poultry houses (HABS No. NY-6356-F and G) visible left of the clump of trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  20. Erasing Errors due to Alignment Ambiguity When Estimating Positive Selection

    PubMed Central

    Redelings, Benjamin

    2014-01-01

    Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments. PMID:24866534

  1. Phylogenetic Stochastic Mapping Without Matrix Exponentiation

    PubMed Central

    Irvahn, Jan; Minin, Vladimir N.

    2014-01-01

    Phylogenetic stochastic mapping is a method for reconstructing the history of trait changes on a phylogenetic tree relating species/organisms carrying the trait. State-of-the-art methods assume that the trait evolves according to a continuous-time Markov chain (CTMC) and work well for small state spaces. The computations slow down considerably for larger state spaces (e.g., space of codons), because current methodology relies on exponentiating CTMC infinitesimal rate matrices, an operation whose computational complexity grows as the size of the CTMC state space cubed. In this work, we introduce a new approach, based on a CTMC technique called uniformization, which does not use matrix exponentiation for phylogenetic stochastic mapping. Our method is based on a new Markov chain Monte Carlo (MCMC) algorithm that targets the distribution of trait histories conditional on the trait data observed at the tips of the tree. The computational complexity of our MCMC method grows as the size of the CTMC state space squared. Moreover, in contrast to competing matrix exponentiation methods, if the rate matrix is sparse, we can leverage this sparsity and increase the computational efficiency of our algorithm further. Using simulated data, we illustrate advantages of our MCMC algorithm and investigate how large the state space needs to be for our method to outperform matrix exponentiation approaches. We show that even on the moderately large state space of codons our MCMC method can be significantly faster than currently used matrix exponentiation methods. PMID:24918812
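
    The uniformization identity behind this approach can be sketched as follows. This is not the authors' MCMC sampler; it only illustrates the general technique, in which a CTMC with rate matrix Q is viewed as a discrete chain R = I + Q/mu run at the event times of a Poisson process with rate mu, so that P(t) is a Poisson-weighted sum of powers of R. The function names, tolerance and term cap are hypothetical.

```python
import math


def _matmul(A, B):
    """Product of two small dense square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def uniformized_transition_matrix(Q, t, tol=1e-12, max_terms=10_000):
    """Approximate P(t) = exp(Qt) as sum_k Poisson(k; mu*t) * R**k."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))  # uniformization rate
    if mu == 0.0 or t == 0.0:
        return identity
    # R is a stochastic matrix because mu bounds every exit rate.
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    power = identity                    # R**k, starting at k = 0
    weight = math.exp(-mu * t)          # Poisson probability of k = 0 events
    accumulated = weight
    P = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    for k in range(1, max_terms + 1):
        if 1.0 - accumulated <= tol:    # remaining Poisson mass is negligible
            break
        power = _matmul(power, R)
        weight *= mu * t / k
        accumulated += weight
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
    return P


if __name__ == "__main__":
    a = 0.1  # Jukes-Cantor off-diagonal rate, chosen for illustration
    Q = [[-3 * a if i == j else a for j in range(4)] for i in range(4)]
    print(uniformized_transition_matrix(Q, t=2.0)[0])
```

    For the Jukes-Cantor matrix in the usage example the result can be checked against the closed form 1/4 + (3/4)exp(-4at) on the diagonal. When R is sparse, each multiplication by R touches only its nonzero entries, which is the kind of saving the abstract alludes to.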

  2. Phylogenetic analysis of cubilin (CUBN) gene

    PubMed Central

    Shaik, Abjal Pasha; Alsaeed, Abbas H; Kiranmayee, S; Bammidi, VK; Sultana, Asma

    2013-01-01

    Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of intestine and kidney, acts as a receptor for intrinsic factor – vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence and these hits showed a putative conserved domain, the CUB superfamily (as on 27th Nov 2012). A first-pass phylogenetic tree was constructed to identify the taxa which most often contained the CUBN sequences. Following this, we narrowed down the search by manually deleting sequences which were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using the PhyML, RAxML and TreeDyn software to confirm that CUBN is a conserved protein, emphasizing its importance as an extracellular domain and being present in proteins mostly known to be involved in development in many chordate taxa but not found in prokaryotes, plants and yeast. No horizontal gene transfers have been found between different taxa. PMID:23390341

  3. Phylogenetic analysis of cubilin (CUBN) gene.

    PubMed

    Shaik, Abjal Pasha; Alsaeed, Abbas H; Kiranmayee, S; Bammidi, Vk; Sultana, Asma

    2013-01-01

    Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of intestine and kidney, acts as a receptor for intrinsic factor - vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence and these hits showed a putative conserved domain, the CUB superfamily (as on 27th Nov 2012). A first-pass phylogenetic tree was constructed to identify the taxa which most often contained the CUBN sequences. Following this, we narrowed down the search by manually deleting sequences which were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using the PhyML, RAxML and TreeDyn software to confirm that CUBN is a conserved protein, emphasizing its importance as an extracellular domain and being present in proteins mostly known to be involved in development in many chordate taxa but not found in prokaryotes, plants and yeast. No horizontal gene transfers have been found between different taxa.

  4. The molecular symplesiomorphies shared by the stem groups of metazoan evolution: can sites as few as 1% have a significant impact on recognizing the phylogenetic position of myzostomida?

    PubMed

    Wang, Yanhui; Xie, Qiang

    2014-08-01

    Although it is clear that taxon sampling, alignments, gene sampling, tree reconstruction methods and the total length of the sequences used are critical to the reconstruction of evolutionary history, weakly supported or misleading nodes exist in phylogenetic studies with no obvious flaw in those aspects. The phylogenetic studies focusing on the basal part of bilaterian evolution are such a case. During the past decade, Myzostomida has appeared in the basal part of Bilateria in several phylogenetic studies of Metazoa. However, most researchers have entertained only two competing hypotheses about the position of Myzostomida: an affinity with Annelida and an affinity with Platyhelminthes. In this study, dozens of symplesiomorphies shared by the stem groups of Metazoa were discovered by means of ancestral state reconstruction in the complete 18S and 28S rDNAs. By contrastive analysis of the datasets with or without such symplesiomorphic sites, we discovered that Myzostomida and other basal groups are basal lineages of Bilateria due to the corresponding symplesiomorphies shared with earlier lineages. As such, symplesiomorphies accounting for approximately 1-2% of the whole dataset have an essential impact on phylogenetic inference, and this study reminds molecular systematists of the importance of carrying out ancestral state reconstruction at each site in sequence-based phylogenetic studies. In addition, reasons should be explored for the low support of the hypothesis that Myzostomida belongs to Annelida in the results of phylogenomic studies. Future phylogenetic studies concerning Myzostomida should include all of the basal lineages of Bilateria to avoid directly neglecting the stand-alone basal position of Myzostomida as a potential hypothesis.

  5. Feasibility and effectiveness of a brief, intensive phylogenetics workshop in a middle-income country.

    PubMed

    Pollett, S; Leguia, M; Nelson, M I; Maljkovic Berry, I; Rutherford, G; Bausch, D G; Kasper, M; Jarman, R; Melendrez, M

    2016-01-01

    There is an increasing role for bioinformatic and phylogenetic analysis in tropical medicine research. However, scientists working in low- and middle-income regions may lack access to training opportunities in these methods. To help address this gap, a 5-day intensive bioinformatics workshop was offered in Lima, Peru. The syllabus is presented here for others who want to develop similar programs. To assess knowledge gained, a 20-point knowledge questionnaire was administered to participants (21 participants) before and after the workshop, covering topics on sequence quality control, alignment/formatting, database retrieval, models of evolution, sequence statistics, tree building, and results interpretation. Evolution/tree-building methods represented the lowest scoring domain at baseline and after the workshop. There was a considerable median gain in total knowledge scores (increase of 30%, p<0.001) with gains as high as 55%. A 5-day workshop model was effective in improving the pathogen-applied bioinformatics knowledge of scientists working in a middle-income country setting.

  6. Cloning, in Vitro expression, and novel phylogenetic classification of a channel catfish estrogen receptor

    USGS Publications Warehouse

    Xia, Z.; Patino, R.; Gale, W.L.; Maule, A.G.; Densmore, L.D.

    1999-01-01

    We obtained two channel catfish estrogen receptor (ccER) cDNA from liver of female fish using RT–PCR. The two fragments were identical in sequence except that the smaller one had an out-of-frame deletion in the E domain, suggesting the existence of ccER splice variants. The larger fragment was used to screen a cDNA library from liver of a prepubescent female. A cDNA was obtained that encoded a 581-amino-acid ER with a deduced molecular weight of 63.8 kDa. Extracts of COS-7 cells transfected with ccER cDNA bound estrogen with high affinity (Kd = 4.7 nM) and specificity. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of ccER on the basis of 18 full-length ER sequences. The tree suggested the existence of two major ER branches. One branch contained two clearly divergent clades which included all piscine ER (except Japanese eel ER) and all tetrapod ERα, respectively. The second major branch contained the eel ER and the mammalian ERβ. The high degree of divergence between the eel ER and mammalian ERβ suggested that they also represent distinct piscine and tetrapod ER. These data suggest that ERα and ERβ are present throughout vertebrates and that these two major ER types evolved by duplication of an ancestral ER gene. Sequence alignments with other members of the nuclear hormone receptor superfamily indicated the presence of 8 amino acids in the E domain that align exclusively among ER. Four of these amino acids have not received prior research attention and their function is unknown. The novel finding of putative ER splice variants in a nonmammalian vertebrate and the novel phylogenetic classification of ER offer new perspectives in understanding the diversification and function of ER.

  7. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.

  8. Progressive alignment of genomic signals by multiple dynamic time warping.

    PubMed

    Skutkova, Helena; Vitek, Martin; Sedlar, Karel; Provaznik, Ivo

    2015-11-21

    This paper presents the use of the progressive alignment principle for positional adjustment of a set of genomic signals of different lengths. The new method for multiple alignment of signals, based on dynamic time warping, is tested for evaluating the similarity of genes of different lengths in phylogenetic studies. Two sets of phylogenetic markers were used to demonstrate its effectiveness in evaluating intraspecies and interspecies genetic variability. Part of the proposed method is a modification of the pairwise alignment of two signals by dynamic time warping that uses correlation in a sliding window. The correlation-based dynamic time warping allows more accurate alignment dependent on local homologies in sequences without the need for a scoring matrix or evolutionary models, because mutual similarities of residues are included in the numerical code of the signals.
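
    As a rough illustration of the pairwise step this record describes, the sketch below computes a textbook dynamic time warping cost between two numeric signals of different lengths. It is not the authors' correlation-in-a-sliding-window variant: the absolute-difference local cost, the function names, and the nucleotide-to-number encoding are all assumptions made for this example.

```python
import math

# Hypothetical numeric code for nucleotides; the paper's actual encoding is not given here.
ENCODING = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def to_signal(sequence):
    """Turn a nucleotide string into a list of numbers using the toy encoding."""
    return [ENCODING[base] for base in sequence.upper()]


def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a[i] - b[j]| over all monotone warping paths."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: a advances alone
                cost[i][j - 1],      # stretch a: b advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

    Calling `dtw_distance(to_signal("ACG"), to_signal("ACCG"))` aligns a three-sample signal with a four-sample one; the repeated C is absorbed by warping, so no gap penalty or scoring matrix is involved.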

  9. On the analysis of phylogenetically paired designs

    PubMed Central

    Funk, Jennifer L; Rakovski, Cyril S; Macpherson, J Michael

    2015-01-01

    As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed model approach that includes random effects for pair and species. These random effects introduce a “two-layer” compound symmetry variance structure that captures both the correlations between observations on related species within a pair as well as the correlations between the repeated measurements within species. We conducted a simulation study to assess the effect of model misspecification on Type I and II error rates. We also provide an illustrative example with data containing taxonomically similar species and several outcome variables of interest. We found that a mixed model with species and pair as random effects performed better in these phylogenetically explicit simulations than two commonly used reference models (no or single random effect) by optimizing Type I error rates and power. The proposed mixed model produces acceptable Type I and II error rates despite the absence of a phylogenetic tree. This design can be generalized to a variety of datasets to analyze repeated measurements in clusters of related subjects/species. PMID:25750719

  10. Phylogenetic analysis of the Australian rosella parrots (Platycercus) reveals discordance among molecules and plumage.

    PubMed

    Shipham, Ashlee; Schmidt, Daniel J; Joseph, Leo; Hughes, Jane M

    2015-10-01

    Relationships and species limits among the colourful Australian parrots known as rosellas (Platycercus) are contentious because of poorly understood patterns of parapatry, sympatry and hybridization as well as complex patterns of geographical replacement of phenotypic forms. Two subgenera are, however, conventionally recognised: Platycercus comprises the blue-cheeked crimson rosella complex (Crimson Rosella P. elegans and Green Rosella P. caledonicus), and Violania contains the remaining four currently recognised species (Pale-headed Rosella P. adscitus, Eastern Rosella P. eximius, Northern Rosella P. venustus, and Western Rosella P. icterotis). We used phylogenetic analysis of ten loci (one mitochondrial, eight autosomal and one Z-linked) and several individuals per nominal species primarily to examine relationships within the subgenera, especially the relationships and species limits within Violania. Of these, P. adscitus and P. eximius have long been considered sister species or conspecific due to a morphology-based hybrid zone and an early phylogenetic analysis of mitochondrial DNA restriction fragment length polymorphisms. The multilocus phylogenetic analysis presented here supports an alternative hypothesis aligning P. adscitus and P. venustus as sister species. Using divergence rates published in other avian studies, we estimated the divergence between P. venustus and P. adscitus at 0.0148-0.6124 MYA and that between the P. adscitus/P. venustus ancestor and P. eximius earlier at 0.1617-1.0816 MYA, both within the Pleistocene. Discordant topologies among gene and species trees are discussed and proposed to be the result of historical gene flow and/or incomplete lineage sorting (ILS). In particular, we suggest that discordance between mitochondrial and nuclear data may be the result of asymmetrical mitochondrial introgression from P. adscitus into P. eximius. The biogeographical implications of our findings are discussed relative to similarly distributed groups.

  11. Automated Masking of AFLP Markers Improves Reliability of Phylogenetic Analyses

    PubMed Central

    Gimnich, France

    2012-01-01

    The amplified fragment length polymorphisms (AFLP) method has become an attractive tool in phylogenetics due to the ease with which large numbers of characters can be generated. In contrast to sequence-based phylogenetic approaches, AFLP data consist of anonymous multilocus markers. However, potential artificial amplifications or amplification failures of fragments contained in the AFLP data set will reduce AFLP reliability especially in phylogenetic inferences. In the present study, we introduce a new automated scoring approach, called “AMARE” (AFLP MAtrix REduction). The approach is based on replicates and makes marker selection dependent on marker reproducibility to control for scoring errors. To demonstrate the effectiveness of our approach we record error rate estimations, resolution scores, PCoA and stemminess calculations. As in general the true tree (i.e. the species phylogeny) is not known, we tested AMARE with empirical, already published AFLP data sets, and compared tree topologies of different AMARE generated character matrices to existing phylogenetic trees and/or other independent sources such as morphological and geographical data. It turns out that the selection of masked character matrices with highest resolution scores gave similar or even better phylogenetic results than the original AFLP data sets. PMID:23152859

  12. Recursions for statistical multiple alignment

    PubMed Central

    Hein, Jotun; Jensen, Jens Ledet; Pedersen, Christian N. S.

    2003-01-01

    Algorithms are presented that allow the calculation of the probability of a set of sequences related by a binary tree that have evolved according to the Thorne–Kishino–Felsenstein model for a fixed set of parameters. The algorithms are based on a Markov chain generating sequences and their alignment at nodes in a tree. Depending on whether the complete realization of this Markov chain is decomposed into the first transition and the rest of the realization or the last transition and the first part of the realization, two kinds of recursions are obtained that are computationally similar but probabilistically different. The running time of the algorithms is $O(\prod_{i=1}^{d} L_i)$, where $L_i$ is the length of the $i$th observed sequence and $d$ is the number of sequences. An alternative recursion is also formulated that uses only a Markov chain involving the inner nodes of a tree. PMID:14657378
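
    To give a feel for the running-time bound stated in this record, which grows with the product of the sequence lengths, the following sketch simply evaluates that product for a few hypothetical inputs. The function name and the example lengths are assumptions; constant factors and the recursions themselves are not modelled.

```python
import math


def dp_table_size(lengths):
    """Product of the sequence lengths, the quantity inside the O(...) bound."""
    return math.prod(lengths)


# Hypothetical example: sequences of 300 residues each.
for d in (2, 3, 4):
    print(d, "sequences ->", dp_table_size([300] * d), "index combinations")
```

    Each added sequence multiplies the work by its length, which is why exact recursions of this kind are practical only for a handful of sequences.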

  13. Molecular phylogenetics of mastodon and Tyrannosaurus rex.

    PubMed

    Organ, Chris L; Schweitzer, Mary H; Zheng, Wenxia; Freimark, Lisa M; Cantley, Lewis C; Asara, John M

    2008-04-25

    We report a molecular phylogeny for a nonavian dinosaur, extending our knowledge of trait evolution within nonavian dinosaurs into the macromolecular level of biological organization. Fragments of collagen alpha1(I) and alpha2(I) proteins extracted from fossil bones of Tyrannosaurus rex and Mammut americanum (mastodon) were analyzed with a variety of phylogenetic methods. Despite missing sequence data, the mastodon groups with elephant and the T. rex groups with birds, consistent with predictions based on genetic and morphological data for mastodon and on morphological data for T. rex. Our findings suggest that molecular data from long-extinct organisms may have the potential for resolving relationships at critical areas in the vertebrate evolutionary tree that have, so far, been phylogenetically intractable.

  14. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed Central

    Stadler, Tanja; Degnan, James H.; Rosenberg, Noah A.

    2016-01-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785
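
    The record does not say which balance statistic was used. As one common choice, the sketch below computes the Colless index (the sum over internal nodes of the absolute difference in leaf counts between the two child subtrees) for a rooted binary tree written as nested 2-tuples. The tree representation and function name are assumptions made for illustration.

```python
def colless(tree):
    """Return (leaf_count, colless_index) for a binary tree of nested 2-tuples.

    A leaf is any non-tuple value (for example a taxon name); an internal
    node is a tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, i_left = colless(left)
    n_right, i_right = colless(right)
    # This node contributes |n_left - n_right| to the total imbalance.
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
```

    For four leaves, the fully balanced tree scores 0 and the caterpillar tree scores 3, the maximum for that size; higher values mean a less balanced tree.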

  15. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed Central

    2015-01-01

    Trees contribute to enormous plant oil reserves because many trees contain 50%–80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the “proline knot” motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs. PMID:26258573

  16. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed

    Cao, Heping

    2015-09-01

    Trees contribute to enormous plant oil reserves because many trees contain 50%-80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the "proline knot" motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs.
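
    The "proline knot" motif quoted in this record, PX5SPX3P, can be scanned for with a regular expression. The sketch below assumes the usual reading of that shorthand (P, any five residues, S, P, any three residues, P); the function name and the test sequence are invented for illustration and are not real oleosin data.

```python
import re

# PX5SPX3P read as: P, five arbitrary residues, S, P, three arbitrary residues, P.
PROLINE_KNOT = re.compile(r"P[A-Z]{5}SP[A-Z]{3}P")


def find_proline_knots(protein_sequence):
    """Return (start_index, matched_text) for each non-overlapping motif occurrence."""
    sequence = protein_sequence.upper()
    return [(m.start(), m.group()) for m in PROLINE_KNOT.finditer(sequence)]


# Made-up sequence containing exactly one motif instance.
toy_sequence = "MAPAAAAASPAAAPK"
hits = find_proline_knots(toy_sequence)
```

    A screen like this only checks the spacing of the conserved P and S residues; it says nothing about hydrophobicity or subfamily membership, which the study assessed separately.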

  17. Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

    SciTech Connect

    N. Price, Morgan; S. Dehal, Paramvir; P. Arkin, Adam

    2009-07-31

    Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
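
    To make the distance-matrix baseline in this record concrete, the sketch below computes a Jukes-Cantor distance for one pair of aligned sequences and counts how many pairwise distances a full matrix would hold. It is not FastTree's code; the function names, the gap handling, and the 4-bytes-per-distance figure used in the note afterwards are assumptions.

```python
import math


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for two aligned nucleotide strings.

    p is the fraction of differing sites among columns where neither sequence has a gap.
    """
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix_entries(n_sequences):
    """Number of distinct pairs, i.e. entries in one triangle of the distance matrix."""
    return n_sequences * (n_sequences - 1) // 2
```

    For the 158,022 sequences mentioned above, `distance_matrix_entries` gives 12,485,397,231 pairs; at an assumed 4 bytes per stored distance that is roughly 50 gigabytes, in line with the figure quoted in the record.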

  18. Assessment of Student Learning Associated with Tree Thinking in an Undergraduate Introductory Organismal Biology Course

    PubMed Central

    Smith, James J.; Cheruvelil, Kendra Spence; Auvenshine, Stacie

    2013-01-01

    Phylogenetic trees provide visual representations of ancestor–descendant relationships, a core concept of evolutionary theory. We introduced “tree thinking” into our introductory organismal biology course (freshman/sophomore majors) to help teach organismal diversity within an evolutionary framework. Our instructional strategy consisted of designing and implementing a set of experiences to help students learn to read, interpret, and manipulate phylogenetic trees, with a particular emphasis on using data to evaluate alternative phylogenetic hypotheses (trees). To assess the outcomes of these learning experiences, we designed and implemented a Phylogeny Assessment Tool (PhAT), an open-ended response instrument that asked students to: 1) map characters on phylogenetic trees; 2) apply an objective criterion to decide which of two trees (alternative hypotheses) is “better”; and 3) demonstrate understanding of phylogenetic trees as depictions of ancestor–descendant relationships. A pre–post test design was used with the PhAT to collect data from students in two consecutive Fall semesters. Students in both semesters made significant gains in their abilities to map characters onto phylogenetic trees and to choose between two alternative hypotheses of relationship (trees) by applying the principle of parsimony (Occam's razor). However, learning gains were much lower in the area of student interpretation of phylogenetic trees as representations of ancestor–descendant relationships. PMID:24006401

  19. A taxonomic and phylogenetic re-appraisal of the genus Curvularia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species of Curvularia are important plant and human pathogens worldwide. In this study, the genus Curvularia is re-assessed based on molecular phylogenetic analysis and morphological observations of available isolates and specimens. A multi-gene phylogenetic tree inferred from ITS, TEF and GPDH gene...

  20. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    PubMed

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups, a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency between the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  1. MP-Align: alignment of metabolic pathways

    PubMed Central

    2014-01-01

    Background: Comparing the metabolic pathways of different species is useful for understanding metabolic functions and can help in studying diseases and engineering drugs. Several comparison techniques for metabolic pathways have been introduced in the literature as a first attempt in this direction. The approaches are based on some simplified representation of metabolic pathways and on a related definition of a similarity score (or distance measure) between two pathways. More recent comparative research focuses on alignment techniques that can identify similar parts between pathways. Results: We propose a methodology for the pairwise comparison and alignment of metabolic pathways that aims at providing the largest conserved substructure of the pathways under consideration. The proposed methodology has been implemented in a tool called MP-Align, which has been used to perform several validation tests. The results showed that our similarity score makes it possible to discriminate between different domains and to reconstruct a meaningful phylogeny from metabolic data. The results further demonstrate that our alignment algorithm correctly identifies subpathways sharing a common biological function. Conclusion: The results of the validation tests performed with MP-Align are encouraging. A comparison with another proposal in the literature showed that our alignment algorithm is particularly well-suited to finding the largest conserved subpathway of the pathways under examination. PMID:24886436

  2. Budgeted phylogenetic diversity on circular split systems.

    PubMed

    Minh, Bui Quang; Pardi, Fabio; Klaere, Steffen; von Haeseler, Arndt

    2009-01-01

    In the last 15 years, Phylogenetic Diversity (PD) has gained interest in the community of conservation biologists as a surrogate measure for assessing biodiversity. We have recently proposed two approaches to select taxa for maximizing PD, namely PD with budget constraints and PD on split systems. In this paper, we will unify these two strategies and present a dynamic programming algorithm to solve the unified framework of selecting taxa with maximal PD under budget constraints on circular split systems. An improved algorithm will also be given if the underlying split system is a tree.
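
    As a small illustration of the quantity being optimised in this record, the sketch below computes rooted phylogenetic diversity (the total branch length connecting a set of taxa to the root) and then picks the best affordable subset by exhaustive search. This is not the dynamic programming algorithm the authors describe and it does not handle split systems; the tree encoding, function names, costs, and toy tree are all hypothetical.

```python
from itertools import combinations


def phylogenetic_diversity(parent, taxa):
    """Sum the branch lengths on the union of paths from each taxon up to the root.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    """
    visited = set()
    total = 0.0
    for node in taxa:
        # Walk upward until the root or an already-counted branch is reached.
        while node in parent and node not in visited:
            visited.add(node)
            node, length = parent[node][0], parent[node][1]
            total += length
    return total


def best_under_budget(parent, cost, budget):
    """Brute-force search (exponential time) for the affordable subset with maximal PD."""
    best = (0.0, ())
    taxa = sorted(cost)
    for k in range(1, len(taxa) + 1):
        for subset in combinations(taxa, k):
            if sum(cost[t] for t in subset) <= budget:
                pd = phylogenetic_diversity(parent, subset)
                if pd > best[0]:
                    best = (pd, subset)
    return best


# Toy tree: root -> X (3.0), root -> C (4.0), X -> A (1.0), X -> B (2.0).
toy_parent = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("root", 3.0), "C": ("root", 4.0)}
toy_cost = {"A": 1, "B": 1, "C": 2}
```

    With a budget of 2 the search keeps A and B (PD 6.0), while a budget of 3 switches to B and C (PD 9.0), showing how the cost constraint changes which taxa are worth selecting.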

  3. Girder Alignment Plan

    SciTech Connect

    Wolf, Zackary; Ruland, Robert; LeCocq, Catherine; Lundahl, Eric; Levashov, Yurii; Reese, Ed; Rago, Carl; Poling, Ben; Schafer, Donald; Nuhn, Heinz-Dieter; Wienands, Uli; /SLAC

    2010-11-18

    The girders for the LCLS undulator system contain components which must be aligned with high accuracy relative to each other. The alignment is one of the last steps before the girders go into the tunnel, so the alignment must be done efficiently, on a tight schedule. This note documents the alignment plan, which is designed for both efficiency and high accuracy. The motivation for girder alignment involves the following considerations. Using beam based alignment, the girder position will be adjusted until the beam goes through the center of the quadrupole and beam finder wire. For the machine to work properly, the undulator axis must be on this line and the center of the undulator beam pipe must be on this line. The physics reasons for the undulator axis and undulator beam pipe axis to be centered on the beam are different, but the alignment tolerances for both are similar. In addition, the beam position monitor must be centered on the beam to preserve its calibration. Thus, the undulator, undulator beam pipe, quadrupole, beam finder wire, and beam position monitor axes must all be aligned to a common line. All relative alignments are equally important, not just, for example, between quadrupole and undulator. We begin by making the common axis the nominal beam axis in the girder coordinate system. All components will be initially aligned to this axis. A more accurate alignment will then position the components relative to each other, without incorporating the girder itself.

  4. Phylogenetics of pond and lake lifestyles in Chaoborus midge larvae.

    PubMed

    Berendonk, Thomas U; Barraclough, Timothy G; Barraclough, Jonelle C

    2003-09-01

    Aquatic invertebrates experience strong trade-offs between habitats due to the selective effects of different predators. Diel vertical migration and small body size are thought to be effective strategies against fish predation in lakes. In the absence of fish in small ponds, migration is ineffective against invertebrate predators and large body size is an advantage. Although widely discussed, this phenomenon has never been tested in a phylogenetic context. We reconstructed a mitochondrial DNA (mtDNA) tree to investigate the phylogenetic distribution of pond and lake lifestyles among 10 species of northern temperate Chaoborus midge larvae. The mtDNA tree is similar to previous morphological trees for Chaoborus, the only difference being the disruption of the subgenus Chaoborus sensu stricto. At least three shifts have occurred between pond and lake lifestyles, each time associated with evolution of diel vertical migration in the lake taxon. The trend in larval body size with habitat type is sensitive to tree and character reconstruction methods, only weakly consistent with the effects of fish predation. Despite long time periods over which adaptation to each habitat type could have occurred, there remains significant phylogenetic heritability in larval body size. The tree provides a framework for comparative studies of the metapopulation genetic consequences of pond and lake lifestyles.

  5. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    PubMed

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

    The MixtureTree Annotator, written in Java, allows the user to automatically color any phylogenetic tree in Newick format generated by any phylogeny reconstruction program and to output the result as a Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over other programs that perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. A modified version of FigTree is used to visualize the resulting output file. Certain popular programs that lack good built-in visualization tools (for example MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious) may introduce human error because colors must be added manually to each node, or have other limitations, such as coloring only by a number (for example branch length) or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy to use, while still allowing the user full control over the coloring and annotating process.
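
    The idea of coloring a Newick tree by sequence name and writing a Nexus file can be illustrated with a minimal sketch. This is not the MixtureTree Annotator's code: the palette, the prefix rule (text before the first underscore), and the function names are hypothetical, and the `[&!color=...]` annotation is assumed to be the comment-style convention that FigTree reads. The regular expression only handles unquoted leaf labels.

```python
import re

# Hypothetical palette keyed by name prefix; not part of MixtureTree Annotator.
PALETTE = {"A": "#ff0000", "B": "#0000ff"}
DEFAULT_COLOR = "#000000"


def leaf_names(newick: str) -> list[str]:
    """Return leaf labels: unquoted tokens that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def color_for(name: str) -> str:
    """Pick a color from the sequence name (here: the text before the first '_')."""
    return PALETTE.get(name.split("_")[0], DEFAULT_COLOR)


def to_colored_nexus(newick: str) -> str:
    """Wrap a Newick string in a NEXUS file with one color annotation per taxon.

    Assumption: FigTree-style [&!color=#rrggbb] comments after each taxon label.
    """
    names = leaf_names(newick)
    labels = "\n".join(f"\t{n}[&!color={color_for(n)}]" for n in names)
    return (
        "#NEXUS\n"
        "begin taxa;\n"
        f"\tdimensions ntax={len(names)};\n"
        "\ttaxlabels\n"
        f"{labels}\n"
        ";\nend;\n"
        "begin trees;\n"
        f"\ttree tree_1 = {newick}\n"
        "end;\n"
    )


if __name__ == "__main__":
    print(to_colored_nexus("((A_1:0.1,A_2:0.2):0.3,B_1:0.4);"))
```

    Deriving the color from the name alone means no per-node manual step is needed, which is the source of human error the abstract mentions.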

  6. Molecular phylogenetic study on the origin and evolution of Mustelidae.

    PubMed

    Yonezawa, Takahiro; Nikaido, Masato; Kohno, Naoki; Fukumoto, Yukio; Okada, Norihiro; Hasegawa, Masami

    2007-07-01

    The family Mustelidae, which consists of Mustelinae, Lutrinae, Melinae, and Taxidiinae, is the largest family among Carnivora and is a highly diverse group. Recent molecular phylogenetic studies have clarified the phylogenetic relations among Mustelidae, but there remain several unresolved problems, particularly concerning the deep branchings. Whereas many studies support the monophyly of Mustelidae+Procyonidae among Musteloidea, the relations between Mustelidae+Procyonidae, Ailuridae, and Mephitidae are still unclear. To address these problems, we inferred a tree on the basis of the sequences of mitochondrial genomes and of multiple nuclear genes using the maximum likelihood method. Our results strongly support the hypothesis that the Taxidiinae branched off first, followed by the branching of the Melinae. After that, Mustelinae diversified, and Lutrinae evolved within Mustelinae. With respect to the deep branchings in Musteloidea, the Ailuridae/Mephitidae monophyly tree and the Mephitidae-basal tree are indistinguishable in log-likelihood score, and this problem remains unresolved.

  7. Synthesis of phylogeny and taxonomy into a comprehensive tree of life

    PubMed Central

    Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

    2015-01-01

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966

  8. Synthesis of phylogeny and taxonomy into a comprehensive tree of life.

    PubMed

    Hinchliff, Cody E; Smith, Stephen A; Allman, James F; Burleigh, J Gordon; Chaudhary, Ruchi; Coghill, Lyndon M; Crandall, Keith A; Deng, Jiabin; Drew, Bryan T; Gazis, Romina; Gude, Karl; Hibbett, David S; Katz, Laura A; Laughinghouse, H Dail; McTavish, Emily Jane; Midford, Peter E; Owen, Christopher L; Ree, Richard H; Rees, Jonathan A; Soltis, Douglas E; Williams, Tiffani; Cranston, Karen A

    2015-10-13

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.

  9. Investigating how students communicate tree-thinking

    NASA Astrophysics Data System (ADS)

    Boyce, Carrie Jo

    Learning is often an active endeavor that requires students to work at building conceptual understandings of complex topics. Personal experiences, ideas, and communication all play large roles in developing knowledge and understanding of complex topics. Sometimes these experiences can promote formation of scientifically inaccurate or incomplete ideas. Representations are tools used to help individuals understand complex topics. In biology, one way that educators help people understand evolutionary histories of organisms is by using representations called phylogenetic trees. In order to understand phylogenetic trees, individuals need to understand the conventions associated with phylogenies. My dissertation, supported by the Tree-Thinking Representational Competence and Word Association frameworks, is a mixed-methods study investigating the changes in students' tree-reading, representational competence and mental association of phylogenetic terminology after participation in varied instruction. Participants included 128 introductory biology majors from a mid-sized southern research university. Participants were enrolled in either Introductory Biology I, where they were not taught phylogenetics, or Introductory Biology II, where they were explicitly taught phylogenetics. I collected data using a pre- and post-assessment consisting of a word association task and tree-thinking diagnostic (n=128). Additionally, I recruited a subset of students from both courses (n=37) to complete a computer simulation designed to teach students about phylogenetic trees. I then conducted semi-structured interviews consisting of a word association exercise with card sort task, a retrospective pre-assessment discussion, a post-assessment discussion, and interview questions. I found that students who received explicit lecture instruction had a significantly higher increase in scores on a tree-thinking diagnostic than students who did not receive lecture instruction. Students who received both

  10. Phyloclimatic modeling: combining phylogenetics and bioclimatic modeling.

    PubMed

    Yesson, C; Culham, A

    2006-10-01

    We investigate the impact of past climates on plant diversification by tracking the "footprint" of climate change on a phylogenetic tree. Diversity within the cosmopolitan carnivorous plant genus Drosera (Droseraceae) is focused within Mediterranean climate regions. We explore whether this diversity is temporally linked to Mediterranean-type climatic shifts of the mid-Miocene and whether climate preferences are conservative over phylogenetic timescales. Phyloclimatic modeling combines environmental niche (bioclimatic) modeling with phylogenetics in order to study evolutionary patterns in relation to climate change. We present the largest and most complete such example to date using Drosera. The bioclimatic models of extant species demonstrate clear phylogenetic patterns; this is particularly evident for the tuberous sundews from southwestern Australia (subgenus Ergaleium). We employ a method for establishing confidence intervals of node ages on a phylogeny using replicates from a Bayesian phylogenetic analysis. This chronogram shows that many clades, including subgenus Ergaleium and section Bryastrum, diversified during the establishment of the Mediterranean-type climate. Ancestral reconstructions of bioclimatic models demonstrate a pattern of preference for this climate type within these groups. Ancestral bioclimatic models are projected into palaeo-climate reconstructions for the time periods indicated by the chronogram. We present two such examples that each generate plausible estimates of ancestral lineage distribution, which are similar to their current distributions. This is the first study to attempt bioclimatic projections on evolutionary time scales. The sundews appear to have diversified in response to local climate development. Some groups are specialized for Mediterranean climates, others show wide-ranging generalism. This demonstrates that Phyloclimatic modeling could be repeated for other plant groups and is fundamental to the understanding of

  11. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    PubMed

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  12. IQPNNI: moving fast through tree space and stopping in time.

    PubMed

    Vinh, Le Sy; Von Haeseler, Arndt

    2004-08-01

    An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in O(n^2) time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted in the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives a slightly better accuracy than other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results by others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).
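
    The nearest neighbor interchange (NNI) step named in the abstract above can be sketched in a few lines. This is an illustration of the generic NNI move, not the IQPNNI program: an internal edge separates subtrees (a, b) from (c, d), and one NNI swaps a subtree across that edge, giving exactly two alternative topologies. The `score` callable is a hypothetical stand-in for the likelihood criterion, and the function names are assumptions.

```python
from typing import Callable, List, Tuple

# A topology around one internal edge: the pair of subtrees on each side.
Topology = Tuple[tuple, tuple]


def nni_neighbors(a, b, c, d) -> List[Topology]:
    """Return the two topologies reachable by one NNI from ((a, b), (c, d))."""
    return [((a, c), (b, d)), ((a, d), (b, c))]


def best_after_nni(a, b, c, d, score: Callable[[Topology], float]) -> Topology:
    """Keep the current topology unless an NNI neighbor scores strictly higher.

    `score` is a placeholder for a tree likelihood; higher is better.
    """
    current = ((a, b), (c, d))
    return max([current] + nni_neighbors(a, b, c, d), key=score)


if __name__ == "__main__":
    # Toy score that prefers grouping A with C (purely illustrative).
    toy_score = lambda t: 1.0 if ("A", "C") in t else 0.0
    print(best_after_nni("A", "B", "C", "D", toy_score))
```

    In a full search this local comparison would be applied at every internal edge and repeated until no swap improves the score; the deletion and IQP re-insertion steps described in the abstract are not shown here.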

  13. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  14. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  15. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.

    PubMed

    Wheeler, Ward C

    2003-06-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed.

  16. Partial gene sequences for the A subunit of methyl-coenzyme M reductase (mcrI) as a phylogenetic tool for the family Methanosarcinaceae

    NASA Technical Reports Server (NTRS)

    Springer, E.; Sachs, M. S.; Woese, C. R.; Boone, D. R.

    1995-01-01

    Representatives of the family Methanosarcinaceae were analyzed phylogenetically by comparing partial sequences of their methyl-coenzyme M reductase (mcrI) genes. A 490-bp fragment from the A subunit of the gene was selected, amplified by the PCR, cloned, and sequenced for each of 25 strains belonging to the Methanosarcinaceae. The sequences obtained were aligned with the corresponding portions of five previously published sequences, and all of the sequences were compared to determine phylogenetic distances by Fitch distance matrix methods. We prepared analogous trees based on 16S rRNA sequences; these trees corresponded closely to the mcrI trees, although the mcrI sequences of pairs of organisms had 3.01 +/- 0.541 times more changes than the respective pairs of 16S rRNA sequences, suggesting that the mcrI fragment evolved about three times more rapidly than the 16S rRNA gene. The qualitative similarity of the mcrI and 16S rRNA trees suggests that transfer of genetic information between dissimilar organisms has not significantly affected these sequences, although we found inconsistencies between some mcrI distances that we measured and previously published DNA reassociation data. It is unlikely that multiple mcrI isogenes were present in the organisms that we examined, because we found no major discrepancies in multiple determinations of mcrI sequences from the same organism. Our primers for the PCR also match analogous sites in the previously published mcrII sequences, but all of the sequences that we obtained from members of the Methanosarcinaceae were more closely related to mcrI sequences than to mcrII sequences, suggesting that members of the Methanosarcinaceae do not have distinct mcrII genes.

  17. Phylogenetic community ecology of soil biodiversity using mitochondrial metagenomics.

    PubMed

    Andújar, Carmelo; Arribas, Paula; Ruzicka, Filip; Crampton-Platt, Alex; Timmermans, Martijn J T N; Vogler, Alfried P

    2015-07-01

    High-throughput DNA methods hold great promise for the study of taxonomically intractable mesofauna of the soil. Here, we assess species diversity and community structure in a phylogenetic framework, by sequencing total DNA from bulk specimen samples and assembly of mitochondrial genomes. The combination of mitochondrial metagenomics and DNA barcode sequencing of 1494 specimens in 69 soil samples from three geographic regions in southern Iberia revealed >300 species of soil Coleoptera (beetles) from a broad spectrum of phylogenetic lineages. A set of 214 mitochondrial sequences longer than 3000 bp was generated and used to estimate a well-supported phylogenetic tree of the order Coleoptera. Shorter sequences, including cox1 barcodes, were placed on this mitogenomic tree. Raw Illumina reads were mapped against all available sequences to test for species present in local samples. This approach simultaneously established the species richness, phylogenetic composition and community turnover at species and phylogenetic levels. We find a strong signature of vertical structuring in soil fauna that shows high local community differentiation between deep soil and superficial horizons at phylogenetic levels. Within the two vertical layers, turnover among regions was primarily at the tip (species) level and was stronger in the deep soil than leaf litter communities, pointing to layer-mediated drivers determining species diversification, spatial structure and evolutionary assembly of soil communities. This integrated phylogenetic framework opens the application of phylogenetic community ecology to the mesofauna of the soil, among the most diverse and least well-understood ecosystems, and will propel both theoretical and applied soil science.

  18. The gene tree delusion.

    PubMed

    Springer, Mark S; Gatesy, John

    2016-01-01

    Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are

  19. Phylogeny, phylogenetic inference, and cranial evolution in pitheciids and Aotus.

    PubMed

    Bjarnason, Alexander; Soligo, Christophe; Elton, Sarah

    2017-03-01

    Pitheciids, one of the major radiations of New World monkeys endemic to South and Central America, are distributed in the Amazon and Orinoco basins, and include Callicebus, Cacajao, Chiropotes, and Pithecia. Molecular phylogenetics strongly support pitheciid monophyly, whereas morphological analyses infer a range of phylogenies including a sister relationship between Aotus and Callicebus. We collected geometric morphometric cranial data from pitheciids and Aotus, and used cranial data for distance-based phylogenetic analysis and tests of phylogenetic signal. Phylogenetic analyses of pitheciids were repeated with Lagothrix, Callimico, and Saimiri outgroups for Procrustes shape with and without Aotus based on the whole cranium and six anatomical regions. All phylogenetic signal tests were significant, and tree lengths were shortest and had the least morphological change over the phylogeny for Procrustes residuals from the cranial base and palate. The majority of phylogenetic analyses of Procrustes shape for pitheciids without Aotus supported the molecular phylogeny, and with Aotus included the majority inferred an Aotus-Callicebus clade, although three analyses with Callimico as outgroup supported the molecular phylogeny. The morphological similarity of Aotus and Callicebus is likely a mix of plesiomorphy, allometry, and homoplasy, and future phylogenetic inference of living and extinct platyrrhine taxa should consider the impact of these factors alongside outgroup selection and cranial region.

  20. Revealing pancrustacean relationships: Phylogenetic analysis of ribosomal protein genes places Collembola (springtails) in a monophyletic Hexapoda and reinforces the discrepancy between mitochondrial and nuclear DNA markers

    PubMed Central

    2008-01-01

    Background: In recent years, several new hypotheses on phylogenetic relations among arthropods have been proposed on the basis of DNA sequences. One of the challenged hypotheses is the monophyly of hexapods. This discussion originated from analyses based on mitochondrial DNA datasets that, due to an unusual positioning of Collembola, suggested that the hexapod body plan evolved at least twice. Here, we re-evaluate the position of Collembola using ribosomal protein gene sequences. Results: In total 48 ribosomal proteins were obtained for the collembolan Folsomia candida. These 48 sequences were aligned with sequence data on 35 other ecdysozoans. Each ribosomal protein gene was available for 25% to 86% of the taxa. However, the total sequence information was unequally distributed over the taxa and ranged between 4% and 100%. A concatenated dataset was constructed (5034 inferred amino acids in length), of which ~66% of the positions were filled. Phylogenetic tree reconstructions, using Maximum Likelihood, Maximum Parsimony, and Bayesian methods, resulted in a topology that supports monophyly of Hexapoda. Conclusion: Although ribosomal proteins in general may not evolve independently, they once more appear highly valuable for phylogenetic reconstruction. Our analyses clearly suggest that Hexapoda is monophyletic. This underpins the inconsistency between nuclear and mitochondrial datasets when analyzing pancrustacean relationships. Caution is needed when applying mitochondrial markers in deep phylogeny. PMID:18366624

  1. The SILVA and "All-species Living Tree Project (LTP)" taxonomic frameworks.

    PubMed

    Yilmaz, Pelin; Parfrey, Laura Wegener; Yarza, Pablo; Gerken, Jan; Pruesse, Elmar; Quast, Christian; Schweer, Timmy; Peplies, Jörg; Ludwig, Wolfgang; Glöckner, Frank Oliver

    2014-01-01

    SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA genes. This article describes the improvements the SILVA taxonomy has undergone in the last 3 years. Specifically we are focusing on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies. Our comparisons not only revealed a reasonable overlap between the taxa names, but also pointed to significant differences in both names and numbers of taxa between the three resources.

  2. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks

    PubMed Central

    Yilmaz, Pelin; Parfrey, Laura Wegener; Yarza, Pablo; Gerken, Jan; Pruesse, Elmar; Quast, Christian; Schweer, Timmy; Peplies, Jörg; Ludwig, Wolfgang; Glöckner, Frank Oliver

    2014-01-01

    SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA genes. This article describes the improvements the SILVA taxonomy has undergone in the last 3 years. Specifically we are focusing on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies. Our comparisons not only revealed a reasonable overlap between the taxa names, but also pointed to significant differences in both names and numbers of taxa between the three resources. PMID:24293649

  3. Tidal alignment of galaxies

    SciTech Connect

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used "nonlinear alignment model," finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the "GI" term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  4. Tidal alignment of galaxies

    SciTech Connect

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used 'nonlinear alignment model,' finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the 'GI' term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  5. Heterogeneous Compression of Large Collections of Evolutionary Trees.

    PubMed

    Matthews, Suzanne J

    2015-01-01

    Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enables their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections, and will enable scientists to develop novel methods for relating heterogeneous collections of trees.

  6. A phylogenetic blueprint for a modern whale.

    PubMed

    Gatesy, John; Geisler, Jonathan H; Chang, Joseph; Buell, Carl; Berta, Annalisa; Meredith, Robert W; Springer, Mark S; McGowen, Michael R

    2013-02-01

    The emergence of Cetacea in the Paleogene represents one of the most profound macroevolutionary transitions within Mammalia. The move from a terrestrial habitat to a committed aquatic lifestyle engendered wholesale changes in anatomy, physiology, and behavior. The results of this remarkable transformation are extant whales that include the largest, biggest brained, fastest swimming, loudest, deepest diving mammals, some of which can detect prey with a sophisticated echolocation system (Odontoceti - toothed whales), and others that batch feed using racks of baleen (Mysticeti - baleen whales). A broad-scale reconstruction of the evolutionary remodeling that culminated in extant cetaceans has not yet been based on integration of genomic and paleontological information. Here, we first place Cetacea relative to extant mammalian diversity, and assess the distribution of support among molecular datasets for relationships within Artiodactyla (even-toed ungulates, including Cetacea). We then merge trees derived from three large concatenations of molecular and fossil data to yield a composite hypothesis that encompasses many critical events in the evolutionary history of Cetacea. By combining diverse evidence, we infer a phylogenetic blueprint that outlines the stepwise evolutionary development of modern whales. This hypothesis represents a starting point for more detailed, comprehensive phylogenetic reconstructions in the future, and also highlights the synergistic interaction between modern (genomic) and traditional (morphological+paleontological) approaches that ultimately must be exploited to provide a rich understanding of evolutionary history across the entire tree of Life.

  7. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determining factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metric. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees
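
    The pipeline described in this abstract (numerical mapping, DFT power spectrum, even scaling, Euclidean distance) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the 2D mapping or the exact even scaling rule, so the MAPPING table and the linear-interpolation stretch in even_scale are hypothetical stand-ins, and a naive O(n^2) DFT is used to keep the example dependency-free.

```python
import cmath
import math

# Hypothetical 2D mapping: each nucleotide becomes a point in the complex plane.
# The abstract does not state the mapping the authors actually use.
MAPPING = {"A": complex(0, 1), "T": complex(0, -1),
           "C": complex(-1, 0), "G": complex(1, 0)}


def power_spectrum(seq):
    """Return the DFT power spectrum |X(k)|^2 of a mapped DNA sequence."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total) ** 2)
    return spectrum


def even_scale(spectrum, m):
    """Stretch a spectrum to length m by linear interpolation (assumed scheme)."""
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    scaled = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)      # fractional index into the original spectrum
        lo = math.floor(q)
        hi = min(lo + 1, n - 1)
        frac = q - lo
        scaled.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return scaled


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between power spectra scaled to the longer length."""
    pa, pb = power_spectrum(seq_a), power_spectrum(seq_b)
    m = max(len(pa), len(pb))
    pa, pb = even_scale(pa, m), even_scale(pb, m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))
```

    A pairwise matrix of spectral_distance values could then be passed to any hierarchical clustering routine; for genome-scale input an FFT implementation would replace the naive transform.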

  8. Phylogenetic relationships among Boleosoma darter species (Percidae: Etheostoma).

    PubMed

    Heckman, K L; Near, T J; Alonzo, S H

    2009-10-01

    Darters represent a species-rich group of North American freshwater fishes studied in the context of their diverse morphology, behavior, and geographic distribution. We report the first molecular phylogenetic analyses of the Boleosoma darter clade that includes complete species sampling. We estimated the relationship among the species of Boleosoma using DNA sequence data from a mitochondrial (cytochrome b) and a nuclear gene (S7 ribosomal protein intron 1). Our analyses discovered that the two Boleosoma species with large geographic distributions (E. nigrum and E. olmstedi) do not form reciprocally monophyletic groups in either gene tree. Etheostoma susanae and E. perlongum were phylogenetically nested in E. nigrum and E. olmstedi, respectively. While analysis of the nuclear gene resulted in a phylogeny where E. longimanum and E. podostemone were sister species, the mitochondrial gene tree did not support this relationship. Etheostoma vitreum was phylogenetically nested within Boleosoma in the mitochondrial DNA and nuclear gene trees. Our analyses suggest that current concepts of species diversity underestimate phylogenetic diversity in Boleosoma and that Boleosoma species likely provide another example of the growing number of discovered instances of mitochondrial genome transfer between darter species.

  9. Geometric morphometric character suites as phylogenetic data: extracting phylogenetic signal from gastropod shells.

    PubMed

    Smith, Ursula E; Hendricks, Jonathan R

    2013-05-01

    Despite being the objects of numerous macroevolutionary studies, many of the best represented constituents of the fossil record (including diverse examples such as foraminifera, brachiopods, and mollusks) have mineralized skeletons with limited discrete characteristics, making morphological phylogenies difficult to construct. In contrast to their paucity of phylogenetic characters, the mineralized structures (tests and shells) of these fossil groups frequently have distinctive shapes that have long proved useful for their classification. The recent introduction of methodologies for including continuous data directly in a phylogenetic analysis has increased the number of available characters, making it possible to produce phylogenies based, in whole or part, on continuous character data collected from such taxa. Geometric morphometric methods provide tools for accurately characterizing shape variation and can produce quantitative data that can therefore now be included in a phylogenetic matrix in a nonarbitrary manner. Here, the marine gastropod genus Conus is used to evaluate the ability of continuous characters, generated from a geometric morphometric analysis of shell shape, to contribute to a total evidence phylogenetic hypothesis constructed using molecular and morphological data. Furthermore, the ability of continuous characters derived from geometric morphometric analyses to place fossil taxa with limited discrete characters into a phylogeny with their extant relatives was tested by simulating the inclusion of fossil taxa. This was done by removing the molecular partition of individual extant species to produce a "cladistic pseudofossil" with only the geometric morphometric derived characters coded. The phylogenetic position of each cladistic pseudofossil taxon was then compared with its placement in the total evidence tree and a symmetric resampling tree to evaluate the degree to which morphometric characters alone can correctly place simulated fossil species

  10. Nuclear Ribosomal ITS Functional Paralogs Resolve the Phylogenetic Relationships of a Late-Miocene Radiation Cycad Cycas (Cycadaceae)

    PubMed Central

    Xiao, Long-Qian; Möller, Michael

    2015-01-01

    Cycas is the most widespread and diverse genus among the ancient cycads, but the extant species could be the product of late Miocene rapid radiations. Taxonomic treatments to date for this genus are quite controversial, which makes it difficult to elucidate its evolutionary history. We cloned 161 genomic ITS sequences from 31 species representing all sections of Cycas. The divergent ITS paralogs were examined within each species and identified as putative pseudogenes, recombinants and functional paralogs. Functional paralogs were used to reconstruct phylogenetic relationships with pseudogene sequences as molecular outgroups, since an unambiguous ITS sequence alignment with their closest relatives, the Zamiaceae, is unachievable. A fully resolved and highly supported tree topology was obtained at the section level, with two major clades including six minor clades. The results fully supported the classification scheme proposed by Hill (2004) at the section level, with the minor clades representing his six sections. The two major clades could be recognised as two subgenera. The obtained pattern of phylogenetic relationships, combined with the different seed dispersal capabilities and paleogeography, allowed us to propose a late Miocene rapid radiation of Cycas that might have been promoted by vicariant events associated with the complex topography and orogeny of South China and adjacent regions. In contrast, transoceanic dispersals might have played an important role in the rapid diversification of sect. Cycas, whose members have evolved a spongy layer in their seeds aiding water dispersals. PMID:25635842

  11. Reconstructible Phylogenetic Networks: Do Not Distinguish the Indistinguishable

    PubMed Central

    Pardi, Fabio; Scornavacca, Celine

    2015-01-01

    Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks. PMID:25849429

  12. Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

    PubMed

    Pardi, Fabio; Scornavacca, Celine

    2015-04-01

    Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

  13. Indel Reliability in Indel-Based Phylogenetic Inference

    PubMed Central

    Ashkenazy, Haim; Cohen, Ofir; Pupko, Tal; Huchon, Dorothée

    2014-01-01

    It is often assumed that it is unlikely that the same insertion or deletion (indel) event occurred at the same position in two independent evolutionary lineages, and thus, indel-based inference of phylogeny should be less subject to homoplasy compared with standard inference which is based on substitution events. Indeed, indels were successfully used to solve debated evolutionary relationships among various taxonomical groups. However, indels are never directly observed but rather inferred from the alignment and thus indel-based inference may be sensitive to alignment errors. It is hypothesized that phylogenetic reconstruction would be more accurate if it relied only on a subset of reliable indels instead of the entire indel data. Here, we developed a method to quantify the reliability of indel characters by measuring how often they appear in a set of alternative multiple sequence alignments. Our approach is based on the assumption that indels that are consistently present in most alternative alignments are more reliable compared with indels that appear only in a small subset of these alignments. Using simulated and empirical data, we studied the impact of filtering and weighting indels by their reliability scores on the accuracy of indel-based phylogenetic reconstruction. The new method is available as a web-server at http://guidance.tau.ac.il/RELINDEL/. PMID:25409663
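
    The reliability idea in this abstract, scoring an indel by how often it recurs across alternative alignments, can be illustrated with a small sketch. The representation here is an assumption made for illustration only: each alignment is a dict of gapped rows, and an indel is identified per row as (sequence name, number of residues before the gap, gap length), which is simpler than the indel coding a real method would use. The function names are hypothetical and do not come from the RELINDEL server.

```python
from collections import Counter


def gap_runs(name, aligned):
    """Return (name, residues_before_gap, gap_length) for each gap run in one row."""
    runs = []
    residues = 0   # residues seen so far; alignment-independent coordinate
    run = 0        # length of the gap run currently being read
    for ch in aligned:
        if ch == "-":
            run += 1
        else:
            if run:
                runs.append((name, residues, run))
                run = 0
            residues += 1
    if run:
        runs.append((name, residues, run))
    return runs


def indel_reliability(reference, alternatives):
    """Score each gap run in `reference` by the fraction of alternatives containing it."""
    counts = Counter()
    for aln in alternatives:
        seen = set()
        for name, row in aln.items():
            seen.update(gap_runs(name, row))
        counts.update(seen)   # each alignment contributes at most once per indel
    ref_runs = [r for name, row in reference.items() for r in gap_runs(name, row)]
    return {r: counts[r] / len(alternatives) for r in ref_runs}
```

    Scores near 1 mark indels recovered by most alternative alignments; a filtering step would then keep only indels above a chosen threshold, and a weighting step would use the score directly.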

  14. An enhanced calibration of a recently released megatree for the analysis of phylogenetic diversity.

    PubMed

    Gastauer, M; Meira-Neto, J A A

    2016-04-19

    Dated or calibrated phylogenetic trees, in which branch lengths correspond to evolutionary divergence times between nodes, are important requirements for computing measures of phylogenetic diversity or phylogenetic community structure. The increasing knowledge about the diversification and evolutionary divergence times of vascular plants requires a revision of the age estimates used for the calibration of phylogenetic trees by the bladj algorithm of the Phylocom 4.2 package. Comparing the recently released megatree R20120829.new with two calibrated vascular plant phylogenies provided in the literature, we found 242 corresponding nodes. We modified the megatree (R20120829mod.new), inserting names for all corresponding nodes. Furthermore, we provide files containing age estimates from both sources for the updated calibration of R20120829mod.new. Applying these files consistently in analyses of phylogenetic community structure or diversity serves to avoid erroneous measures and ecological misinterpretation.

  15. Mammalian phylogenetic diversity-area relationships at a continental scale.

    PubMed

    Mazel, Florent; Renaud, Julien; Guilhaumon, François; Mouillot, David; Gravel, Dominique; Thuiller, Wilfried

    2015-10-01

    In analogy to the species-area relationship (SAR), one of the few laws in ecology, the phylogenetic diversity-area relationship (PDAR) describes the tendency of phylogenetic diversity (PD) to increase with area. Although investigating PDAR has the potential to unravel the underlying processes shaping assemblages across spatial scales and to predict PD loss through habitat reduction, it has been little investigated so far. Focusing on PD has noticeable advantages compared to species richness (SR), since PD also gives insights on processes such as speciation/extinction, assembly rules and ecosystem functioning. Here we investigate the universality and pervasiveness of the PDAR at continental scale using terrestrial mammals as a study case. We define the relative robustness of PD (compared to SR) to habitat loss as the area between the standardized PDAR and standardized SAR (i.e., standardized by the diversity of the largest spatial window) divided by the area under the standardized SAR only. This metric quantifies the relative increase of PD robustness compared to SR robustness. We show that PD robustness is higher than SR robustness but that it varies among continents. We further use a null model approach to disentangle the relative effect of phylogenetic tree shape and nonrandom spatial distribution of evolutionary history on the PDAR. We find that, for most spatial scales and for all continents except Eurasia, PDARs are not different from those expected under a model using only the observed SAR and the shape of the phylogenetic tree at continental scale. Interestingly, we detect a strong phylogenetic structure of the Eurasian PDAR that can be predicted by a model that specifically accounts for a finer biogeographical delineation of this continent. In conclusion, the relative robustness of PD to habitat loss compared to species richness is determined by the phylogenetic tree shape but also depends on the spatial structure of PD.
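
    The robustness metric defined in this abstract (area between the standardized PDAR and SAR, divided by the area under the standardized SAR) can be written out as a short sketch. The assumptions are ours: the curves are sampled at the same window areas in ascending order, the last sample is the largest spatial window, and areas are integrated with the trapezoidal rule on whatever scale the caller supplies (the abstract does not say whether the area axis is transformed).

```python
def trapezoid_area(x, y):
    """Trapezoidal-rule integral of y over x (x must be ascending)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def relative_pd_robustness(areas, pd_values, sr_values):
    """Area between standardized PDAR and SAR, divided by the area under the SAR."""
    # Standardize each curve by its value in the largest spatial window.
    pd_std = [v / pd_values[-1] for v in pd_values]
    sr_std = [v / sr_values[-1] for v in sr_values]
    area_sar = trapezoid_area(areas, sr_std)
    area_between = trapezoid_area(areas, [p - s for p, s in zip(pd_std, sr_std)])
    return area_between / area_sar
```

    A positive value means the standardized PD curve lies above the standardized SR curve on average, which is how the abstract characterizes PD being more robust than SR.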

  16. Mammalian phylogenetic diversity-area relationships at a continental scale

    PubMed Central

    Mazel, Florent; Renaud, Julien; Guilhaumon, François; Mouillot, David; Gravel, Dominique; Thuiller, Wilfried

    2015-01-01

    In analogy to the species-area relationship (SAR), one of the few laws in Ecology, the phylogenetic diversity-area relationship (PDAR) describes the tendency of phylogenetic diversity (PD) to increase with area. Although investigating PDAR has the potential to unravel the underlying processes shaping assemblages across spatial scales and to predict PD loss through habitat reduction, it has been little investigated so far. Focusing on PD has noticeable advantages compared to species richness (SR) since PD also gives insights on processes such as speciation/extinction, assembly rules and ecosystem functioning. Here we investigate the universality and pervasiveness of the PDAR at continental scale using terrestrial mammals as a study case. We define the relative robustness of PD (compared to SR) to habitat loss as the area between the standardized PDAR and standardized SAR (i.e. standardized by the diversity of the largest spatial window) divided by the area under the standardized SAR only. This metric quantifies the relative increase of PD robustness compared to SR robustness. We show that PD robustness is higher than SR robustness but that it varies among continents. We further use a null model approach to disentangle the relative effect of phylogenetic tree shape and nonrandom spatial distribution of evolutionary history on the PDAR. We find that for most spatial scales and for all continents except Eurasia, PDARs are not different from those expected under a model using only the observed SAR and the shape of the phylogenetic tree at continental scale. Interestingly, we detect a strong phylogenetic structure of the Eurasian PDAR that can be predicted by a model that specifically accounts for a finer biogeographical delineation of this continent. In conclusion, the relative robustness of PD to habitat loss compared to species richness is determined by the phylogenetic tree shape but also depends on the spatial structure of PD. PMID:26649401

  17. GB Virus C/Hepatitis G Virus Groups and Subgroups: Classification by a Restriction Fragment Length Polymorphism Method Based on Phylogenetic Analysis of the 5′ Untranslated Region

    PubMed Central

    Quarleri, J. F.; Mathet, V. L.; Feld, M.; Ferrario, D.; della Latta, M. P.; Verdun, R.; Sánchez, D. O.; Oubiña, J. R.

    1999-01-01

    A phylogenetic tree based on 150 5′ untranslated region sequences deposited in GenBank database allowed segregation of the sequences into three major groups, including two subgroups, i.e., 1, 2a, 2b, and 3, supported by bootstrap analysis. Restriction site analysis of these sequences predicted that HinfI and either AatII or AciI could be used for genomic typing with 99.4% accuracy. cDNA sequencing and subsequent alignment of 21 Argentine GB virus C/hepatitis G virus strains confirmed restriction fragment length polymorphism patterns theoretically predicted. This method may be useful for a rapid screening of samples when either epidemiological or transmission studies of this agent are carried out. PMID:10203483
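
    The restriction site analysis mentioned in this abstract amounts to locating enzyme recognition sequences in each 5′ untranslated region sequence. A minimal sketch follows; it only finds recognition sites and counts them, and does not model cut positions, fragment lengths, or the authors' typing rules, none of which are given in the abstract. The function names are hypothetical. The recognition sequences used are the standard ones for these enzymes (HinfI GANTC, AatII GACGTC, AciI CCGC).

```python
import re

# Recognition sequences written 5'->3'; N matches any base. AciI is
# non-palindromic, so its reverse complement (GCGG) is searched as well.
SITES = {
    "HinfI": ["GANTC"],
    "AatII": ["GACGTC"],
    "AciI": ["CCGC", "GCGG"],
}


def site_positions(seq, enzyme):
    """Return sorted 0-based start positions of an enzyme's recognition sites."""
    seq = seq.upper()
    positions = set()
    for site in SITES[enzyme]:
        # A lookahead makes matches zero-width, so overlapping sites are all found.
        pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
        positions.update(m.start() for m in re.finditer(pattern, seq))
    return sorted(positions)


def site_profile(seq):
    """Count recognition sites per enzyme; a crude stand-in for an RFLP pattern."""
    return {enzyme: len(site_positions(seq, enzyme)) for enzyme in SITES}
```

    Comparing site_profile output across sequences from different groups would show which enzymes have site patterns that differ between groups, which is the property a typing scheme of this kind relies on.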

  18. Effects of memory on the shapes of simple outbreak trees.

    PubMed

    Plazzotta, Giacomo; Kwan, Christopher; Boyd, Michael; Colijn, Caroline

    2016-02-18

    Genomic tools, including phylogenetic trees derived from sequence data, are increasingly used to understand outbreaks of infectious diseases. One challenge is to link phylogenetic trees to patterns of transmission. Particularly in bacteria that cause chronic infections, this inference is affected by variable infectious periods and infectivity over time. It is known that non-exponential infectious periods can have substantial effects on pathogens' transmission dynamics. Here we ask how this non-Markovian nature of an outbreak process affects the branching trees describing that process, with particular focus on tree shapes. We simulate Crump-Mode-Jagers branching processes and compare different patterns of infectivity over time. We find that memory (non-Markovian-ness) in the process can have a pronounced effect on the shapes of the outbreak's branching pattern. However, memory also has a pronounced effect on the sizes of the trees, even when the duration of the simulation is fixed. When the sizes of the trees are constrained to a constant value, memory in our processes has little direct effect on tree shapes, but can bias inference of the birth rate from trees. We compare simulated branching trees to phylogenetic trees from an outbreak of tuberculosis in Canada, and discuss the relevance of memory to this dataset.
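
    To make the notion of memory concrete, the sketch below simulates a simple branching outbreak in which each case has a gamma-distributed infectious period and infects others at a constant rate while infectious. With shape = 1 the period is exponential (memoryless); larger shapes give non-exponential periods with the same mean. This is an illustrative simplification and not the authors' simulation: the function name, the parameters, and the constant infectivity are all assumptions.

```python
import random


def simulate_outbreak(beta, mean_period, shape, t_max, rng, max_cases=10000):
    """Return a list of (parent_index, infection_time) pairs, one per case."""
    cases = [(None, 0.0)]   # index case, infected at time 0
    queue = [0]             # cases whose onward infections are still to be drawn
    while queue and len(cases) < max_cases:
        i = queue.pop()
        start = cases[i][1]
        # gammavariate takes (shape, scale); scale is set so the mean is mean_period.
        period = rng.gammavariate(shape, mean_period / shape)
        t = start
        while True:
            t += rng.expovariate(beta)   # waiting time to the next infection event
            if t > start + period or t > t_max:
                break
            cases.append((i, t))
            queue.append(len(cases) - 1)
    return cases
```

    The parent indices define the branching tree of the outbreak, so runs with different shape values and a fixed seeded random.Random instance can be compared for tree size and shape.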

  19. RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

    PubMed Central

    Taheri, Javid; Zomaya, Albert Y

    2009-01-01

    Background: Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. Results: This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of mostly optimal answers for the MSA problem. Conclusion: RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences. PMID:19594869

  20. Genetic relatedness and phylogenetics of five Indian pufferfishes.

    PubMed

    Lakra, Wazir S; Goswami, Mukunda; Singh, Akhilesh

    2013-10-01

    The taxonomy and phylogeny of the pufferfishes belonging to the family Tetraodontidae found in India are poorly understood. We investigated five species of freshwater and marine pufferfishes using partial sequences of 16S rRNA and cytochrome c oxidase subunit I (COI) of mitochondrial genes. The sequence alignment of 16S rRNA yielded 573 bp, whereas COI gene sequence alignment yielded 614 bp. The sequence analysis of the genes revealed two distinct groups of freshwater and marine origin, which are genetically distinct from each other and exhibit identical phylogenetic resolution. The partial sequences of both the genes provided sufficient phylogenetic resolution to distinguish all the five species of pufferfishes. The COI sequences could be used as DNA barcodes for identification of the pufferfishes.

  1. Bayesian phylogenetic estimation of fossil ages

    PubMed Central

    Drummond, Alexei J.; Stadler, Tanja

    2016-01-01

    Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth–death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the ‘morphological clock’, and for uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. This article is part of the themed issue ‘Dating species divergences

  2. Phylogenetic Comparative Assembly

    NASA Astrophysics Data System (ADS)

    Husemann, Peter; Stoye, Jens

    Recent high throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence. Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a contig adjacency graph. From this a layout graph can be computed which indicates putative adjacencies of the contigs in order to aid biologists in finishing the complete genomic sequence.

  3. RAG-1 sequences resolve phylogenetic relationships within Charadriiform birds.

    PubMed

    Paton, Tara A; Baker, Allan J; Groth, Jeff G; Barrowclough, George F

    2003-11-01

    The Charadriiformes is a large and diverse order of shorebirds currently classified into 19 families, including morphologically aberrant forms that are of uncertain phylogenetic placement within non-passerine birds in general. Recent attempts using morphological characters have failed to recover a well-supported phylogeny depicting higher level relationships within Charadriiformes and the limits to the order, primarily because of inconsistency and homoplasy in these data. Moreover, these trees are incongruent with the relationships presented in the DNA hybridization tapestry, including the location of the root and the branching order of major clades within the shorebirds. To help clarify this systematic confusion we therefore sequenced the large RAG-1 nuclear exon (2850 bp) from 36 species representing 17 families of shorebirds for which DNA was available. Trees built with maximum parsimony, maximum likelihood or Bayesian methods are topologically identical and fully resolved, with high support at basal nodes. This further attests to the phylogenetic utility of the RAG-1 sequences at higher taxonomic levels within birds. The RAG-1 tree is topologically similar to the DNA hybridization tree in depicting three major subordinal clades of shorebirds, the Charadrii (thick-knees, sheathbills, plovers, oystercatchers, and allies), Scolopaci (sandpipers and jacanas) and the Lari (coursers, pratincoles, gulls, terns, skimmers, and skuas). However, the basal split in the RAG-1 tree is between Charadrii and (Scolopaci+Lari), whereas in the DNA hybridization tree Scolopaci is the sister group to the (Charadrii+Lari). Thus in both of these DNA-based trees the Alcidae (auks, murres, and allies) are not basal among shorebirds as hypothesized in morphological trees, but instead are placed as a tip clade within Lari. The enigmatic buttonquails (Turnicidae), variously hypothesized as being allied to either the Galliformes, Gruiformes, or Charadriiformes, are shown to be a basal

  4. Molecular phylogenetic perspectives for character classification and convergence: Framing some issues with nematode vulval appendages and telotylenchid tail termini

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Characters flagged as convergent based on newer molecular phylogenetic trees inform both practical identification and more esoteric classification. Nematode morphological characters such as lateral lines, bullae and laciniae are quite independent structures from those similarly named in other organi...

  5. Precision alignment device

    DOEpatents

    Jones, N.E.

    1988-03-10

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam. 5 figs.

  6. Precision alignment device

    DOEpatents

    Jones, Nelson E.

    1990-01-01

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam.

  7. Hybrid vehicle motor alignment

    DOEpatents

    Levin, Michael Benjamin

    2001-07-03

    A rotor of an electric motor is aligned to the axis of rotation of the crankshaft of an internal combustion engine in a motor vehicle having both an internal combustion engine and an electric motor. A first locator is provided on the crankshaft, and a piloting tool is located radially by the first locator to the crankshaft. A stator of the electric motor is aligned to a second locator provided on the piloting tool. The stator is secured to the engine block. The rotor is aligned to the crankshaft and secured thereto.

  8. Tree disagreement: measuring and testing incongruence in phylogenies.

    PubMed

    Planet, Paul J

    2006-02-01

    The branching patterns of phylogenetic trees often disagree even when they have been constructed using different portions of the same data. This phylogenetic discord (incongruence) can be explained by real differences in evolutionary process or history, but also may be due simply to random chance or sampling error. Techniques for measuring and testing the significance of phylogenetic incongruence are used widely in systematic biology, and are necessary when considering genome-scale datasets composed of multiple genes that may or may not have different histories. They are also applicable wherever tree algorithms are used for ordering and interpreting data (e.g., DNA microarrays). Here, I review the different incongruence tests and use them to test the phylogenetic discord of a potentially mobile genetic element (the widespread colonization Island) in the gamma-proteobacteria. I then consider how incongruence tests may be used as a starting point for phylogenetic analysis that accounts for horizontal transfer and duplication events as explanations for homoplasy.

  9. Phylogenetic relationships and new genetic tools for the detection and discrimination of the three feline Demodex mites.

    PubMed

    Silbermayr, Katja; Horvath-Ungerboeck, Christa; Eigner, Barbara; Joachim, Anja; Ferrer, Lluis

    2015-02-01

    Two feline Demodex mite species had been described as causative agents of feline demodicosis until a third species was recently detected. We provide an updated analysis of the phylogenetic relationships of Demodex mites. In addition, we present the first qPCR assay for the detection and differentiation of all three feline mite species in a single reaction. Specimens of Demodex cati, Demodex gatoi, and the recently discovered third species were collected from skin scrapings and fecal flotation for DNA extraction, conventional PCR, sequencing, and alignment. A total of 24 sequences of the partial 16S rRNA gene were used to estimate the evolutionary divergence in a p-distance model and a maximum likelihood phylogenetic tree. For the qPCR assay, new primers and fluorescent probes for the simultaneous detection of all three feline Demodex mites were designed. A consensus fragment of 351 bp was phylogenetically analyzed. The third species sequence of our study shares 98.6 % similarity to the available sequence in GenBank®. It is most similar to D. gatoi (82.41 %) and most distant to the canine Demodex injai (78.28 %). In contrast, D. gatoi is most similar to human Demodex brevis (87.01 %). The multiplex qPCR detected and discriminated the three different mite species in one reaction. The detection limit is ≤1.4 ng of mite DNA. The three feline Demodex species have distinct genotypes and did not cluster in one genetic clade. The species differentiation and assessment of evolutionary relationships will ultimately support correct diagnostics and treatment approaches.

  10. Alignment of CEBAF cryomodules

    SciTech Connect

    Schneider, W.J.; Bisognano, J.J.; Fischer, J.

    1993-06-01

    CEBAF, the Continuous Electron Beam Accelerator Facility, when completed, will house a 4 GeV recirculating accelerator. Each of the accelerator's two linacs contains 160 superconducting radio frequency (SRF) 1497 MHz niobium cavities in 20 cryomodules. Alignment of the cavities within the cryomodule with respect to the beam axis is critical to achieving optimum accelerator performance. This paper discusses the rationale for the current specification on cavity mechanical alignment: 2 mrad (rms) applied to the 0.5 m active length cavities. We describe the tooling that was developed to achieve the tolerance at the time of cavity pair assembly, to preserve and integrate alignment during cryomodule assembly, and to translate alignment to appropriate installation in the beam line.

  11. The phylogenetic diversity of metagenomes.

    PubMed

    Kembel, Steven W; Eisen, Jonathan A; Pollard, Katherine S; Green, Jessica L

    2011-01-01

    Phylogenetic diversity--patterns of phylogenetic relatedness among organisms in ecological communities--provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.

  12. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Pairwise Sequence Alignment Library (parasail). Parasail features reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
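
    The Needleman-Wunsch (NW) global alignment named in this record can be illustrated with a minimal scalar sketch. This is not parasail's vectorized implementation or its API; the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen only for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the global alignment score of strings a and b.

    Hypothetical scoring scheme: +1 match, -1 mismatch, -2 per gap
    character (linear gap penalty). Only two rows of the dynamic
    programming table are kept in memory at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]
```

    With these assumed scores, needleman_wunsch_score("GATTACA", "GATACA") evaluates to 4 (six matches and one gap). Each cell depends on its left, upper, and diagonal neighbours, which is the kind of data dependency the abstract describes as making vectorization difficult.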

  13. Audubon Tree Study Program.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…

  14. An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace.

    PubMed

    Nye, Tom M W

    2014-01-01

    Most phylogenetic analyses result in a sample of trees, but summarizing and visualizing these samples can be challenging. Consensus trees often provide limited information about a sample, and so methods such as consensus networks, clustering and multidimensional scaling have been developed and applied to tree samples. This paper describes a stochastic algorithm for constructing a principal geodesic or line through treespace which is analogous to the first principal component in standard principal components analysis. A principal geodesic summarizes the most variable features of a sample of trees, in terms of both tree topology and branch lengths, and it can be visualized as an animation of smoothly changing trees. The algorithm performs a stochastic search through parameter space for a geodesic which minimizes the sum of squared projected distances of the data points. This procedure aims to identify the globally optimal principal geodesic, though convergence to locally optimal geodesics is possible. The methodology is illustrated by constructing principal geodesics for experimental and simulated data sets, demonstrating the insight into samples of trees that can be gained and how the method improves on a previously published approach. A Java package called GeoPhytter for constructing and visualizing principal geodesics is freely available from www.ncl.ac.uk/~ntmwn/geophytter.

  15. Phylogenetic Relationships and Coaggregation Ability of Freshwater Biofilm Bacteria

    PubMed Central

    Rickard, Alex H.; Leach, Stephen A.; Hall, Laurence S.; Buswell, Clive M.; High, Nicola J.; Handley, Pauline S.

    2002-01-01

    Nineteen numerically dominant heterotrophic bacteria from a freshwater biofilm were identified by 16S ribosomal DNA gene sequencing, and their coaggregation partnerships were determined. Phylogenetic trees showed that both distantly related and closely related strains coaggregated at intergeneric, intrageneric, and intraspecies levels. One strain, Blastomonas natatoria 2.1, coaggregated with all 18 other strains and may function as a bridging organism in biofilm development. PMID:12089055

  16. Data on taxonomic status and phylogenetic relationship of tits.

    PubMed

    Li, Xue-Juan; Lin, Li-Liang; Cui, Ai-Ming; Bai, Jie; Wang, Xiao-Yang; Xin, Chao; Zhang, Zhen; Yang, Chao; Gao, Rui-Rui; Huang, Yuan; Lei, Fu-Min

    2017-02-01

    The data in this paper are related to the research article entitled "Taxonomic status and phylogenetic relationship of tits based on mitogenomes and nuclear segments" (X.J. Li et al., 2016) [1]. The mitochondrial genomes and nuclear segments of tits were sequenced to analyze mitochondrial characteristics and phylogeny. In the data, the analyzed results are presented. The data holds the resulting files of mitochondrial characteristics, heterogeneity, best schemes, and trees.

  17. Molecular phylogenetics and the origins of placental mammals.

    PubMed

    Murphy, W J; Eizirik, E; Johnson, W E; Zhang, Y P; Ryder, O A; O'Brien, S J

    2001-02-01

    The precise hierarchy of ancient divergence events that led to the present assemblage of modern placental mammals has been an area of controversy among morphologists, palaeontologists and molecular evolutionists. Here we address the potential weaknesses of limited character and taxon sampling in a comprehensive molecular phylogenetic analysis of 64 species sampled across all extant orders of placental mammals. We examined sequence variation in 18 homologous gene segments (including nearly 10,000 base pairs) that were selected for maximal phylogenetic informativeness in resolving the hierarchy of early mammalian divergence. Phylogenetic analyses identify four primary superordinal clades: (I) Afrotheria (elephants, manatees, hyraxes, tenrecs, aardvark and elephant shrews); (II) Xenarthra (sloths, anteaters and armadillos); (III) Glires (rodents and lagomorphs), as a sister taxon to primates, flying lemurs and tree shrews; and (IV) the remaining orders of placental mammals (cetaceans, artiodactyls, perissodactyls, carnivores, pangolins, bats and core insectivores). Our results provide new insight into the pattern of the early placental mammal radiation.

  18. Constructing circular phylogenetic networks from weighted quartets using simulated annealing.

    PubMed

    Eslahchi, Changiz; Hassanzadeh, Reza; Mottaghi, Ehsan; Habibi, Mahnaz; Pezeshk, Hamid; Sadeghi, Mehdi

    2012-02-01

    In this paper, we present SAQ-Net, a heuristic algorithm based on simulated annealing, as a method for constructing phylogenetic networks from weighted quartets. Similar to the QNet algorithm, SAQ-Net constructs a collection of circular weighted splits of the taxa set. This collection is represented by a split network. To show that SAQ-Net performs better than QNet, we apply both algorithms to simulated and actual data sets, the latter comprising the Salmonella, Bees, Primates and Rubber data sets. We then draw the phylogenetic networks corresponding to the outputs of these algorithms using SplitsTree4 and compare the results. We find that SAQ-Net produces a better circular ordering and better phylogenetic networks than QNet in most cases. SAQ-Net has been implemented in Matlab and is available for download at http://bioinf.cs.ipm.ac.ir/softwares/saq.net.

  19. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets

    PubMed Central

    Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. Many methods have been presented in this area, of which the most efficient, such as CASS, LNETWORK, and BIMLR, are based on the incompatible graph. This paper investigates what the methods based on the incompatible graph have in common, the relationship between the incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find all the simplest datasets for a topology G and construct a network for every dataset. For any dataset 𝒞, we can compute a network from the network representing the simplest dataset that is isomorphic to 𝒞. This process saves time for the algorithms when constructing networks. PMID:27547759

  20. Identification of Bacteria Using Phylogenetic Relationships, Revealed by MS/MS Sequencing of Tryptic Peptides Derived from Cellular Proteins

    DTIC Science & Technology

    2004-11-17

    Universal phylogenetic tree of bacteria based on SSU rRNA sequences: Aquificae, Thermotogae, Planctomycetes, Actinobacteria, Firmicutes, Cyanobacteria... Identification of Bacteria Using Phylogenetic Relationships Revealed by MS/MS Sequencing of Tryptic Peptides Derived from Cellular Proteins. Jacek P...

  1. 7. ALIGNMENT OF ABANDONED COULTERVILLE ROAD IN FORESTA AT FALLEN ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    7. ALIGNMENT OF ABANDONED COULTERVILLE ROAD IN FORESTA AT FALLEN TREE IN CENTER REAR. FOREGROUND MARKS TURN OF NEW ROAD FROM FORESTA TO HIGHWAY 120. LOOKING E. GIS: N-37 42 16.6 / W-119 44 00.3 - Coulterville Road, Between Foresta & All-Weather Highway, Yosemite Village, Mariposa County, CA

  2. 18. VIEW OF GRAND CANAL, SHOWING OLD ALIGNMENT BEFORE 1989 ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    18. VIEW OF GRAND CANAL, SHOWING OLD ALIGNMENT BEFORE 1989 REALIGNMENT, LOOKING NORTH TOWARD RAILROAD CROSSING AND CROSSCUT STEAM PLANT LARGE WHITE BUILDING. THE CROSSCUT HYDRO PLANT IS HIDDEN BY TREES TO RIGHT OF STEAM PLANT. Photographer: Mark Durben, April 1989 - Grand Canal, North side of Salt River, Tempe, Maricopa County, AZ

  3. Combinatorics of distance-based tree inference

    PubMed Central

    Pardi, Fabio; Gascuel, Olivier

    2012-01-01

    Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206–13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances. PMID:23012403
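
    As an illustration of what a "simple combinatorial formula" for branch lengths can look like, the sketch below treats the smallest possible case: a three-leaf star tree, where the three pairwise distances determine the three branch lengths exactly. The function name and the example distances are hypothetical, and this is not the general form of the formulae defined in the paper.

```python
def three_leaf_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths of the star tree on leaves a, b, c.

    Solves l_a + l_b = d_ab, l_a + l_c = d_ac, l_b + l_c = d_bc.
    With three leaves the fit is exact, so least squares and the
    direct solution coincide (illustrative special case only).
    """
    l_a = (d_ab + d_ac - d_bc) / 2
    l_b = (d_ab + d_bc - d_ac) / 2
    l_c = (d_ac + d_bc - d_ab) / 2
    return l_a, l_b, l_c
```

    For the assumed input distances d_ab = 5, d_ac = 6, d_bc = 7, the function returns (2.0, 3.0, 4.0), and the leaf-to-leaf path lengths in the tree reproduce the inputs.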

  4. Combinatorics of distance-based tree inference.

    PubMed

    Pardi, Fabio; Gascuel, Olivier

    2012-10-09

    Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.

  5. Phylogenetic plant community structure along elevation is lineage specific

    PubMed Central

    Ndiribe, Charlotte; Pellissier, Loïc; Antonelli, Silvia; Dubuis, Anne; Pottier, Julien; Vittoz, Pascal; Guisan, Antoine; Salamin, Nicolas

    2013-01-01

    The trend of closely related taxa to retain similar environmental preferences mediated by inherited traits suggests that several patterns observed at the community scale originate from longer evolutionary processes. While the effects of phylogenetic relatedness have been previously studied within a single genus or family, lineage-specific effects on the ecological processes governing community assembly have rarely been studied for entire communities or flora. Here, we measured how community phylogenetic structure varies across a wide elevation gradient for plant lineages represented by 35 families, using a co-occurrence index and net relatedness index (NRI). We propose a framework that analyses each lineage separately and reveals the trend of ecological assembly at tree nodes. We found prevailing phylogenetic clustering for more ancient nodes and overdispersion in more recent tree nodes. Closely related species may thus rapidly evolve new environmental tolerances to radiate into distinct communities, while older lineages likely retain inherent environmental tolerances to occupy communities in similar environments, either through efficient dispersal mechanisms or the exclusion of older lineages with more divergent environmental tolerances. Our study illustrates the importance of disentangling the patterns of community assembly among lineages to better interpret the ecological role of traits. It also sheds light on studies reporting absence of phylogenetic signal, and opens new perspectives on the analysis of niche and trait conservatism across lineages. PMID:24455126

  6. Using directed phylogenetic networks to retrace species dispersal history.

    PubMed

    Layeghifard, Mehdi; Peres-Neto, Pedro R; Makarenkov, Vladimir

    2012-07-01

    Methods designed for inferring phylogenetic trees have been widely applied to reconstruct biogeographic history. Because traditional phylogenetic methods used in biogeographic reconstruction are based on trees rather than networks, they follow the strict assumption that dispersal among geographical units has occurred along single dispersal routes across regions and are, therefore, incapable of modelling multiple alternative dispersal scenarios. The goal of this study is to describe a new method that allows for retracing species dispersal by means of directed phylogenetic networks obtained using a horizontal gene transfer (HGT) detection method as well as to draw parallels between the processes of HGT and biogeographic reconstruction. In our case study, we reconstructed the biogeographic history of the postglacial dispersal of freshwater fishes in the Ontario province of Canada. This case study demonstrated the utility and robustness of the new method, indicating that the most important events were south-to-north dispersal patterns, as one would expect, with secondary faunal interchange among regions. Finally, we showed how our method can be used to explore additional questions regarding the commonalities in dispersal history patterns and phylogenetic similarities among species.

  7. Systematic conservation planning for groundwater ecosystems using phylogenetic diversity.

    PubMed

    Asmyhr, Maria G; Linke, Simon; Hose, Grant; Nipperess, David A

    2014-01-01

    Aquifer ecosystems provide a range of important services including clean drinking water. These ecosystems, which are largely inaccessible to humans, comprise a distinct invertebrate fauna (stygofauna), which is characterized by narrow distributions, high levels of endemism and cryptic species. Although being under enormous anthropogenic pressure, aquifers have rarely been included in conservation planning because of the general lack of knowledge of species diversity and distribution. Here we use molecular sequence data and phylogenetic diversity as surrogates for stygofauna diversity in aquifers of New South Wales, Australia. We demonstrate how to incorporate these data as conservation features in the systematic conservation planning software Marxan. We designated each branch of the phylogenetic tree as a conservation feature, with the branch length as a surrogate for the number of distinct characters represented by each branch. Two molecular markers (nuclear 18S ribosomal DNA and mitochondrial cytochrome oxidase subunit I) were used to evaluate how marker variability and the resulting tree topology affected the site-selection process. We found that the sites containing the deepest phylogenetic branches were deemed the most irreplaceable by Marxan. By integrating phylogenetic data, we provide a method for including taxonomically undescribed groundwater fauna in systematic conservation planning.
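
    The idea of treating each branch as a conservation feature, with its length as the feature amount, can be sketched as follows. This is only an illustration under stated assumptions: the data layout, function names, and toy tree are hypothetical, and the code does not produce Marxan input files or reproduce the authors' workflow.

```python
def site_feature_amounts(branches, site_taxa):
    """Map each site to the branches (features) it represents.

    branches: {branch_id: (length, set of taxa descending from it)}
    site_taxa: {site_id: set of taxa recorded at the site}
    A branch counts as present at a site if any descendant taxon occurs there.
    """
    return {
        site: {
            branch_id: length
            for branch_id, (length, descendants) in branches.items()
            if descendants & taxa
        }
        for site, taxa in site_taxa.items()
    }


def phylogenetic_diversity(branches, taxa):
    """Sum of the lengths of all branches represented by a set of taxa."""
    return sum(length for length, desc in branches.values() if desc & taxa)


# Toy tree ((A:1,B:1):2,C:3), written as one entry per branch (hypothetical data).
TOY_BRANCHES = {
    "A": (1, {"A"}),
    "B": (1, {"B"}),
    "AB": (2, {"A", "B"}),
    "C": (3, {"C"}),
}
TOY_SITES = {"site1": {"A", "B"}, "site2": {"C"}, "site3": {"A", "C"}}
```

    In this toy example, site3 holds taxa A and C and therefore represents branches A, AB and C, for a total of 6 units of branch length, more than site1 (4) or site2 (3).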

  8. DNA barcoding and phylogenetic relationships of Ardeidae (Aves: Ciconiiformes).

    PubMed

    Huang, Z H; Li, M F; Qin, J W

    2016-08-19

    The avian family Ardeidae comprises long-legged freshwater and coastal birds. There has been considerable disagreement concerning the intrafamilial relationships of Ardeidae. Mitochondrial cytochrome c oxidase subunit I (COI) was used as a marker for the identification and phylogenetic analysis of avian species. In the present study, we analyzed the COI barcodes of 32 species from 17 genera belonging to the family Ardeidae. Each bird species possessed a barcode distinct from that of other bird species except for Egretta thula and E. garzetta, which shared one barcoding sequence. Kimura two-parameter distances were calculated between barcodes. The average genetic distance between species was 34-fold higher than the average genetic distance within species. Neighbor-joining and maximum likelihood methods were used to construct phylogenetic trees. Most species could be discriminated by their distinct clades in the phylogenetic tree. Both methods of phylogenetic reconstruction suggested that Zebrilus, Tigrisoma, and Cochlearius were an offshoot of the primitive herons. COI gene analysis suggested that the other herons could be divided into two clades: Botaurinae and Ardeinae. Our results support the Great Egret and Intermediate Egret being in separate genera, Casmerodius and Mesophoyx, respectively.
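
    The Kimura two-parameter (K2P) distance used between barcodes can be computed from the proportions of transitions (P) and transversions (Q) among compared sites as d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below is a minimal illustration; the function name and the choice to skip sites containing gaps or ambiguous bases are assumptions, not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a character other than A, C, G, T
    are skipped (an assumption made here for simplicity).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the distance to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

    For two 10-base sequences that differ by a single transition, the distance is about 0.112, slightly more than the uncorrected p-distance of 0.1, because the model corrects for multiple substitutions at the same site.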

  9. Systematic Conservation Planning for Groundwater Ecosystems Using Phylogenetic Diversity

    PubMed Central

    Asmyhr, Maria G.; Linke, Simon; Hose, Grant; Nipperess, David A.

    2014-01-01

    Aquifer ecosystems provide a range of important services including clean drinking water. These ecosystems, which are largely inaccessible to humans, comprise a distinct invertebrate fauna (stygofauna), which is characterized by narrow distributions, high levels of endemism and cryptic species. Although being under enormous anthropogenic pressure, aquifers have rarely been included in conservation planning because of the general lack of knowledge of species diversity and distribution. Here we use molecular sequence data and phylogenetic diversity as surrogates for stygofauna diversity in aquifers of New South Wales, Australia. We demonstrate how to incorporate these data as conservation features in the systematic conservation planning software Marxan. We designated each branch of the phylogenetic tree as a conservation feature, with the branch length as a surrogate for the number of distinct characters represented by each branch. Two molecular markers (nuclear 18S ribosomal DNA and mitochondrial cytochrome oxidase subunit I) were used to evaluate how marker variability and the resulting tree topology affected the site-selection process. We found that the sites containing the deepest phylogenetic branches were deemed the most irreplaceable by Marxan. By integrating phylogenetic data, we provide a method for including taxonomically undescribed groundwater fauna in systematic conservation planning. PMID:25514422

  10. Curriculum Alignment Research Suggests that Alignment Can Improve Student Achievement

    ERIC Educational Resources Information Center

    Squires, David

    2012-01-01

    Curriculum alignment research has developed showing the relationship among three alignment categories: the taught curriculum, the tested curriculum and the written curriculum. Each pair (for example, the taught and the written curriculum) shows a positive impact for aligning those results. Following this, alignment results from the Third…

  11. Tree harvesting

    SciTech Connect

    Badger, P.C.

    1995-12-31

    Short rotation intensive culture tree plantations have been a major part of biomass energy concepts since the beginning. One aspect receiving less attention than it deserves is harvesting. This article describes a method of harvesting that falls somewhere between agricultural mowing machines and the huge feller-bunchers of the pulpwood and lumber industries.

  12. The rapidly changing landscape of insect phylogenetics.

    PubMed

    Maddison, David R

    2016-12-01

    Insect phylogenetics is being profoundly changed by many innovations. Although rapid developments in genomics have center stage, key progress has been made in phenomics, field and museum science, digital databases and pipelines, analytical tools, and the culture of science. The importance of these methodological and cultural changes to the pace of inference of the hexapod Tree of Life is discussed. The innovations have the potential, when synthesized and mobilized in ways as yet unforeseen, to shine light on the million or more clades in insects, and infer their composition with confidence. There are many challenges to overcome before insects can enter the 'phylocognisant age', but because of the promise of genomics, phenomics, and informatics, that is now an imaginable future.

  13. Dating human cultural capacity using phylogenetic principles

    PubMed Central

    Lind, J.; Lindenfors, P.; Ghirlanda, S.; Lidén, K.; Enquist, M.

    2013-01-01

    Humans have genetically based unique abilities making complex culture possible; an assemblage of traits which we term “cultural capacity”. The age of this capacity has long been subject to controversy. We apply phylogenetic principles to date this capacity, integrating evidence from archaeology, genetics, paleoanthropology, and linguistics. We show that cultural capacity is older than the first split in the modern human lineage, and at least 170,000 years old, based on data on hyoid bone morphology, FOXP2 alleles, agreement between genetic and language trees, fire use, burials, and the early appearance of tools comparable to those of modern hunter-gatherers. We cannot exclude that Neanderthals had cultural capacity some 500,000 years ago. A capacity for complex culture, therefore, must have existed before complex culture itself. It may even have originated long before. This seeming paradox is resolved by theoretical models suggesting that cultural evolution is exceedingly slow in its initial stages. PMID:23648831

  14. Dating human cultural capacity using phylogenetic principles.

    PubMed

    Lind, J; Lindenfors, P; Ghirlanda, S; Lidén, K; Enquist, M

    2013-01-01

    Humans have genetically based unique abilities making complex culture possible; an assemblage of traits which we term "cultural capacity". The age of this capacity has long been subject to controversy. We apply phylogenetic principles to date this capacity, integrating evidence from archaeology, genetics, paleoanthropology, and linguistics. We show that cultural capacity is older than the first split in the modern human lineage, and at least 170,000 years old, based on data on hyoid bone morphology, FOXP2 alleles, agreement between genetic and language trees, fire use, burials, and the early appearance of tools comparable to those of modern hunter-gatherers. We cannot exclude that Neanderthals had cultural capacity some 500,000 years ago. A capacity for complex culture, therefore, must have existed before complex culture itself. It may even have originated long before. This seeming paradox is resolved by theoretical models suggesting that cultural evolution is exceedingly slow in its initial stages.

  15. Nearly complete rRNA genes assembled from across the metazoan animals: effects of more taxa, a structure-based alignment, and paired-sites evolutionary models on phylogeny reconstruction.

    PubMed

    Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew J

    2010-04-01

    This study (1) uses nearly complete rRNA-gene sequences from across Metazoa (197 taxa) to reconstruct animal phylogeny; (2) presents a highly annotated, manual alignment of these sequences with special reference to rRNA features including paired sites (http://purl.oclc.org/NET/rRNA/Metazoan_alignment) and (3) tests, after eliminating as few disruptive, rogue sequences as possible, if a likelihood framework can recover the main metazoan clades. We found that systematic elimination of approximately 6% of the sequences, including the divergent or unstably placed sequences of cephalopods, arrowworm, symphylan and pauropod myriapods, and of myzostomid and nemertodermatid worms, led to a tree that supported Ecdysozoa, Lophotrochozoa, Protostomia, and Bilateria. Deuterostomia, however, was never recovered, because the rRNA of urochordates goes (nonsignificantly) near the base of the Bilateria. Counterintuitively, when we modeled the evolution of the paired sites, phylogenetic resolution was not increased over traditional tree-building models that assume all sites in rRNA evolve independently. The rRNA genes of non-bilaterians contain a higher % AT than do those of most bilaterians. The rRNA genes of Acoela and Myzostomida were found to be secondarily shortened, AT-enriched, and highly modified, throwing some doubt on the location of these worms at the base of Bilateria in the rRNA tree--especially myzostomids, which other evidence suggests are annelids instead. Other findings are marsupial-with-placental mammals, arrowworms in Ecdysozoa (well supported here but contradicted by morphology), and Placozoa as sister to Cnidaria. Finally, despite the difficulties, the rRNA-gene trees are in strong concordance with trees derived from multiple protein-coding genes in supporting the new animal phylogeny.

  16. Phylogenetic relationship among horseshoe crab species: effect of substitution models on phylogenetic analyses.

    PubMed

    Xia, X

    2000-03-01

    The horseshoe crabs, known as living fossils, have maintained their morphology almost unchanged for the past 150 million years. The little morphological differentiation among horseshoe crab lineages has resulted in substantial controversy concerning the phylogenetic relationship among the extant species of horseshoe crabs, especially among the three species in the Indo-Pacific region. Previous studies suggest that the three species constitute a phylogenetically unresolvable trichotomy, the result of a cladogenetic process leading to the formation of all three Indo-Pacific species in a short geological time. Data from two mitochondrial genes (for 16S ribosomal RNA and cytochrome oxidase subunit I) and one nuclear gene (for coagulogen) in the four species of horseshoe crabs and outgroup species were used in a phylogenetic analysis with various substitution models. All three genes yield the same tree topology, with Tachypleus gigas and Carcinoscorpius rotundicauda grouped together as a monophyletic taxon. This topology is significantly better than all the alternatives when evaluated with the RELL (resampling estimated log-likelihood) method.
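    The RELL idea named in the abstract above can be sketched as follows: per-site log-likelihoods for each candidate topology are resampled with replacement, and the fraction of replicates in which a topology has the highest summed log-likelihood serves as its support. This is a minimal sketch, not the author's procedure; the function name `rell_support`, the topology labels, and the per-site values are assumptions, and in practice the per-site log-likelihoods would come from a maximum likelihood program.

```python
import random


def rell_support(site_lnl, replicates=1000, seed=0):
    """Estimate RELL bootstrap proportions for competing topologies.

    site_lnl maps a topology label to its list of per-site log-likelihoods;
    all lists must cover the same alignment sites in the same order.
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all topologies need the same number of sites")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample site indices with replacement; the stored per-site
        # log-likelihoods are reused, with no re-optimisation.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {
            name: sum(site_lnl[name][i] for i in sample) for name in names
        }
        best = max(names, key=totals.get)  # ties go to the first label
        wins[best] += 1
    return {name: wins[name] / replicates for name in names}


# Made-up per-site log-likelihoods for two hypothetical topologies.
example = {
    "topology_1": [-1.0, -1.2, -0.9, -1.1],
    "topology_2": [-1.5, -1.6, -1.4, -1.7],
}
support = rell_support(example, replicates=200)
```

    Because topology_1 has the higher log-likelihood at every site in this toy input, it wins every replicate; real data would give intermediate proportions.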

  17. TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

    SciTech Connect

    Gu, Shengyin; Anderson, Iain; Kunin, Victor; Cipriano, Michael; Minovitsky, Simon; Weber, Gunther; Amenta, Nina; Hamann, Bernd; Dubchak, Inna

    2007-05-07

    Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.

  18. Phylogenetic constraints on ecosystem functioning.

    PubMed

    Gravel, Dominique; Bell, Thomas; Barbera, Claire; Combe, Marine; Pommier, Thomas; Mouquet, Nicolas

    2012-01-01

    There is consensus that biodiversity losses will result in declining ecosystem functioning if species have different functional traits. Phylogenetic diversity has recently been suggested as a predictor of ecosystem functioning because it could approximate the functional complementarity among species. Here we describe an experiment that takes advantage of the rapid evolutionary response of bacteria to disentangle the role of phylogenetic and species diversity. We impose a strong selection regime on marine bacterial lineages and assemble the ancestral and evolved lines in microcosms of varying lineage and phylogenetic diversity. We find that the relationship between phylogenetic diversity and productivity is strong for the ancestral lineages but breaks down for the evolved lineages. Our results not only emphasize the potential of using phylogeny to evaluate ecosystem functioning, but also warn against using phylogenetics as a proxy for functional diversity without good information on species evolutionary history.

  19. Continental monophyly of cichlid fishes and the phylogenetic position of Heterochromis multidens.

    PubMed

    Keck, Benjamin P; Hulsey, C Darrin

    2014-04-01

    The incredibly species-rich cichlid fish faunas of both the Neotropics and Africa are generally thought to be reciprocally monophyletic. However, the phylogenetic affinity of the African cichlid Heterochromis multidens is ambiguous, and this distinct lineage could make African cichlids paraphyletic. In past studies, Heterochromis has been variously suggested to be one of the earliest diverging lineages within either the Neotropical or the African cichlid radiations, and it has even been hypothesized to be the sister lineage to a clade containing all Neotropical and African cichlids. We examined the phylogenetic relationships among a representative sample of cichlids with a dataset of 29 nuclear loci to assess the support for the different hypotheses of the phylogenetic position of Heterochromis. Although individual gene trees in some instances supported alternative relationships, a majority of gene trees, integration of genes into species trees, and hypothesis testing of putative topologies all supported Heterochromis as belonging to the clade of African cichlids.

  20. Phylogenetic analysis of honey bee behavioral evolution.

    PubMed

    Raffiudin, Rika; Crozier, Ross H

    2007-05-01

    DNA sequences from three mitochondrial (rrnL, cox2, nad2) and one nuclear gene (itpr) from all 9 known honey bee species (Apis), a 10th possible species, Apis dorsata binghami, and three outgroup species (Bombus terrestris, Melipona bicolor and Trigona fimbriata) were used to infer Apis phylogenetic relationships using Bayesian analysis. The dwarf honey bees were confirmed as basal, and the giant and cavity-nesting species to be monophyletic. All nodes were strongly supported except that grouping Apis cerana with A. nigrocincta. Two thousand post-burnin trees from the phylogenetic analysis were used in a Bayesian comparative analysis to explore the evolution of dance type, nest structure, comb structure and dance sound within Apis. The ancestral honey bee species was inferred with high support to have nested in the open, and to have more likely than not had a silent vertical waggle dance and a single comb. The common ancestor of the giant and cavity-dwelling bees is strongly inferred to have had a buzzing vertical directional dance. All pairwise combinations of characters showed strong association, but the multiple comparisons problem reduces the ability to infer associations between states between characters. Nevertheless, a buzzing dance is significantly associated with cavity-nesting, several vertical combs, and dancing vertically, a horizontal dance is significantly associated with a nest with a single comb wrapped around the support, and open nesting with a single pendant comb and a silent waggle dance.

  1. Phylesystem: a git-based data store for community-curated phylogenetic estimates

    PubMed Central

    McTavish, Emily Jane; Hinchliff, Cody E.; Allman, James F.; Brown, Joseph W.; Cranston, Karen A.; Rees, Jonathan A.; Smith, Stephen A.

    2015-01-01

    Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web

  2. Charles Darwin, beetles and phylogenetics.

    PubMed

    Beutel, Rolf G; Friedrich, Frank; Leschen, Richard A B

    2009-11-01

    Here, we review Charles Darwin's relation to beetles and developments in coleopteran systematics in the last two centuries. Darwin was an enthusiastic beetle collector. He used beetles to illustrat