Science.gov

Sample records for alignments phylogenetic trees

  1. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  2. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    PubMed

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  3. Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments.

    PubMed

    Chen, Jonathan S; Reddy, Vamsee; Chen, Joshua H; Shlykov, Maksim A; Zheng, Wei Hao; Cho, Jaehoon; Yen, Ming Ren; Saier, Milton H

    2011-01-01

    Transport proteins function in the translocation of ions, solutes and macromolecules across cellular and organellar membranes. These integral membrane proteins fall into >600 families as tabulated in the Transporter Classification Database (www.tcdb.org). Recent studies, some of which are reported here, define distant phylogenetic relationships between families with the creation of superfamilies. Several of these are analyzed using a novel set of programs designed to allow reliable prediction of phylogenetic trees when sequence divergence is too great to allow the use of multiple alignments. These new programs, called SuperfamilyTree1 and 2 (SFT1 and 2), allow display of protein and family relationships, respectively, based on thousands of comparative BLAST scores rather than multiple alignments. Superfamilies analyzed include: (1) Aerolysins, (2) RTX Toxins, (3) Defensins, (4) Ion Transporters, (5) Bile/Arsenite/Riboflavin Transporters, (6) Cation:Proton Antiporters, and (7) the Glucose/Fructose/Lactose superfamily within the prokaryotic phosphoenolpyruvate-dependent Phosphotransferase System. In addition to defining the phylogenetic relationships of the proteins and families within these seven superfamilies, evidence is provided showing that the SFT programs outperform programs that are based on multiple alignments whenever sequence divergence of superfamily members is extensive. The SFT programs should be applicable to virtually any superfamily of proteins or nucleic acids.

  4. Phylogenetic Inference From Conserved Sites Alignments

    SciTech Connect

    Grundy, W.N.; Naylor, G.J.P.

    1999-08-15

    Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements.
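
    The site-based idea in this abstract (keep only conserved columns of an initial alignment) can be illustrated with a minimal Python sketch. The abstract does not state the conservation criterion the authors used, so the majority-fraction rule, the `min_fraction` threshold and the function names below are assumptions made purely for illustration.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.8, gap="-"):
    """Return indices of columns whose most common non-gap residue
    is present in at least min_fraction of all sequences.

    The threshold rule is an illustrative assumption, not the
    criterion used in the cited work.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    keep = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != gap)
        if counts and counts.most_common(1)[0][1] / n_seqs >= min_fraction:
            keep.append(i)
    return keep


def filter_alignment(alignment, columns):
    """Build a reduced alignment containing only the given columns."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Toy alignment (hypothetical data): columns 3 and 4 are variable.
toy = ["ACGTAC", "ACGAAC", "ACCTT-", "ACGCTC"]
cols = conserved_columns(toy, min_fraction=0.75)
reduced = filter_alignment(toy, cols)
print(cols)     # [0, 1, 2, 5]
print(reduced)  # ['ACGC', 'ACGC', 'ACC-', 'ACGC']
```

    The reduced alignment would then be passed to a tree-building method such as parsimony; that step is outside the scope of this sketch.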

  5. Interim Report on Multiple Sequence Alignments and TaqMan Signature Mapping to Phylogenetic Trees

    SciTech Connect

    Gardner, S; Jaing, C

    2012-03-27

    The goal of this project is to develop forensic genotyping assays for select agent viruses, addressing a significant capability gap for the viral bioforensics and law enforcement community. We used a multipronged approach combining bioinformatics analysis, PCR-enriched samples, microarrays and TaqMan assays to develop high resolution and cost effective genotyping methods for strain level forensic discrimination of viruses. We have leveraged substantial experience and efficiency gained through year 1 on software development, SNP discovery, TaqMan signature design and phylogenetic signature mapping to scale up the development of forensics signatures in year 2. In this report, we have summarized the TaqMan signature development for South American hemorrhagic fever viruses, tick-borne encephalitis viruses and henipaviruses, Old World Arenaviruses, filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus and Japanese encephalitis virus.

  6. A Universal Phylogenetic Tree.

    ERIC Educational Resources Information Center

    Offner, Susan

    2001-01-01

    Presents a universal phylogenetic tree suitable for use in high school and college-level biology classrooms. Illustrates the antiquity of life and that all life is related, even if it dates back 3.5 billion years. Reflects important evolutionary relationships and provides an exciting way to learn about the history of life. (SAH)

  7. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
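
    The "huge number of possible trees" mentioned in this abstract can be made concrete with a short Python sketch. It uses the standard counting results for fully resolved (binary) trees on n labeled leaves: (2n-5)!! unrooted topologies and (2n-3)!! rooted topologies. The function names are illustrative and are not taken from the cited paper.

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return odd_double_factorial(2 * n_leaves - 5)


def n_rooted_trees(n_leaves):
    """Number of rooted binary tree topologies on n labeled leaves."""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return odd_double_factorial(2 * n_leaves - 3)


for n in (4, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

    Even at 10 OTUs there are already more than two million unrooted topologies, which is why exhaustive search is impractical and heuristic search is used instead.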

  8. Visualizing phylogenetic trees using TreeView.

    PubMed

    Page, Roderic D M

    2002-08-01

    TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addresses this need. The program can read and write a range of tree file formats, display trees in a variety of styles, print trees, and save the tree as a graphic file. Protocols in this unit cover both displaying and printing a tree. Support protocols describe how to download and install TreeView, and how to display bootstrap values in trees generated by ClustalX and PAUP*. PMID:18792942

  9. Terraces in phylogenetic tree space.

    PubMed

    Sanderson, Michael J; McMahon, Michelle M; Steel, Mike

    2011-07-22

    A key step in assembling the tree of life is the construction of species-rich phylogenies from multilocus--but often incomplete--sequence data sets. We describe previously unknown structure in the landscape of solutions to the tree reconstruction problem, comprising sometimes vast "terraces" of trees with identical quality, arranged on islands of phylogenetically similar trees. Phylogenetic ambiguity within a terrace can be characterized efficiently and then ameliorated by new algorithms for obtaining a terrace's maximum-agreement subtree or by identifying the smallest set of new targets for additional sequencing. Algorithms to find optimal trees or estimate Bayesian posterior tree distributions may need to navigate strategically in the neighborhood of large terraces in tree space.

  10. Interpreting the universal phylogenetic tree

    NASA Technical Reports Server (NTRS)

    Woese, C. R.

    2000-01-01

    The universal phylogenetic tree not only spans all extant life, but its root and earliest branchings represent stages in the evolutionary process before modern cell types had come into being. The evolution of the cell is an interplay between vertically derived and horizontally acquired variation. Primitive cellular entities were necessarily simpler and more modular in design than are modern cells. Consequently, horizontal gene transfer early on was pervasive, dominating the evolutionary dynamic. The root of the universal phylogenetic tree represents the first stage in cellular evolution when the evolving cell became sufficiently integrated and stable to the erosive effects of horizontal gene transfer that true organismal lineages could exist.

  11. On Tree-Based Phylogenetic Networks.

    PubMed

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posted by Francis and Steel recently. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  12. Transforming phylogenetic networks: Moving beyond tree space.

    PubMed

    Huber, Katharina T; Moulton, Vincent; Wu, Taoyang

    2016-09-01

    Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks.

  13. Visual exploration of parameter influence on phylogenetic trees.

    PubMed

    Hess, Martin; Bremm, Sebastian; Weissgraeber, Stephanie; Hamacher, Kay; Goesele, Michael; Wiemeyer, Josef; von Landesberger, Tatiana

    2014-01-01

    Evolutionary relationships between organisms are frequently derived as phylogenetic trees inferred from multiple sequence alignments (MSAs). The MSA parameter space is exponentially large, so tens of thousands of potential trees can emerge for each dataset. A proposed visual-analytics approach can reveal the parameters' impact on the trees. Given input trees created with different parameter settings, it hierarchically clusters the trees according to their structural similarity. The most important clusters of similar trees are shown together with their parameters. This view offers interactive parameter exploration and automatic identification of relevant parameters. Biologists applied this approach to real data of 16S ribosomal RNA and protein sequences of ion channels. It revealed which parameters affected the tree structures. This led to a more reliable selection of the best trees.

  14. Encoding phylogenetic trees in terms of weighted quartets.

    PubMed

    Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

    2008-04-01

    One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.

  15. Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs

    PubMed Central

    Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.

    2013-01-01

    Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. 
    PMID:24086118

  16. Using tree diversity to compare phylogenetic heuristics

    PubMed Central

    Sul, Seung-Jin; Matthews, Suzanne; Williams, Tiffani L

    2009-01-01

    Background: Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that better tree scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores to determine the heuristics that find the best scores in the fastest time. We develop new techniques to evaluate phylogenetic heuristics based on both tree scores and topologies to compare Pauprat and Rec-I-DCM3, two popular Maximum Parsimony search algorithms. Results: Our results show that although Pauprat and Rec-I-DCM3 find the trees with the same best scores, topologically these trees are quite different. Furthermore, the Rec-I-DCM3 trees cluster distinctly from the Pauprat trees. In addition to our heatmap visualizations of using parsimony scores and the Robinson-Foulds distance to compare best-scoring trees found by the two heuristics, we also develop entropy-based methods to show the diversity of the trees found. Overall, Pauprat identifies more diverse trees than Rec-I-DCM3. Conclusion: Overall, our work shows that there is value to comparing heuristics beyond the parsimony scores that they find. Pauprat is a slower heuristic than Rec-I-DCM3. However, our work shows that there is tremendous value in using Pauprat to reconstruct trees, especially since it finds identical scoring but topologically distinct trees. Hence, instead of discounting Pauprat, effort should go in improving its implementation. Ultimately, improved performance measures lead to better phylogenetic heuristics and will result in better approximations of the true evolutionary history of the organisms of interest. PMID:19426451
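
    The Robinson-Foulds distance mentioned in this abstract counts the bipartitions (splits) found in one tree but not in the other. The following Python sketch is not the authors' code: it assumes trees are given as nested tuples of leaf-name strings, treats them as unrooted, and assumes both trees have the same leaf set. All names are illustrative.

```python
def leaf_set(tree):
    """Collect the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Size of the symmetric difference of the two split sets."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Two hypothetical five-taxon trees that differ in one grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # 2
print(robinson_foulds(t1, t1))  # 0
```

    Two trees with the same parsimony score can still have a non-zero distance under this measure, which is the kind of topological difference the abstract describes.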

  17. Terrestrial apes and phylogenetic trees

    PubMed Central

    Arsuaga, Juan Luis

    2010-01-01

    The image that best expresses Darwin’s thinking is the tree of life. However, Darwin’s human evolutionary tree lacked almost everything because only the Neanderthals were known at the time and they were considered one extreme expression of our own species. Darwin believed that the root of the human tree was very deep and in Africa. It was not until 1962 that the root was shown to be much more recent in time and definitively in Africa. On the other hand, some neo-Darwinians believed that our family tree was not a tree, because there were no branches, but, rather, a straight stem. The recent years have witnessed spectacular discoveries in Africa that take us close to the origin of the human tree and in Spain at Atapuerca that help us better understand the origin of the Neanderthals as well as our own species. The final form of the tree, and the number of branches, remains an object of passionate debate. PMID:20445090

  18. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

    PubMed Central

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-01-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  19. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

    PubMed

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-09-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  20. Relating phylogenetic trees to transmission trees of infectious disease outbreaks.

    PubMed

    Ypma, Rolf J F; van Ballegooijen, W Marijn; Wallinga, Jacco

    2013-11-01

    Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.

  1. Constructing Student Problems in Phylogenetic Tree Construction.

    ERIC Educational Resources Information Center

    Brewer, Steven D.

    Evolution is often equated with natural selection and is taught from a primarily functional perspective while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…

  3. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing. PMID:27377322
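
    The pairwise route described in this abstract (align each pair of sequences by dynamic programming, then derive a distance from each pairwise alignment) can be sketched in Python as follows. This is not the DAMBE implementation: the Needleman-Wunsch scoring values (`match`, `mismatch`, `gap`) and the uncorrected p-distance are assumptions chosen for illustration, and the abstract does not specify the scoring scheme or distance correction PhyPA uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty.

    Returns one optimal alignment as a pair of gapped strings.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(x, y, gap="-"):
    """Proportion of differing sites among ungapped aligned positions."""
    pairs = [(p, q) for p, q in zip(x, y) if p != gap and q != gap]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(p != q for p, q in pairs) / len(pairs)


aligned = needleman_wunsch("ACGT", "AGT")
print(aligned)                       # ('ACGT', 'A-GT')
print(p_distance("ACGT", "ACCT"))    # 0.25
```

    Repeating this for every pair of sequences yields a distance matrix that a distance-based method such as neighbor joining could take as input; that step is omitted here.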

  4. CVTree: a phylogenetic tree reconstruction tool based on whole genomes.

    PubMed

    Qi, Ji; Luo, Hong; Hao, Bailin

    2004-07-01

    Composition Vector Tree (CVTree) implements a systematic method of inferring evolutionary relatedness of microbial organisms from the oligopeptide content of their complete proteomes (http://cvtree.cbi.pku.edu.cn). Since the first bacterial genomes were sequenced in 1995 there have been several attempts to infer prokaryote phylogeny from complete genomes. Most of them depend on sequence alignment directly or indirectly and, in some cases, need fine-tuning and adjustment. The composition vector method circumvents the ambiguity of choosing the genes for phylogenetic reconstruction and avoids the necessity of aligning sequences of essentially different length and gene content. This new method contains no 'free' parameters and requires no 'fine-tuning'. A bootstrap test for a phylogenetic tree of 139 organisms has shown the stability of the branchings, which support the small subunit ribosomal RNA (SSU rRNA) tree of life in its overall structure and in many details. It may provide a quick reference in prokaryote phylogenetics whenever the proteome of an organism is available, a situation that will become commonplace in the near future.
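
    The alignment-free idea in this abstract (compare organisms by the oligopeptide content of their sequences) can be illustrated with a simplified Python sketch. It is not the CVTree algorithm: it uses raw k-mer counts and a cosine-based dissimilarity, both of which are assumptions for illustration, and it omits any processing steps of the published method that the abstract does not describe. The value of `k` and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(sequence, k=3):
    """Count every overlapping k-mer (oligopeptide of length k)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_distance(u, v):
    """One minus the cosine similarity of two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("sequence shorter than k")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy peptide strings standing in for whole proteomes.
seq_a = "MKVLAAGIVGMKVLAAGIVG"
seq_b = "MKVLAAGLVGMKVIAAGIVG"
seq_c = "PPQRSTWYPPQRSTWYPPQR"

d_ab = cosine_distance(kmer_vector(seq_a), kmer_vector(seq_b))
d_ac = cosine_distance(kmer_vector(seq_a), kmer_vector(seq_c))
print(d_ab < d_ac)  # True: a and b share many 3-mers, a and c share none
```

    No alignment is computed at any point, so sequences of very different length and gene content can be compared directly; the resulting pairwise distances would feed a distance-based tree method.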

  5. Quantifying MCMC exploration of phylogenetic tree space.

    PubMed

    Whidden, Chris; Matsen, Frederick A

    2015-05-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.

  6. Quantifying MCMC Exploration of Phylogenetic Tree Space

    PubMed Central

    Whidden, Chris; Matsen, Frederick A.

    2015-01-01

    In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. PMID:25631175

  7. A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method.

    PubMed

    Hatje, Klas; Kollmar, Martin

    2012-01-01

    Phylogenetic analyses reveal the evolutionary derivation of species. A phylogenetic tree can be inferred from multiple sequence alignments of proteins or genes. The alignment of whole genome sequences of higher eukaryotes is a computationally intensive and ambitious task, as is the computation of phylogenetic trees based on these alignments. To overcome these limitations, we here used an alignment-free method to compare genomes of the Brassicales clade. For each nucleotide sequence a Chaos Game Representation (CGR) can be computed, which represents each nucleotide of the sequence as a point in a square defined by the four nucleotides as vertices. Each CGR is therefore a unique fingerprint of the underlying sequence. If the CGRs are divided by grid lines each grid square denotes the occurrence of oligonucleotides of a specific length in the sequence (Frequency Chaos Game Representation, FCGR). Here, we used distance measures between FCGRs to infer phylogenetic trees of Brassicales species. Three types of data were analyzed because of their different characteristics: (A) Whole genome assemblies as far as available for species belonging to the Malvidae taxon. (B) EST data of species of the Brassicales clade. (C) Mitochondrial genomes of the Rosids branch, a supergroup of the Malvidae. The trees reconstructed based on the Euclidean distance method are in general agreement with single gene trees. The Fitch-Margoliash and Neighbor joining algorithms resulted in similar to identical trees. Here, for the first time we have applied the bootstrap re-sampling concept to trees based on FCGRs to determine the support of the branchings. FCGRs have the advantage that they are fast to calculate, and can be used as additional information to alignment based data and morphological characteristics to improve the phylogenetic classification of species in ambiguous cases.
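
    The CGR and FCGR constructions described above can be sketched in a few lines. The vertex assignment (A, C, G, T at the corners shown in `VERTEX`), the starting point at the centre of the square and the function names are assumptions for illustration, and the input is assumed to contain only uppercase A, C, G and T. Each CGR point is the midpoint between the previous point and the vertex of the current nucleotide; a grid of 2^k by 2^k cells then counts oligonucleotides of length k.

```python
from math import sqrt

# Assumed corner assignment for the four nucleotides.
VERTEX = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Chaos Game Representation: one point in the unit square per base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        vx, vy = VERTEX[base]
        x, y = (x + vx) / 2, (y + vy) / 2  # move halfway to the vertex
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Frequency CGR: a 2^k x 2^k grid of length-k oligonucleotide counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        col = row = 0
        for j, base in enumerate(seq[i:i + k]):
            vx, vy = VERTEX[base]
            # The most recent base decides the coarsest split of the square,
            # so later bases fill the more significant bits of the cell index.
            col |= vx << j
            row |= vy << j
        grid[row][col] += 1
    return grid


def fcgr_distance(seq_a, seq_b, k):
    """Euclidean distance between the normalised FCGRs of two sequences."""
    def normalised(seq):
        flat = [count for row in fcgr(seq, k) for count in row]
        total = sum(flat) or 1
        return [count / total for count in flat]

    a, b = normalised(seq_a), normalised(seq_b)
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

    The grid cell computed from the last k bases is the cell in which the corresponding CGR point falls, which is why the FCGR can be built from integer oligonucleotide counts without plotting any points. A matrix of `fcgr_distance` values over all species pairs could then be given to a distance-based tree method such as neighbor joining.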

  8. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution

    PubMed Central

    Kendall, Michelle; Colijn, Caroline

    2016-01-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287

  9. Matching split distance for unrooted binary phylogenetic trees.

    PubMed

    Bogdanowicz, Damian; Giaro, Krzysztof

    2012-01-01

    The reconstruction of evolutionary trees is one of the primary objectives in phylogenetics. Such a tree represents the historical evolutionary relationship between different species or organisms. Tree comparisons are used for multiple purposes, from unveiling the history of species to deciphering evolutionary associations among organisms and geographical areas. In this paper, we propose a new method of defining distances between unrooted binary phylogenetic trees that is especially useful for relatively large phylogenetic trees. Next, we investigate in detail the properties of one example of these metrics, called the Matching Split distance, and describe how the general method can be extended to nonbinary trees.

  10. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    ERIC Educational Resources Information Center

    Dees, Jonathan; Momsen, Jennifer L.; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa…

  11. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction.

    PubMed

    Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

    2015-07-01

    This article introduces the Transitive Consistency Score (TCS) web server, a service making it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs.

  12. Student interpretations of phylogenetic trees in an introductory biology course.

    PubMed

    Dees, Jonathan; Momsen, Jennifer L; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa relatedness on phylogenetic trees, to measure the prevalence of correct taxa-relatedness interpretations, and to determine how student reasoning and correctness change in response to instruction and over time. Counting synapomorphies and nodes between taxa were the most common forms of incorrect reasoning, which presents a pedagogical dilemma concerning labeled synapomorphies on phylogenetic trees. Students also independently generated an alternative form of correct reasoning using monophyletic groups, the use of which decreased in popularity over time. Approximately half of all students were able to correctly interpret taxa relatedness on phylogenetic trees, and many memorized correct reasoning without understanding its application. Broad initial instruction that allowed students to generate inferences on their own contributed very little to phylogenetic tree understanding, while targeted instruction on evolutionary relationships improved understanding to some extent. Phylogenetic trees, which can directly affect student understanding of evolution, appear to offer introductory biology instructors a formidable pedagogical challenge.

  13. Enumerating all maximal frequent subtrees in collections of phylogenetic trees

    PubMed Central

    2014-01-01

    Background A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, computing congruence indices and to identify horizontal gene transfer events. Results We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Conclusions Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees. PMID:25061474

  14. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    PubMed Central

    Dees, Jonathan; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa relatedness on phylogenetic trees, to measure the prevalence of correct taxa-relatedness interpretations, and to determine how student reasoning and correctness change in response to instruction and over time. Counting synapomorphies and nodes between taxa were the most common forms of incorrect reasoning, which presents a pedagogical dilemma concerning labeled synapomorphies on phylogenetic trees. Students also independently generated an alternative form of correct reasoning using monophyletic groups, the use of which decreased in popularity over time. Approximately half of all students were able to correctly interpret taxa relatedness on phylogenetic trees, and many memorized correct reasoning without understanding its application. Broad initial instruction that allowed students to generate inferences on their own contributed very little to phylogenetic tree understanding, while targeted instruction on evolutionary relationships improved understanding to some extent. Phylogenetic trees, which can directly affect student understanding of evolution, appear to offer introductory biology instructors a formidable pedagogical challenge. PMID:25452489

  15. Trinets encode tree-child and level-2 phylogenetic networks.

    PubMed

    van Iersel, Leo; Moulton, Vincent

    2014-06-01

    Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to develop methods to construct rooted phylogenetic networks from triplets, that is rooted trees on three species. However, although triplets determine or encode rooted phylogenetic trees, they do not in general encode rooted phylogenetic networks, which is a potential issue for any such method. Motivated by this fact, Huber and Moulton recently introduced trinets as a natural extension of rooted triplets to networks. In particular, they showed that [Formula: see text] phylogenetic networks are encoded by their trinets, and also conjectured that all "recoverable" rooted phylogenetic networks are encoded by their trinets. Here we prove that recoverable binary level-2 networks and binary tree-child networks are also encoded by their trinets. To do this we prove two decomposition theorems based on trinets which hold for all recoverable binary rooted phylogenetic networks. Our results provide some additional evidence in support of the conjecture that trinets encode all recoverable rooted phylogenetic networks, and could also lead to new approaches to construct phylogenetic networks from trinets.
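
    To make the notion of rooted triplets concrete, the sketch below lists the triplets displayed by a rooted binary tree. It only covers trees, not networks or trinets, and the nested-tuple tree encoding and function names are assumptions for illustration. A triplet `(a, b, c)` is read as "a and b are more closely related to each other than either is to c".

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels of a tree given as a leaf name or a pair of subtrees."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def triplets(tree):
    """All rooted triplets (a, b, c), meaning ab|c, displayed by a binary tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    found = triplets(left) | triplets(right)
    # Two leaves on one side of this node group together against any
    # leaf on the other side.
    for inside, outside in ((left, right), (right, left)):
        for a, b in combinations(sorted(leaves(inside)), 2):
            for c in leaves(outside):
                found.add((a, b, c))
    return found
```

    For the tree `(("a", "b"), ("c", "d"))` this yields four triplets, one for each choice of three leaves, which is the sense in which the triplet set of a binary tree determines the tree.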

  16. Estimating phylogenetic trees from genome-scale data.

    PubMed

    Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

    2015-12-01

    The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. PMID:25873435

  17. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis

    PubMed Central

    Trifinopoulos, Jana; Nguyen, Lam-Tung; von Haeseler, Arndt; Minh, Bui Quang

    2016-01-01

    This article presents W-IQ-TREE, an intuitive and user-friendly web interface and server for IQ-TREE, an efficient phylogenetic software for maximum likelihood analysis. W-IQ-TREE supports multiple sequence types (DNA, protein, codon, binary and morphology) in common alignment formats and a wide range of evolutionary models including mixture and partition models. W-IQ-TREE performs fast model selection, partition scheme finding, efficient tree reconstruction, ultrafast bootstrapping, branch tests, and tree topology tests. All computations are conducted on a dedicated computer cluster and the users receive the results via URL or email. W-IQ-TREE is available at http://iqtree.cibiv.univie.ac.at. It is free and open to all users and there is no login requirement. PMID:27084950

  18. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis.

    PubMed

    Trifinopoulos, Jana; Nguyen, Lam-Tung; von Haeseler, Arndt; Minh, Bui Quang

    2016-07-01

    This article presents W-IQ-TREE, an intuitive and user-friendly web interface and server for IQ-TREE, an efficient phylogenetic software for maximum likelihood analysis. W-IQ-TREE supports multiple sequence types (DNA, protein, codon, binary and morphology) in common alignment formats and a wide range of evolutionary models including mixture and partition models. W-IQ-TREE performs fast model selection, partition scheme finding, efficient tree reconstruction, ultrafast bootstrapping, branch tests, and tree topology tests. All computations are conducted on a dedicated computer cluster and the users receive the results via URL or email. W-IQ-TREE is available at http://iqtree.cibiv.univie.ac.at. It is free and open to all users and there is no login requirement.

  19. Phylogenetic tree construction based on 2D graphical representation

    NASA Astrophysics Data System (ADS)

    Liao, Bo; Shan, Xinzhou; Zhu, Wen; Li, Renfa

    2006-04-01

    A new approach based on the two-dimensional (2D) graphical representation of the whole genome sequence [Bo Liao, Chem. Phys. Lett., 401 (2005) 196] is proposed to analyze the phylogenetic relationships of genomes. The evolutionary distances are obtained by measuring the differences among the 2D curves. Fuzzy theory is used to construct the phylogenetic tree. The phylogenetic relationships of H5N1 avian influenza virus illustrate the utility of our approach.

  20. Reconstruction of phylogenetic trees of prokaryotes using maximal common intervals.

    PubMed

    Heydari, Mahdi; Marashi, Sayed-Amir; Tusserkani, Ruzbeh; Sadeghi, Mehdi

    2014-10-01

    One of the fundamental problems in bioinformatics is phylogenetic tree reconstruction, which can be used for classifying living organisms into different taxonomic clades. The classical approach to this problem is based on a marker such as 16S ribosomal RNA. Since evolutionary events like genomic rearrangements are not included in reconstructions of phylogenetic trees based on single genes, much effort has been made to find other characteristics for phylogenetic reconstruction in recent years. With the increasing availability of completely sequenced genomes, gene order can be considered as a new solution for this problem. In the present work, we applied maximal common intervals (MCIs) in two or more genomes to infer their distance and to reconstruct their evolutionary relationship. Additionally, measures based on uncommon segments (UCS's), i.e., those genomic segments which are not detected as part of any of the MCIs, are also used for phylogenetic tree reconstruction. We applied these two types of measures for reconstructing the phylogenetic tree of 63 prokaryotes with known COG (clusters of orthologous groups) families. Similarity between the MCI-based (resp. UCS-based) reconstructed phylogenetic trees and the phylogenetic tree obtained from NCBI taxonomy browser is as high as 93.1% (resp. 94.9%). We show that in the case of this diverse dataset of prokaryotes, tree reconstruction based on MCI and UCS outperforms most of the currently available methods based on gene orders, including breakpoint distance and DCJ. We additionally tested our new measures on a dataset of 13 closely-related bacteria from the genus Prochlorococcus. In this case, distances like rearrangement distance, breakpoint distance and DCJ proved to be useful, while our new measures are still appropriate for phylogenetic reconstruction.

  1. FootPrinter3: phylogenetic footprinting in partially alignable sequences.

    PubMed

    Fang, Fei; Blanchette, Mathieu

    2006-07-01

    FootPrinter3 is a web server for predicting transcription factor binding sites by using phylogenetic footprinting. Until now, phylogenetic footprinting approaches have been based either on multiple alignment analysis (e.g. PhyloVista, PhastCons), or on motif-discovery algorithms (e.g. FootPrinter2). FootPrinter3 integrates these two approaches, making use of local multiple sequence alignment blocks when those are available and reliable, but also allowing finding motifs in unalignable regions. The result is a set of predictions that joins the advantages of alignment-based methods (good specificity) to those of motif-based methods (good sensitivity, even in the presence of highly diverged species). FootPrinter3 is thus a tool of choice to exploit the wealth of vertebrate genomes being sequenced, as it allows taking full advantage of the sequences of highly diverged species (e.g. chicken, zebrafish), as well as those of more closely related species (e.g. mammals). The FootPrinter3 web server is available at: http://www.mcb.mcgill.ca/~blanchem/FootPrinter3.

  2. Minimizing phylogenetic number to find good evolutionary trees

    SciTech Connect

    Goldberg, L.A.; Goldberg, P.W.; Phillips, C.A.; Sweedyk, E.; Warnow, T.

    1995-05-01

    Inferring phylogenetic trees is a fundamental problem in computational biology. We present a new objective criterion, the phylogenetic number, for evaluating evolutionary trees for species defined by biomolecular sequences or other qualitative characters. The phylogenetic number of a tree T is the maximum number of times that any given character state arises in T. By contrast, the classical parsimony criterion measures the total number of times that different character states arise in T. We consider the following related problems: finding the tree with minimum phylogenetic number, and computing the phylogenetic number of a given topology in which only the leaves are labeled by species. When the number of states is bounded (as is the case for biomolecular sequence characters), we can solve the second problem in polynomial time. We can also compute a fixed-topology 2-phylogeny (when one exists) for an arbitrary number of states. This algorithm can be used to further distinguish trees that are equal under parsimony. We also consider a number of other related problems.
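
    The difference between the two criteria can be shown on a tree whose internal nodes are already labeled. The sketch below rests on an interpretation that the abstract does not state explicitly: a state "arises" once for every connected group of nodes sharing it, so the phylogenetic number is the largest number of such groups over all characters and states. The edge-list encoding and function names are illustrative assumptions, and the sketch does not address the harder problem of choosing the internal labels.

```python
from collections import Counter


def phylogenetic_number(edges, states):
    """Largest number of connected groups formed by any single character state.

    edges  -- list of (node, node) pairs forming a tree
    states -- dict mapping every node to a tuple of character states
    """
    n_chars = len(next(iter(states.values())))
    worst = 0
    for c in range(n_chars):
        nodes = Counter(vector[c] for vector in states.values())
        # Edges joining two nodes with the same state. The nodes carrying one
        # state form a forest, so its component count is nodes minus edges.
        joined = Counter(states[u][c] for u, v in edges
                         if states[u][c] == states[v][c])
        worst = max(worst, max(nodes[s] - joined[s] for s in nodes))
    return worst


def parsimony_length(edges, states):
    """Total number of state changes along the edges, over all characters."""
    return sum(a != b
               for u, v in edges
               for a, b in zip(states[u], states[v]))
```

    On the path `x - y - z` with one character taking the states 0, 1, 0, state 0 forms two separate groups, so the phylogenetic number is 2, and the parsimony length is also 2. The two measures diverge on larger trees, where many states may each arise a few times (low phylogenetic number, high parsimony length).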

  3. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference

    PubMed Central

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-01-01

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal…

  4. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    PubMed

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-01-01

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal…

  5. Reconstruction of phyletic trees by global alignment of multiple metabolic networks

    PubMed Central

    2013-01-01

    Background In the last decade, a considerable amount of research has been devoted to investigating the phylogenetic properties of organisms from a systems-level perspective. Most studies have focused on the classification of organisms based on structural comparison and local alignment of metabolic pathways. In contrast, global alignment of multiple metabolic networks complements sequence-based phylogenetic analyses and provides more comprehensive information. Results We explored the phylogenetic relationships between microorganisms through global alignment of multiple metabolic networks. The proposed approach integrates sequence homology data with topological information of metabolic networks. In general, compared to recent studies, the resulting trees reflect the living style of organisms as well as classical taxa. Moreover, for phylogenetically closely related organisms, the classification results are consistent with specific metabolic characteristics, such as the light-harvesting systems, fermentation types, and sources of electrons in photosynthesis. Conclusions We demonstrate the usefulness of global alignment of multiple metabolic networks to infer phylogenetic relationships between species. In addition, our exhaustive analysis of microbial metabolic pathways reveals differences in metabolic features between phylogenetically closely related organisms. With the ongoing increase in the number of genomic sequences and metabolic annotations, the proposed approach will help identify phenotypic variations that may not be apparent based solely on sequence-based classification. PMID:23368411

  6. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished. PMID:10381871

  7. New algorithms for reconstructing phylogenetic trees

    SciTech Connect

    Dress, A.

    1994-12-31

    Since the time of Linné, classification of living beings into subspecies, species, orders, families etc. has been an important task in biology. With the advent of molecular biology, many more data have become available which can be exploited for this purpose using comparative sequence analysis, while the sheer amount of these data stored presently in biomolecular databases makes automated classification procedures unavoidable. Consequently, many algorithms have been developed in the last 25 years to support this task. In the lecture, an amazingly successful polynomial algorithm for analysing all sorts of distance data derived from sequence analysis (or elsewhere) will be presented which simultaneously highlights phylogenetic similarity and similarity caused by convergent evolution. In addition to sketching the mathematics on which the algorithm is based and discussing its implementation (including some interesting computer graphics aspects), various proper biological examples will be presented which stretch from the analysis of data relating to the origin of life and the first bifurcations into the various 'kingdoms of life' to the analysis of data relating to, say, the phylogenetic history of mammals or that of the AIDS or the influenza virus family.

  8. Tree thinking cannot taken for granted: challenges for teaching phylogenetics

    PubMed Central

    2008-01-01

    Tree thinking is an integral part of modern evolutionary biology, and a necessary precondition for phylogenetics and comparative analyses. Tree thinking has during the 20th century largely replaced group thinking, developmental thinking and anthropocentrism in biology. Unfortunately, however, this does not imply that tree thinking can be taken for granted. The findings reported here indicate that tree thinking is very much an acquired ability which needs extensive training. I tested a sample of undergraduate and graduate students of biology by means of questionnaires. Not a single student was able to correctly interpret a simple tree drawing. Several other findings demonstrate that tree thinking is virtually absent in students unless they are explicitly taught how to read evolutionary trees. Possible causes and implications of this mental bias are discussed. It seems that biological textbooks can be an important source of confusion for students. While group and developmental thinking have disappeared from most textual representations of evolution, they have survived in the evolutionary tree drawings of many textbooks. It is quite common for students to encounter anthropocentric trees and even trees containing stem groups and paraphyla. While these biases originate from the unconscious philosophical assumptions made by authors, the findings suggest that presenting unbiased evolutionary trees in biological publications is not merely a philosophical virtue but has also clear practical implications. PMID:18247075

  9. Tree thinking cannot taken for granted: challenges for teaching phylogenetics.

    PubMed

    Sandvik, Hanno

    2008-03-01

    Tree thinking is an integral part of modern evolutionary biology, and a necessary precondition for phylogenetics and comparative analyses. Tree thinking has during the 20th century largely replaced group thinking, developmental thinking and anthropocentrism in biology. Unfortunately, however, this does not imply that tree thinking can be taken for granted. The findings reported here indicate that tree thinking is very much an acquired ability which needs extensive training. I tested a sample of undergraduate and graduate students of biology by means of questionnaires. Not a single student was able to correctly interpret a simple tree drawing. Several other findings demonstrate that tree thinking is virtually absent in students unless they are explicitly taught how to read evolutionary trees. Possible causes and implications of this mental bias are discussed. It seems that biological textbooks can be an important source of confusion for students. While group and developmental thinking have disappeared from most textual representations of evolution, they have survived in the evolutionary tree drawings of many textbooks. It is quite common for students to encounter anthropocentric trees and even trees containing stem groups and paraphyla. While these biases originate from the unconscious philosophical assumptions made by authors, the findings suggest that presenting unbiased evolutionary trees in biological publications is not merely a philosophical virtue but has also clear practical implications.

  10. Which Phylogenetic Networks are Merely Trees with Additional Arcs?

    PubMed Central

    Francis, Andrew R.; Steel, Mike

    2015-01-01

    A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on “2-SAT”) that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion. PMID:26070685

  11. Which Phylogenetic Networks are Merely Trees with Additional Arcs?

    PubMed

    Francis, Andrew R; Steel, Mike

    2015-09-01

    A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on "2-SAT") that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion.

  12. A new algorithm to construct phylogenetic networks from trees.

    PubMed

    Wang, J

    2014-03-06

    Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
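    The abstract above defines reticulate nodes as the nodes with an indegree of at least 2. As a minimal illustration of that definition only (not of BIMLR or QuickCass), the following sketch finds such nodes in a network given as a list of directed arcs. The function name, the arc-list representation, and the toy network are assumptions made for this example.

```python
from collections import Counter


def reticulate_nodes(arcs):
    """Return the sorted nodes whose indegree is at least 2.

    `arcs` is an iterable of (tail, head) pairs; this representation is
    an assumption of the sketch, not something taken from the paper.
    """
    indegree = Counter(head for _tail, head in arcs)
    return sorted(node for node, degree in indegree.items() if degree >= 2)


# Hypothetical toy network: root r, internal nodes u and v, one
# reticulate node h (parents u and v), and leaves a, b, c.
toy_arcs = [
    ("r", "u"), ("r", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

print(reticulate_nodes(toy_arcs))
```

    In a plain tree every non-root node has indegree 1, so the function returns an empty list.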

  13. Reconstruction of phylogenetic trees using the ant colony optimization paradigm.

    PubMed

    Perretto, Mauricio; Lopes, Heitor Silvério

    2005-01-01

    We developed a new approach for the reconstruction of phylogenetic trees using ant colony optimization metaheuristics. A tree is constructed using a fully connected graph and the problem is approached similarly to the well-known traveling salesman problem. This methodology was used to develop an algorithm for constructing a phylogenetic tree using a pheromone matrix. Two data sets were tested with the algorithm: complete mitochondrial genomes from mammals and DNA sequences of the p53 gene from several eutherians. This new methodology was found to be superior to other well-known software packages, at least for this data set. These results are very promising and suggest that further development is worthwhile. PMID:16342043

  14. Tree phylogenetic diversity promotes host-parasitoid interactions.

    PubMed

    Staab, Michael; Bruelheide, Helge; Durka, Walter; Michalski, Stefan; Purschke, Oliver; Zhu, Chao-Dong; Klein, Alexandra-Maria

    2016-07-13

    Evidence from grassland experiments suggests that a plant community's phylogenetic diversity (PD) is a strong predictor of ecosystem processes, even stronger than species richness per se. This has, however, never been extended to species-rich forests and host-parasitoid interactions. We used cavity-nesting Hymenoptera and their parasitoids collected in a subtropical forest as a model system to test whether hosts, parasitoids, and their interactions are influenced by tree PD and a comprehensive set of environmental variables, including tree species richness. Parasitism rate and parasitoid abundance were positively correlated with tree PD. All variables describing parasitoids decreased with elevation, and were, except parasitism rate, dependent on host abundance. Quantitative descriptors of host-parasitoid networks were independent of the environment. Our study indicates that host-parasitoid interactions in species-rich forests are related to the PD of the tree community, which influences parasitism rates through parasitoid abundance. We show that effects of tree community PD are much stronger than effects of tree species richness, can cascade to high trophic levels, and promote trophic interactions. As during habitat modification phylogenetic information is usually lost non-randomly, even species-rich habitats may not be able to continuously provide the ecosystem process parasitism if the evolutionarily most distinct plant lineages vanish. PMID:27383815

  16. A metric for phylogenetic trees based on matching.

    PubMed

    Lin, Yu; Rajan, Vaibhav; Moret, Bernard M E

    2012-01-01

    Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes--reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance. PMID:22184263
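    The lack of robustness described above can be reproduced on a toy case. The following sketch computes the Robinson-Foulds distance as the number of non-trivial leaf bipartitions found in exactly one of two trees; it illustrates the measure the abstract criticizes, not the matching-based metric the paper introduces. Trees are written as nested tuples of leaf names, which is an assumption of this example, as are the function names and the six-leaf trees.

```python
def leaf_set(tree):
    """Collect the leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        cluster = leaf_set(node)
        side = all_leaves - cluster if anchor in cluster else cluster
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Count the bipartitions present in exactly one of the two trees."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree1) ^ splits(tree2))


# Hypothetical six-leaf caterpillar tree, and the same tree after leaf
# "a" is detached from one end and reattached at the other end.
original = ((((("a", "b"), "c"), "d"), "e"), "f")
moved = ((((("b", "c"), "d"), "e"), "f"), "a")

print(robinson_foulds(original, original))
print(robinson_foulds(original, moved))
```

    For binary trees on n leaves the largest possible value is 2(n - 3), which is 6 here, so comparing the second printed value with 6 shows how far a single leaf move pushes the distance.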

  17. Walking tree heuristics for biological string alignment, gene location, and phylogenies

    NASA Astrophysics Data System (ADS)

    Cull, P.; Holloway, J. L.; Cavener, J. D.

    1999-03-01

    Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of workstations and demonstrates that there is sufficient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.

  18. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

    PubMed Central

    2010-01-01

    Background Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504

  19. Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. Using the relative complexity measure

    PubMed Central

    2013-01-01

    Background Most phylogeny analysis methods based on molecular sequences use multiple alignment where the quality of the alignment, which is dependent on the alignment parameters, determines the accuracy of the resulting trees. Different parameter combinations chosen for the multiple alignment may result in different phylogenies. A new non-alignment based approach, Relative Complexity Measure (RCM), has been introduced to tackle this problem and proven to work in fungi and mitochondrial DNA. Result In this work, we present an application of the RCM method to reconstruct robust phylogenetic trees using sequence data for genus Galanthus obtained from different regions in Turkey. Phylogenies have been analyzed using nuclear and chloroplast DNA sequences. Results showed that the tree obtained from nuclear ribosomal RNA gene sequences was more robust, while the tree obtained from the chloroplast DNA showed a higher degree of variation. Conclusions Phylogenies generated by Relative Complexity Measure were found to be robust and results of RCM were more reliable than the compared techniques. In particular, RCM seems to be a reasonable way to overcome MSA-based problems and a good alternative to MSA-based phylogenetic analysis. We believe our method will become a mainstream phylogeny construction method especially for the highly variable sequence families where the accuracy of the MSA heavily depends on the alignment parameters. PMID:23323678

  20. Reliable Phylogenetic Trees Building: A New Web Interface for FIGENIX.

    PubMed

    Paganini, Julien; Gouret, Philippe

    2012-01-01

    The community needed a reliable and user friendly tool to quickly produce robust phylogenetic trees which are crucial in evolutionary studies and genomes' functional annotation. FIGENIX is software dedicated to this and was published in 2005. Several laboratories around the world use it in their research, but it was difficult to use for non-expert users, thus we developed a new graphical user interface for the benefit of all biologists.

  1. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy.

    PubMed

    Letunic, Ivica; Bork, Peer

    2011-07-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or inclusion of interactive trees into other web services.

  2. Phylometrics: a pipeline for inferring phylogenetic trees from a sequence relationship network perspective

    PubMed Central

    2010-01-01

    Background Comparative sequence analysis of the 16S rRNA gene is frequently used to characterize the microbial diversity of environmental samples. However, sequence similarities do not always imply functional or evolutionary relatedness due to many factors, including unequal rates of change and convergence. Thus, relying on top BLASTN hits for phylogenetic studies may misrepresent the diversity of these constituents. Furthermore, attempts to circumvent this issue by including a large number of BLASTN hits per sequence in one tree to explore their relatedness present other problems. For instance, the multiple sequence alignment will be poor and computationally costly if not relying on manual alignment, and it may be difficult to derive meaningful relationships from the resulting tree. Analyzing sequence relationship networks within collective BLASTN results, however, reveals sequences that are closely related despite low rank. Results We have developed a web application, Phylometrics, that relies on networks of collective BLASTN results (rather than single BLASTN hits) to facilitate the process of building phylogenetic trees in an automated, high-throughput fashion while offering novel tools to find sequences that are of significant phylogenetic interest with minimal human involvement. The application, which can be installed locally in a laboratory or hosted remotely, utilizes a simple wizard-style format to guide the user through the pipeline without necessitating a background in programming. Furthermore, Phylometrics implements an independent job queuing system that enables users to continue to use the system while jobs are run with little or no degradation in performance. Conclusions Phylometrics provides a novel data mining method to screen supplied DNA sequences and to identify sequences that are of significant phylogenetic interest using powerful analytical tools. Sequences that are identified as being similar to a number of supplied sequences may provide key

  3. A web-based Tree View (TV) program for the visualization of phylogenetic trees.

    PubMed

    Zhai, Yufeng; Tchieu, Jason; Saier, Milton H

    2002-01-01

    We designed a web-based program, Tree View (TV), which uses a dynamic data structure algorithm to draw the phylogenetic tree for a family of homologous proteins. This program has a user-friendly interface and can be easily implemented into other programs for convenient protein sequence analysis. It is available at our web site: http://www.biology.ucsd.edu-yzhai/biotools.html.

  4. Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction.

    PubMed

    Yang, Kuan; Zhang, Liqing

    2008-03-01

    Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Therefore, errors can be introduced during MSA construction which in turn affect the subsequent phylogenetic tree construction. In order to circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences and can be estimated without MSAs. It has been traditionally used to build a fast 'guide tree' to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, together with neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes-Cantor, Kimura, F84 and Tamura-Nei. These four distance estimators fall into the category of model-based distance estimators, as each of them takes account of a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from other distances most of the time; when the divergence between underlying sequences is high, the tree accuracy could be twice or higher using the k-tuple distance than other estimators. Furthermore, as the k-tuple distance avoids the need for constructing an MSA, it can save a tremendous amount of time for phylogenetic tree reconstructions when the data include a large number of sequences. PMID:18296485
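    The k-tuple distance described above can be sketched in a few lines. The abstract says only that it is a sum of differences in tuple frequency; the sketch below assumes absolute differences of relative frequencies over overlapping tuples, which may differ from the exact formula used by the authors. Function names and the example sequences are hypothetical.

```python
from collections import Counter


def ktuple_frequencies(sequence, k):
    """Relative frequency of every overlapping k-tuple in the sequence."""
    total = len(sequence) - k + 1
    if total <= 0:
        raise ValueError("sequence must be at least k characters long")
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {tup: count / total for tup, count in counts.items()}


def ktuple_distance(seq1, seq2, k):
    """Sum of absolute frequency differences over all k-tuples.

    Tuples absent from both sequences contribute zero, so summing over
    the tuples seen in either sequence equals summing over all possible
    tuples of length k. Using absolute differences is an assumption.
    """
    freq1 = ktuple_frequencies(seq1, k)
    freq2 = ktuple_frequencies(seq2, k)
    seen = set(freq1) | set(freq2)
    return sum(abs(freq1.get(t, 0.0) - freq2.get(t, 0.0)) for t in seen)


# Hypothetical unaligned sequences of different lengths.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTACGG"
print(ktuple_distance(seq_a, seq_b, 3))
```

    No alignment is needed at any point, and the sequences may differ in length. A matrix of such pairwise distances could then be passed to a distance-based tree builder such as neighbor-joining.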

  5. Mesoamerican tree squirrels evolution (Rodentia: Sciuridae): a molecular phylogenetic analysis.

    PubMed

    Villalobos, Federico; Gutierrez-Espeleta, Gustavo

    2014-06-01

    The tribe Sciurini comprises the genera Sciurus, Syntheosciurus, Microsciurus, Tamiasciurus and Rheithrosciurus. The phylogenetic relationships within Sciurus have been only partially resolved, and the relationship between Mesoamerican species remains unsolved. The phylogenetic relationships of the Mesoamerican tree squirrels were examined using molecular data. Publicly available sequence data (12S, 16S, CYTB mitochondrial genes and IRBP nuclear gene) and cytochrome B gene sequences of four previously unsampled Mesoamerican Sciurus species were analyzed under a Bayesian multispecies coalescence model. Phylogenetic analysis of the multilocus data set showed the neotropical tree squirrels as a monophyletic clade. The genus Sciurus was paraphyletic due to the inclusion of Microsciurus species (M. alfari and M. flaviventer). The South American species S. aestuans and S. stramineus showed a sister taxa relationship. Single locus analysis based on the most compact and complete data set (i.e. CYTB gene sequences) supported the monophyly of the South American species and recovered a Mesoamerican clade including S. aureogaster, S. granatensis and S. variegatoides. These results corroborated previous findings based on cladistic analysis of cranial and post-cranial characters. Our data support a close relationship between Mesoamerican Sciurus species and a sister relationship with South American species, and corroborate previous findings in relation to the polyphyly of Microsciurus and Syntheosciurus paraphyly.

  6. Improved Phylogenetic Analyses Corroborate a Plausible Position of Martialis heureka in the Ant Tree of Life

    PubMed Central

    Kück, Patrick; Hita Garcia, Francisco; Misof, Bernhard; Meusemann, Karen

    2011-01-01

    Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, by comparing molecular studies and different reconstruction methods, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previously published molecular data set, we show that the Maximum Likelihood approach is highly appropriate to resolve deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships. PMID:21731644

  7. Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life.

    PubMed

    Kück, Patrick; Hita Garcia, Francisco; Misof, Bernhard; Meusemann, Karen

    2011-01-01

    Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, by comparing molecular studies and different reconstruction methods, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previously published molecular data set, we show that the Maximum Likelihood approach is highly appropriate to resolve deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships.

  8. LifePrint: a novel k-tuple distance method for construction of phylogenetic trees

    PubMed Central

    Reyes-Prieto, Fabián; García-Chéquer, Adda J; Jaimes-Díaz, Hueman; Casique-Almazán, Janet; Espinosa-Lara, Juana M; Palma-Orozco, Rosaura; Méndez-Tenorio, Alfonso; Maldonado-Rodríguez, Rogelio; Beattie, Kenneth L

    2011-01-01

    Purpose Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. Methods We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated “true” phylogenetic tree. Results The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide substitution. Conclusion Our LPS9 method generates more accurate phylogenetic reconstructions than the previously proposed 5-tuples strategy. LPS9-reconstructed trees show higher bootstrap proportion values than distance trees derived from the 5-tuple method. PMID:21918634

  9. Why abundant tropical tree species are phylogenetically old

    PubMed Central

    Wang, Shaopeng; Chen, Anping; Fang, Jingyun; Pacala, Stephen W.

    2013-01-01

    Neutral models of species diversity predict patterns of abundance for communities in which all individuals are ecologically equivalent. These models were originally developed for Panamanian trees and successfully reproduce observed distributions of abundance. Neutral models also make macroevolutionary predictions that have rarely been evaluated or tested. Here we show that neutral models predict a humped or flat relationship between species age and population size. In contrast, ages and abundances of tree species in the Panamanian Canal watershed are found to be positively correlated, which falsifies the models. Speciation rates vary among phylogenetic lineages and are partially heritable from mother to daughter species. Variable speciation rates in an otherwise neutral model lead to a demographic advantage for species with low speciation rate. This demographic advantage results in a positive correlation between species age and abundance, as found in the Panamanian tropical forest community. PMID:24043767

  10. Why abundant tropical tree species are phylogenetically old.

    PubMed

    Wang, Shaopeng; Chen, Anping; Fang, Jingyun; Pacala, Stephen W

    2013-10-01

    Neutral models of species diversity predict patterns of abundance for communities in which all individuals are ecologically equivalent. These models were originally developed for Panamanian trees and successfully reproduce observed distributions of abundance. Neutral models also make macroevolutionary predictions that have rarely been evaluated or tested. Here we show that neutral models predict a humped or flat relationship between species age and population size. In contrast, ages and abundances of tree species in the Panamanian Canal watershed are found to be positively correlated, which falsifies the models. Speciation rates vary among phylogenetic lineages and are partially heritable from mother to daughter species. Variable speciation rates in an otherwise neutral model lead to a demographic advantage for species with low speciation rate. This demographic advantage results in a positive correlation between species age and abundance, as found in the Panamanian tropical forest community. PMID:24043767

  11. Auto-validating von Neumann rejection sampling from small phylogenetic tree spaces

    PubMed Central

    2009-01-01

    Background: In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces. Results: The posterior samples from the auto-validating sampler are used to rigorously (i) estimate posterior probabilities for different rooted topologies based on mitochondrial DNA from human, chimpanzee and gorilla, (ii) conduct a non-parametric test of rate variation between protein-coding and tRNA-coding sites from three primates and (iii) obtain a posterior estimate of the human-Neanderthal divergence time. Conclusion: This solves the open problem of rigorously drawing independent and identically distributed samples from the posterior distribution over rooted and unrooted small tree spaces (3 or 4 taxa) based on any multiply-aligned sequence data. PMID:19128477
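
    For readers unfamiliar with the underlying technique, the plain von Neumann rejection sampler can be sketched as follows. This is not the auto-validating sampler of the abstract: the envelope height is supplied by hand for a one-dimensional toy density, whereas the paper constructs rigorous bounds through interval analysis over tree space. The function names and the toy target are assumptions.

```python
import random


def rejection_sample(target, envelope_height, n, rng):
    """Draw n samples on [0, 1] from the density proportional to `target`.

    Assumes target(x) <= envelope_height for every x in [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.random()                    # proposal drawn from Uniform(0, 1)
        u = rng.random() * envelope_height  # uniform height under the envelope
        if u <= target(x):                  # accept if the point lies under the target
            samples.append(x)
    return samples


def toy_posterior(x):
    # Unnormalised Beta(3, 2) shape, x^2 (1 - x); its maximum is 4/27 at x = 2/3.
    return x * x * (1.0 - x)


rng = random.Random(0)
draws = rejection_sample(toy_posterior, 4.0 / 27.0, 5000, rng)
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))  # should be close to the Beta(3, 2) mean of 0.6
```

    Accepted draws are independent and identically distributed, which is the property the abstract emphasises; the difficulty in practice is proving that the envelope really dominates the target everywhere.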

  12. Phylogenetic Tree Reconstruction Accuracy and Model Fit when Proportions of Variable Sites Change across the Tree

    PubMed Central

    Grievink, Liat Shavit; Penny, David; Hendy, Michael D.; Holland, Barbara R.

    2010-01-01

    Commonly used phylogenetic models assume a homogeneous process through time in all parts of the tree. However, it is known that these models can be too simplistic as they do not account for nonhomogeneous lineage-specific properties. In particular, it is now widely recognized that as constraints on sequences evolve, the proportion and positions of variable sites can vary between lineages, causing heterotachy. The extent to which this model misspecification affects tree reconstruction is still unknown. Here, we evaluate the effect of changes in the proportions and positions of variable sites on model fit and tree estimation. We consider 5 current models of nucleotide sequence evolution in a Bayesian Markov chain Monte Carlo framework as well as maximum parsimony (MP). We show that for a tree with 4 lineages where 2 nonsister taxa undergo a change in the proportion of variable sites, tree reconstruction under the best-fitting model, which is chosen using a relative test, often results in the wrong tree. In this case, we found that an absolute test of model fit is a better predictor of tree estimation accuracy. We also found further evidence that MP is not immune to heterotachy. In addition, we show that increased sampling of taxa that have undergone a change in proportion and positions of variable sites is critical for accurate tree reconstruction. PMID:20525636

  13. How Ecology and Landscape Dynamics Shape Phylogenetic Trees.

    PubMed

    Gascuel, Fanny; Ferrière, Régis; Aguilée, Robin; Lambert, Amaury

    2015-07-01

    Whether biotic or abiotic factors are the dominant drivers of clade diversification is a long-standing question in evolutionary biology. The ubiquitous patterns of phylogenetic imbalance and branching slowdown have been taken as supporting the role of ecological niche filling and spatial heterogeneity in ecological features, and thus of biotic processes, in diversification. However, a proper theoretical assessment of the relative roles of biotic and abiotic factors in macroevolution requires models that integrate both types of factors, and such models have been lacking. In this study, we use an individual-based model to investigate the temporal patterns of diversification driven by ecological speciation in a stochastically fluctuating geographic landscape. The model generates phylogenies whose shape evolves as the clade ages. Stabilization of tree shape often occurs after ecological saturation, revealing species turnover caused by competition and demographic stochasticity. In the initial phase of diversification (allopatric radiation into an empty landscape), trees tend to be unbalanced and branching slows down. As diversification proceeds further due to landscape dynamics, balance and branching tempo may increase and become positive. Three main conclusions follow. First, the phylogenies of ecologically saturated clades do not always exhibit branching slowdown. Branching slowdown requires that competition be wide or heterogeneous across the landscape, or that the characteristics of landscape dynamics vary geographically. Conversely, branching acceleration is predicted under narrow competition or frequent local catastrophes. Second, ecological heterogeneity does not necessarily cause phylogenies to be unbalanced--short time in geographical isolation or frequent local catastrophes may lead to balanced trees despite spatial heterogeneity. Conversely, unbalanced trees can emerge without spatial heterogeneity, notably if competition is wide. Third, short isolation time

  14. Characterization of a branch of the phylogenetic tree

    SciTech Connect

    Samuel, Stuart A.; Weng, Gezhi

    2003-04-11

    We use a combination of analytic models and computer simulations to gain insight into the dynamics of evolution. Our results suggest that certain interesting phenomena should eventually emerge from the fossil record. For example, there should be a "tortoise and hare effect": those genera with the smallest species death rate are likely to survive much longer than genera with large species birth and death rates. A complete characterization of the behavior of a branch of the phylogenetic tree corresponding to a genus and accurate mathematical representations of the various stages are obtained. We apply our results to address certain controversial issues that have arisen in paleontology such as the importance of punctuated equilibrium and whether unique Cambrian phyla have survived to the present.
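
    The "tortoise and hare effect" can be explored with a small simulation. The Python sketch below assumes a linear birth-death process in which each species in a genus speciates and goes extinct at constant per-species rates; the abstract does not specify the authors' model, so this process, the function name, the rate values and the time cap are all assumptions.

```python
import random
import statistics


def genus_lifetime(birth, death, rng, t_max=200.0, max_species=10_000):
    """Time until a genus that starts with one species has no species left.

    Lifetimes are censored at t_max; a genus that reaches max_species is
    treated as having survived to t_max.
    """
    n, t = 1, 0.0
    while 0 < n < max_species:
        # Waiting time to the next event among n species (Gillespie step).
        t += rng.expovariate(n * (birth + death))
        if t >= t_max:
            return t_max
        if rng.random() < birth / (birth + death):
            n += 1   # speciation
        else:
            n -= 1   # species extinction
    return t if n == 0 else t_max


rng = random.Random(1)
# "Tortoise": low turnover. "Hare": high turnover. Birth equals death in both.
tortoise = [genus_lifetime(0.1, 0.1, rng) for _ in range(400)]
hare = [genus_lifetime(1.0, 1.0, rng) for _ in range(400)]
print(statistics.median(tortoise) > statistics.median(hare))
```

    Under these assumptions the median genus lifetime scales inversely with the turnover rate, so the low-rate genera typically outlast the high-rate ones.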

  15. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses

    PubMed Central

    Capella-Gutiérrez, Salvador; Silla-Martínez, José M.; Gabaldón, Toni

    2009-01-01

    Summary: Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. Here, we present trimAl, a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized. Availability: trimAl is written in C++ and is portable to all platforms. trimAl is freely available for download (http://trimal.cgenomics.org) and can be used online through the Phylemon web server (http://phylemon2.bioinfo.cipf.es/). Supplementary Material is available at http://trimal.cgenomics.org/publications. Contact: tgabaldon@crg.es PMID:19505945
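
    One of the criteria named above, the proportion of sequences with a gap, can be illustrated with a short Python sketch. This is not trimAl's implementation or interface: the function name, the `max_gap_fraction` parameter and the toy alignment are hypothetical, and the similarity and consistency criteria are not modelled.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps sequence names to equal-length aligned strings.
    Returns the trimmed alignment and the indices of the columns kept.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col for col in range(length)
        if sum(row[col] in gap_chars for row in rows) / len(rows) <= max_gap_fraction
    ]
    trimmed = {name: "".join(alignment[name][col] for col in keep) for name in names}
    return trimmed, keep


# Toy protein alignment (made up).
toy_alignment = {
    "seq1": "MK-LV--A",
    "seq2": "MKTLV--A",
    "seq3": "M--LVG-A",
}
trimmed, kept = trim_gappy_columns(toy_alignment, max_gap_fraction=0.5)
print(kept)     # [0, 1, 3, 4, 7]
print(trimmed)  # {'seq1': 'MKLVA', 'seq2': 'MKLVA', 'seq3': 'M-LVA'}
```

    Columns 2, 5 and 6 are removed because at least two of the three sequences carry a gap there.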

  16. Phylogenetics.

    PubMed

    Sleator, Roy D

    2011-04-01

    The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects. PMID:21249334

  17. A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

    ERIC Educational Resources Information Center

    Brewer, Steven D.

    This research paper examines phylogenetic tree construction, a form of problem solving in biology, by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…

  18. Unrooted unordered homeomorphic subtree alignment of RNA trees.

    PubMed

    Milo, Nimrod; Zakov, Shay; Katzenelson, Erez; Bachmat, Eitan; Dinitz, Yefim; Ziv-Ukelson, Michal

    2013-01-01

    We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an O(n_T n_S + min(d_T, d_S) L_T L_S) time complexity, where n_T, L_T and d_T are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying d_T ≤ L_T ≤ n_T), and similarly for n_S, L_S and d_S with respect to the tree S. This improves the time complexity of previous algorithms for less general variants of the problem. In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as for two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size n and m, where n ≤ m, MCM and both its cavity derivatives are solved in O(n^3 + nm) time, without the usage of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic time algorithm for All-Pairs-Cavity-MCM, and improves the running times of MCM and All-Cavity-MCM problems in the unbalanced case where n ≪ m. We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and applied it in a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNAse P and Hammerhead RNA

  19. Species Divergence and Phylogenetic Variation of Ecophysiological Traits in Lianas and Trees

    PubMed Central

    Rios, Rodrigo S.; Salgado-Luarte, Cristian; Gianoli, Ernesto

    2014-01-01

    The climbing habit is an evolutionary key innovation in plants because it is associated with enhanced clade diversification. We tested whether patterns of species divergence and variation of three ecophysiological traits that are fundamental for plant adaptation to light environments (maximum photosynthetic rate [Amax], dark respiration rate [Rd], and specific leaf area [SLA]) are consistent with this key innovation. Using data reported from four tropical forests and three temperate forests, we compared phylogenetic distance among species as well as the evolutionary rate, phylogenetic distance and phylogenetic signal of those traits in lianas and trees. Estimates of evolutionary rates showed that Rd evolved faster in lianas, while SLA evolved faster in trees. The mean phylogenetic distance was 1.2 times greater among liana species than among tree species. Likewise, estimates of phylogenetic distance indicated that lianas were less related than by chance alone (phylogenetic evenness across 63 species), and trees were more related than expected by chance (phylogenetic clustering across 71 species). Lianas showed evenness for Rd, while trees showed phylogenetic clustering for this trait. In contrast, for SLA, lianas exhibited phylogenetic clustering and trees showed phylogenetic evenness. Lianas and trees showed patterns of ecophysiological trait variation among species that were independent of phylogenetic relatedness. We found support for the expected pattern of greater species divergence in lianas, but did not find consistent patterns regarding ecophysiological trait evolution and divergence. Rd followed the species-level pattern, i.e., greater divergence/evolution in lianas compared to trees, while the opposite occurred for SLA and no pattern was detected for Amax. Rd may have driven lianas' divergence across forest environments, and might contribute to diversification in climber clades. PMID:24914958

  20. Species divergence and phylogenetic variation of ecophysiological traits in lianas and trees.

    PubMed

    Rios, Rodrigo S; Salgado-Luarte, Cristian; Gianoli, Ernesto

    2014-01-01

    The climbing habit is an evolutionary key innovation in plants because it is associated with enhanced clade diversification. We tested whether patterns of species divergence and variation of three ecophysiological traits that are fundamental for plant adaptation to light environments (maximum photosynthetic rate [A(max)], dark respiration rate [R(d)], and specific leaf area [SLA]) are consistent with this key innovation. Using data reported from four tropical forests and three temperate forests, we compared phylogenetic distance among species as well as the evolutionary rate, phylogenetic distance and phylogenetic signal of those traits in lianas and trees. Estimates of evolutionary rates showed that R(d) evolved faster in lianas, while SLA evolved faster in trees. The mean phylogenetic distance was 1.2 times greater among liana species than among tree species. Likewise, estimates of phylogenetic distance indicated that lianas were less related than by chance alone (phylogenetic evenness across 63 species), and trees were more related than expected by chance (phylogenetic clustering across 71 species). Lianas showed evenness for R(d), while trees showed phylogenetic clustering for this trait. In contrast, for SLA, lianas exhibited phylogenetic clustering and trees showed phylogenetic evenness. Lianas and trees showed patterns of ecophysiological trait variation among species that were independent of phylogenetic relatedness. We found support for the expected pattern of greater species divergence in lianas, but did not find consistent patterns regarding ecophysiological trait evolution and divergence. R(d) followed the species-level pattern, i.e., greater divergence/evolution in lianas compared to trees, while the opposite occurred for SLA and no pattern was detected for A(max). R(d) may have driven lianas' divergence across forest environments, and might contribute to diversification in climber clades.

  1. One Tree to Link Them All: A Phylogenetic Dataset for the European Tetrapoda

    PubMed Central

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-01-01

    With the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies, which are necessary for comprehensive large-scale eco-phylogenetic analyses, are still lacking for many large groups and regions. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of a supermatrix, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) levels based on consensus knowledge. For each group, we inferred 100 ML trees to be able to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized-likelihood and fossil calibration. The trees obtained were well-supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for the purpose of comparative analyses, macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty. PMID:25685620

  2. Community Phylogenetics: Assessing Tree Reconstruction Methods and the Utility of DNA Barcodes

    PubMed Central

    Boyle, Elizabeth E.; Adamowicz, Sarah J.

    2015-01-01

    Studies examining phylogenetic community structure have become increasingly prevalent, yet little attention has been given to the influence of the input phylogeny on metrics that describe phylogenetic patterns of co-occurrence. Here, we examine the influence of branch length, tree reconstruction method, and amount of sequence data on measures of phylogenetic community structure, as well as the phylogenetic signal (Pagel’s λ) in morphological traits, using Trichoptera larval communities from Churchill, Manitoba, Canada. We find that model-based tree reconstruction methods and the use of a backbone family-level phylogeny improve estimations of phylogenetic community structure. In addition, trees built using the barcode region of cytochrome c oxidase subunit I (COI) alone accurately predict metrics of phylogenetic community structure obtained from a multi-gene phylogeny. Input tree did not alter overall conclusions drawn for phylogenetic signal, as significant phylogenetic structure was detected in two body size traits across input trees. As the discipline of community phylogenetics continues to expand, it is important to investigate the best approaches to accurately estimate patterns. Our results suggest that emerging large datasets of DNA barcode sequences provide a vast resource for studying the structure of biological communities. PMID:26110886

  3. T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks.

    PubMed

    Boc, Alix; Diallo, Alpha Boubacar; Makarenkov, Vladimir

    2012-07-01

    T-REX (Tree and reticulogram REConstruction) is a web server dedicated to the reconstruction of phylogenetic trees, reticulation networks and to the inference of horizontal gene transfer (HGT) events. T-REX includes several popular bioinformatics applications such as MUSCLE, MAFFT, Neighbor Joining, NINJA, BioNJ, PhyML, RAxML, a random phylogenetic tree generator and some well-known sequence-to-distance transformation models. It also comprises fast and effective methods for inferring phylogenetic trees from complete and incomplete distance matrices as well as for reconstructing reticulograms and HGT networks, including the detection and validation of complete and partial gene transfers, inference of consensus HGT scenarios and interactive HGT identification, developed by the authors. The included methods allow for validating and visualizing phylogenetic trees and networks which can be built from distance or sequence data. The web server is available at: www.trex.uqam.ca.

  4. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

    PubMed

    Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. PMID:26140928

  5. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928

  6. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    PubMed Central

    Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

    2009-01-01

    Background: Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results: We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion: PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253

  7. Edge-Related Loss of Tree Phylogenetic Diversity in the Severely Fragmented Brazilian Atlantic Forest

    PubMed Central

    Santos, Bráulio A.; Arroyo-Rodríguez, Víctor; Moreno, Claudia E.; Tabarelli, Marcelo

    2010-01-01

    Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest. PMID:20838613
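
    The two quantities described above, the average phylogenetic distance between pairs and the average distance to the closest relative, are commonly called mean pairwise distance (MPD) and mean nearest taxon distance (MNTD). The Python sketch below computes species-level versions from a matrix of pairwise phylogenetic distances. It is a simplification: the study works with individuals, whereas this sketch ignores abundances, and the function names, species labels and distances are invented for illustration.

```python
from itertools import combinations


def symmetric_matrix(pair_distances):
    """Build a dict-of-dicts matrix from {(a, b): distance} entries."""
    dist = {}
    for (a, b), d in pair_distances.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def mean_pairwise_distance(community, dist):
    """Average phylogenetic distance over all pairs of species present."""
    pairs = list(combinations(sorted(community), 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mean_nearest_taxon_distance(community, dist):
    """Average distance from each species to its closest co-occurring relative."""
    species = sorted(community)
    nearest = [min(dist[a][b] for b in species if b != a) for a in species]
    return sum(nearest) / len(nearest)


# Made-up distances: sp1/sp2 are close relatives, as are sp3/sp4.
toy_dist = symmetric_matrix({
    ("sp1", "sp2"): 2.0, ("sp1", "sp3"): 10.0, ("sp1", "sp4"): 10.0,
    ("sp2", "sp3"): 10.0, ("sp2", "sp4"): 10.0, ("sp3", "sp4"): 4.0,
})
core_plot = {"sp1", "sp3", "sp4"}
edge_plot = {"sp1", "sp2"}
print(mean_pairwise_distance(core_plot, toy_dist))       # 8.0
print(mean_nearest_taxon_distance(core_plot, toy_dist))  # 6.0
print(mean_pairwise_distance(edge_plot, toy_dist))       # 2.0
```

    In this toy example the edge plot retains only close relatives, so its mean pairwise distance is lower than that of the core plot.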

  8. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  9. morePhyML: improving the phylogenetic tree space exploration with PhyML 3.

    PubMed

    Criscuolo, Alexis

    2011-12-01

    PhyML is a widely used Maximum Likelihood (ML) phylogenetic tree inference software based on a standard hill-climbing method. Starting from an initial tree, version 3 of PhyML explores the tree space by using "Nearest Neighbor Interchange" (NNI) or "Subtree Pruning and Regrafting" (SPR) tree swapping techniques in order to find the ML phylogenetic tree. NNI-based local searches are fast but can often get trapped in local optima, whereas it is expected that the larger (but slower to cover) SPR-based neighborhoods will lead to trees with higher likelihood. Here, I verify that PhyML infers more likely trees with SPRs than with NNIs in almost all cases. However, I also show that the SPR-based local search of PhyML often does not succeed at locating the ML tree. To improve the tree space exploration, I provide a script, named morePhyML, which allows escaping from local optima by performing character reweighting. This ML tree search strategy, named ratchet, often leads to higher likelihood estimates. Based on the analysis of a large number of amino acid and nucleotide data sets, I show that morePhyML allows inferring more accurate phylogenetic trees than several other recently developed ML tree inference programs in many cases.

  10. [A bird's eye view of the algorithms and software packages for reconstructing phylogenetic trees].

    PubMed

    Zhang, Li-Na; Rong, Chang-He; He, Yuan; Guan, Qiong; He, Bin; Zhu, Xing-Wen; Liu, Jia-Ni; Chen, Hong-Ju

    2013-12-01

    The prototype phylogenetic tree, i.e., evolutionary "tree" or "tree of life", was first conceived by Charles Darwin in his seminal book "The Origin of Species", and its reconstruction has been approached by generations of biologists ever since. In this article, we briefly review the major algorithms and software packages for reconstructing phylogenetic trees. Specifically, we discuss four categories of phylogeny algorithms (distance-matrix, maximum parsimony, maximum likelihood, and Bayesian), as well as software packages (PHYLIP, MEGA, MrBayes) based on them. PMID:24415699

  12. PTreeRec: Phylogenetic Tree Reconstruction based on genome BLAST distance.

    PubMed

    Deng, Riqiang; Huang, Mingsong; Wang, Jinwen; Huang, Yuansen; Yang, Jie; Feng, Jinghua; Wang, Xunzhang

    2006-08-01

    Phylogenetic Tree Reconstruction (PTreeRec) is a web-based tool for automatic phylogeny inferences from whole-genome sequences, which accepts files of DNA sequences in the FASTA format and allows users to save the output tree file, and displays the inferred tree through an applet in a web browser. PTreeRec involves three basic steps. First, regions of maximal segment pairs (MSPs) based on an all-against-all pairwise comparison of genomes are located. Second, a distance matrix is calculated from MSP scores or coverage. Finally, a phylogenetic tree is reconstructed by the neighbor-joining method.
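
    The final step, neighbor joining on a distance matrix, can be sketched in Python as follows. This is a minimal illustration and not PTreeRec's code: the distance matrix is assumed to be given (the abstract does not specify how MSP scores or coverage are converted to distances), taxon names are assumed to be unique, and the function name and Newick output format are choices made for this example.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining; return a Newick string.

    `names` is a list of unique taxon labels and `dist` a symmetric
    matrix (list of lists) of pairwise distances in the same order.
    """
    nodes = list(names)
    # Distances are kept in nested dicts keyed by node label; internal
    # nodes are labelled by their own Newick sub-string.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # Choose the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best

        # Branch lengths from the joined pair to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"

        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    a, b = nodes
    return f"({a}:{d[a][b]:g},{b});"
```

    For distances that fit a tree exactly, the procedure recovers that tree; for example, the four-taxon matrix with AB=3, AC=5, AD=8, BC=6, BD=9, CD=5 groups A with B and C with D.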

  13. Reverse transcriptase domain sequences from tree peony (Paeonia suffruticosa) long terminal repeat retrotransposons: sequence characterization and phylogenetic analysis

    PubMed Central

    Guo, Da-Long; Hou, Xiao-Gai; Jia, Tian

    2014-01-01

    Tree peony is an important horticultural plant worldwide of great ornamental and medicinal value. Long terminal repeat retrotransposons (LTR-retrotransposons) are the major components of most plant genomes and can substantially impact the genome in many ways. It is therefore crucial to understand their sequence characteristics, genetic distribution and transcriptional activity; however, no information about them is available in tree peony. In this study, Ty1-copia-like reverse transcriptase sequences were amplified from tree peony genomic DNA by polymerase chain reaction (PCR) with degenerate oligonucleotide primers corresponding to highly conserved domains of the Ty1-copia-like retrotransposons. PCR fragments of roughly 270 bp were isolated and cloned, and 33 sequences were obtained. According to alignment and phylogenetic analysis, all sequences were divided into six families. The observed differences in the degree of nucleotide sequence similarity indicate a high level of sequence heterogeneity among these clones. Most of these sequences have a frame shift, a stop codon, or both. Dot-blot analysis revealed distribution of these sequences in all the studied tree peony species. However, different hybridization signals were detected among them, which is in agreement with previous systematics studies. Reverse transcriptase PCR (RT-PCR) indicated that Ty1-copia retrotransposons in tree peony were transcriptionally inactive. The results provide basic genetic and evolutionary information on the tree peony genome, and will provide valuable information for the further utilization of retrotransposons in tree peony. PMID:26019529

  14. PhySortR: a fast, flexible tool for sorting phylogenetic trees in R

    PubMed Central

    Stephens, Timothy G.; Bhattacharya, Debashish; Ragan, Mark A.

    2016-01-01

    A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, particularly robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for classifying phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa based on tip labels (i.e., leaves) on a tree, with customisable options to assess clades within the context of the whole tree. Using simulated and empirical datasets, we demonstrate the potential and scalability of PhySortR in analysis of thousands of phylogenetic trees without a priori assumption of tree-rooting, and in yielding readily interpretable trees that unambiguously satisfy the query. PhySortR is a command-line tool that is freely available and easily automatable. PMID:27190724

  15. PhySortR: a fast, flexible tool for sorting phylogenetic trees in R.

    PubMed

    Stephens, Timothy G; Bhattacharya, Debashish; Ragan, Mark A; Chan, Cheong Xin

    2016-01-01

    A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, particularly robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for classifying phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa based on tip labels (i.e., leaves) on a tree, with customisable options to assess clades within the context of the whole tree. Using simulated and empirical datasets, we demonstrate the potential and scalability of PhySortR in analysis of thousands of phylogenetic trees without a priori assumption of tree-rooting, and in yielding readily interpretable trees that unambiguously satisfy the query. PhySortR is a command-line tool that is freely available and easily automatable. PMID:27190724

  16. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees.

    PubMed

    Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Hu, Songnian; Chen, Wei-Hua

    2012-07-01

    EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at: http://www.evolgenius.info/evolview.html. PMID:22695796

  17. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees

    PubMed Central

    Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J.; Hu, Songnian; Chen, Wei-Hua

    2012-01-01

    EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at: http://www.evolgenius.info/evolview.html. PMID:22695796

  19. Alignment-free genome tree inference by learning group-specific distance metrics.

    PubMed

    Patil, Kaustubh R; McHardy, Alice C

    2013-01-01

    Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods--9 alignment-free and 1 alignment-based.
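
    A Mahalanobis distance over genome signatures can be sketched in Python as follows. The sketch makes several assumptions that are not in the abstract: the signature is taken to be a vector of normalized k-mer frequencies over the alphabet ACGT, the function names are hypothetical, and the metric matrix `M` is simply supplied by the caller. How the published method learns `M` from a reference taxonomy is not shown here.

```python
import math
from itertools import product


def kmer_signature(seq, k=2):
    """Return normalized k-mer frequencies of `seq` in a fixed order.

    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    diff = [a - b for a, b in zip(x, y)]
    m_diff = [sum(M[i][j] * diff[j] for j in range(len(diff)))
              for i in range(len(diff))]
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, sum(d * m for d, m in zip(diff, m_diff))))
```

    With `M` set to the identity matrix the distance reduces to the ordinary Euclidean distance between signatures, which serves as a baseline against which a learned, group-specific `M` could be compared.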

  20. The vestigial olfactory receptor subgenome of odontocete whales: phylogenetic congruence between gene-tree reconciliation and supermatrix methods.

    PubMed

    McGowen, Michael R; Clark, Clay; Gatesy, John

    2008-08-01

    The macroevolutionary transition of whales (cetaceans) from a terrestrial quadruped to an obligate aquatic form involved major changes in sensory abilities. Compared to terrestrial mammals, the olfactory system of baleen whales is dramatically reduced, and in toothed whales is completely absent. We sampled the olfactory receptor (OR) subgenomes of eight cetacean species from four families. A multigene tree of 115 newly characterized OR sequences from these eight species and published data for Bos taurus revealed a diverse array of class II OR paralogues in Cetacea. Evolution of the OR gene superfamily in toothed whales (Odontoceti) featured a multitude of independent pseudogenization events, supporting anatomical evidence that odontocetes have lost their olfactory sense. We explored the phylogenetic utility of OR pseudogenes in Cetacea, concentrating on delphinids (oceanic dolphins), the product of a rapid evolutionary radiation that has been difficult to resolve in previous studies of mitochondrial DNA sequences. Phylogenetic analyses of OR pseudogenes using both gene-tree reconciliation and supermatrix methods yielded fully resolved, consistently supported relationships among members of four delphinid subfamilies. Alternative minimizations of gene duplications, gene duplications plus gene losses, deep coalescence events, and nucleotide substitutions plus indels returned highly congruent phylogenetic hypotheses. Novel DNA sequence data for six single-copy nuclear loci and three mitochondrial genes (> 5000 aligned nucleotides) provided an independent test of the OR trees. Nucleotide substitutions and indels in OR pseudogenes showed a very low degree of homoplasy in comparison to mitochondrial DNA and, on average, provided more variation than single-copy nuclear DNA. Our results suggest that phylogenetic analysis of the large OR superfamily will be effective for resolving relationships within Cetacea whether supermatrix or gene-tree reconciliation procedures are

  1. A first step toward computing all hybridization networks for two rooted binary phylogenetic trees.

    PubMed

    Scornavacca, Celine; Linz, Simone; Albrecht, Benjamin

    2012-11-01

    Recently, considerable effort has been put into developing fast algorithms to reconstruct a rooted phylogenetic network that explains two rooted phylogenetic trees and has a minimum number of hybridization vertices. With the standard approach to tackle this problem being combinatorial, the reconstructed network is rarely unique. From a biological point of view, it is therefore of importance to not only compute one network, but all possible networks. In this article, we make a first step toward approaching this goal by presenting the first algorithm--called ALLMAAFs--that calculates all maximum-acyclic-agreement forests for two rooted binary phylogenetic trees on the same set of taxa.

  2. Climate-driven extinctions shape the phylogenetic structure of temperate tree floras.

    PubMed

    Eiserhardt, Wolf L; Borchsenius, Finn; Plum, Christoffer M; Ordonez, Alejandro; Svenning, Jens-Christian

    2015-03-01

    When taxa go extinct, unique evolutionary history is lost. If extinction is selective, and the intrinsic vulnerabilities of taxa show phylogenetic signal, more evolutionary history may be lost than expected under random extinction. Under what conditions this occurs is insufficiently known. We show that late Cenozoic climate change induced phylogenetically selective regional extinction of northern temperate trees because of phylogenetic signal in cold tolerance, leading to significantly and substantially larger than random losses of phylogenetic diversity (PD). The surviving floras in regions that experienced stronger extinction are phylogenetically more clustered, indicating that non-random losses of PD are of increasing concern with increasing extinction severity. Using simulations, we show that a simple threshold model of survival given a physiological trait with phylogenetic signal reproduces our findings. Our results send a strong warning that we may expect future assemblages to be phylogenetically and possibly functionally depauperate if anthropogenic climate change affects taxa similarly.

  3. GeneAlign: a coding exon prediction tool based on phylogenetical comparisons.

    PubMed

    Hsieh, Shu Ju; Lin, Chun Yuan; Liu, Ning Han; Chow, Wei Yuan; Tang, Chuan Yi

    2006-07-01

    GeneAlign is a coding exon prediction tool for predicting protein coding genes by measuring the homologies between a sequence of a genome and related, annotated sequences of other genomes. Identifying protein coding genes is one of the most important tasks in newly sequenced genomes. With increasing numbers of gene annotations verified by experiments, it is feasible to identify genes in newly sequenced genomes by comparing them to annotated genes of phylogenetically close organisms. GeneAlign applies CORAL, a heuristic linear time alignment tool, to determine if regions flanked by the candidate signals (initiation codon-GT, AG-GT and AG-STOP codon) are similar to annotated coding exons. Employing the conservation of gene structures and sequence homologies between protein coding regions increases the prediction accuracy. GeneAlign was tested on the Projector dataset of 491 human-mouse homologous sequence pairs. At the gene level, both the average sensitivity and the average specificity of GeneAlign are 81%, and they are larger than 96% at the exon level. The rates of missing exons and wrong exons are smaller than 1%. GeneAlign is a free tool available at http://genealign.hccvs.hc.edu.tw.

  4. Phylogenetic Structure of Tree Species across Different Life Stages from Seedlings to Canopy Trees in a Subtropical Evergreen Broad-Leaved Forest.

    PubMed

    Jin, Yi; Qian, Hong; Yu, Mingjian

    2015-01-01

    Investigating patterns of phylogenetic structure across different life stages of tree species in forests is crucial to understanding forest community assembly, and investigating forest gap influence on the phylogenetic structure of forest regeneration is necessary for understanding forest community assembly. Here, we examine the phylogenetic structure of tree species across life stages from seedlings to canopy trees, as well as forest gap influence on the phylogenetic structure of forest regeneration in a forest of the subtropical region in China. We investigate changes in phylogenetic relatedness (measured as NRI) of tree species from seedlings, saplings, treelets to canopy trees; we compare the phylogenetic turnover (measured as βNRI) between canopy trees and seedlings in forest understory with that between canopy trees and seedlings in forest gaps. We found that phylogenetic relatedness generally increases from seedlings through saplings and treelets up to canopy trees, and that phylogenetic relatedness does not differ between seedlings in forest understory and those in forest gaps, but phylogenetic turnover between canopy trees and seedlings in forest understory is lower than that between canopy trees and seedlings in forest gaps. We conclude that tree species tend to be more closely related from seedling to canopy layers, and that forest gaps alter the seedling phylogenetic turnover of the studied forest. It is likely that the increasing trend of phylogenetic clustering as tree stem size increases observed in this subtropical forest is primarily driven by abiotic filtering processes, which select a set of closely related evergreen broad-leaved tree species whose regeneration has adapted to the closed canopy environments of the subtropical forest developed under the regional monsoon climate.
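
    The net relatedness index (NRI) is commonly defined as the negated standardized effect size of the mean pairwise phylogenetic distance (MPD) of a community relative to a null distribution, so that positive values indicate phylogenetic clustering. The Python sketch below computes it under assumptions not stated in the abstract: the null model draws species at random from a regional pool without replacement, pairwise distances are supplied as a nested dictionary, and the function names and `n_null` default are illustrative.

```python
import random
import statistics
from itertools import combinations


def mpd(species, dist):
    """Mean pairwise phylogenetic distance among `species`.

    `dist[a][b]` is the phylogenetic distance between species a and b.
    """
    pairs = list(combinations(sorted(species), 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def nri(community, pool, dist, n_null=999, seed=0):
    """NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null).

    The null communities are random draws of the same size from `pool`
    (an assumption; other null models are possible).
    """
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(list(pool), len(community)), dist)
            for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)
```

    A community made up of close relatives has a smaller MPD than most random draws and therefore a positive NRI, which is the pattern the abstract reports for larger size classes.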

  5. Phylogenetic tree and community structure from a Tangled Nature model.

    PubMed

    Canko, Osman; Taşkın, Ferhat; Argın, Kamil

    2015-10-01

    In evolutionary biology, the taxonomy and origination of species are widely studied subjects. An estimation of the evolutionary tree can be done via available DNA sequence data. The calculation of the tree is made by well-known and frequently used methods such as maximum likelihood and neighbor-joining. In order to examine the results of these methods, an evolutionary tree is pursued computationally by a mathematical model, called Tangled Nature. A relatively small genome space is investigated due to computational burden and it is found that the actual and predicted trees are in reasonably good agreement in terms of shape. Moreover, the speciation and the resulting community structure of the food-web are investigated by modularity.

  6. MrEnt: an editor for publication-quality phylogenetic tree illustrations.

    PubMed

    Zuccon, Alessandro; Zuccon, Dario

    2014-09-01

    We developed MrEnt, a Windows-based, user-friendly program that allows the production of complex, high-resolution, publication-quality phylogenetic trees in a few steps, directly from the analysis output. The program recognizes the standard Nexus tree format and the annotated tree files produced by BEAST and MrBayes. MrEnt combines in a single program a large suite of tree manipulation functions (e.g. handling of multiple trees, tree rotation, character mapping, node collapsing, compression of large clades, handling of time scale and error bars for chronograms) with drawing tools typical of standard graphic editors, including handling of graphic elements and images. The tree illustration can be printed or exported in several standard formats suitable for journal publication, PowerPoint presentation or Web publication.

  7. An approximately unbiased test of phylogenetic tree selection.

    PubMed

    Shimodaira, Hidetoshi

    2002-06-01

    An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection for obtaining the confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434; 1996), but the new method provides higher-order accuracy yet simpler implementation. The AU test, like the Shimodaira-Hasegawa (SH) test, adjusts the selection bias overlooked in the standard use of the bootstrap probability and Kishino-Hasegawa tests. The selection bias comes from comparing many trees at the same time and often leads to overconfidence in the wrong trees. The SH test, though safe to use, may exhibit another type of bias such that it appears conservative. Here I show that the AU test is less biased than other methods in typical cases of tree selection. These points are illustrated in a simulation study as well as in the analysis of mammalian mitochondrial protein sequences. The theoretical argument provides a simple formula that covers the bootstrap probability test, the Kishino-Hasegawa test, the AU test, and the Zharkikh-Li test. A practical suggestion is provided as to which test should be used under particular circumstances. PMID:12079646

  8. Phylogenetic Trees and Networks Reduce to Phylogenies on Binary States: Does It Furnish an Explanation to the Robustness of Phylogenetic Trees against Lateral Transfers.

    PubMed

    Thuillard, Marc; Fraix-Burnet, Didier

    2015-01-01

    This article presents an innovative approach to phylogenies based on the reduction of multistate characters to binary-state characters. We show that the reduction-to-binary-characters approach can be applied to both character- and distance-based phylogenies and provides a unifying framework to explain simply and intuitively the similarities and differences between distance- and character-based phylogenies. Building on these results, this article gives a possible explanation of why phylogenetic trees obtained from a distance matrix or a set of characters are often quite reasonable despite lateral transfers of genetic material between taxa. In the presence of lateral transfers, outer planar networks furnish a better description of evolution than phylogenetic trees. We present a polynomial-time reconstruction algorithm for perfect outer planar networks with a fixed number of states, characters, and lateral transfers.

  9. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web

    PubMed Central

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-01-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. PMID:27189561

  10. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

    PubMed

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-08-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. PMID:27189561

  12. A simulation approach for change-points on phylogenetic trees.

    PubMed

    Persing, Adam; Jasra, Ajay; Beskos, Alexandros; Balding, David; De Iorio, Maria

    2015-01-01

    We observe n sequences at each of m sites and assume that they have evolved from an ancestral sequence that forms the root of a binary tree of known topology and branch lengths, but the sequence states at internal nodes are unknown. The topology of the tree and branch lengths are the same for all sites, but the parameters of the evolutionary model can vary over sites. We assume a piecewise constant model for these parameters, with an unknown number of change-points and hence a transdimensional parameter space over which we seek to perform Bayesian inference. We propose two novel ideas to deal with the computational challenges of such inference. Firstly, we approximate the model based on the time machine principle: the top nodes of the binary tree (near the root) are replaced by an approximation of the true distribution; as more nodes are removed from the top of the tree, the cost of computing the likelihood is reduced linearly in n. The approach introduces a bias, which we investigate empirically. Secondly, we develop a particle marginal Metropolis-Hastings (PMMH) algorithm, that employs a sequential Monte Carlo (SMC) sampler and can use the first idea. Our time-machine PMMH algorithm copes well with one of the bottle-necks of standard computational algorithms: the transdimensional nature of the posterior distribution. The algorithm is implemented on simulated and real data examples, and we empirically demonstrate its potential to outperform competing methods based on approximate Bayesian computation (ABC) techniques. PMID:25506749

  14. Assessing statistical reliability of phylogenetic trees via a speedy double bootstrap method.

    PubMed

    Ren, Aizhen; Ishida, Takashi; Akiyama, Yutaka

    2013-05-01

    Evaluating the reliability of estimated phylogenetic trees is of critical importance in the field of molecular phylogenetics, and for other endeavors that depend on accurate phylogenetic reconstruction. The bootstrap method is a well-known computational approach to phylogenetic tree assessment, and more generally for assessing the reliability of statistical models. However, it is known to be biased under certain circumstances, calling into question the accuracy of the method. Several advanced bootstrap methods have been developed to achieve higher accuracy, one of which is the double bootstrap approach, but the computational burden of this method has precluded its application to practical problems of phylogenetic tree selection. We address this issue by proposing a simple method called the speedy double bootstrap, which circumvents the second-tier resampling step in the regular double bootstrap approach. We also develop an implementation of the regular double bootstrap for comparison with our speedy method. The speedy double bootstrap suffers no significant loss of accuracy compared with the regular double bootstrap, while performing calculations significantly more rapidly (at minimum around 371 times faster, based on analysis of mammalian mitochondrial amino acid sequences and 12S and 16S rRNA genes). Our method thus enables, for the first time, the practical application of the double bootstrap technique in the context of molecular phylogenetics. The approach can also be used more generally for model selection problems wherever the maximum likelihood criterion is used.
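
    The ordinary (single-tier) bootstrap that the double bootstrap builds on can be sketched in Python as follows. This does not implement the speedy double bootstrap, whose details are not given in the abstract; it only shows column resampling of an alignment and the resulting support proportion. The function names are hypothetical, and `statistic` stands in for whatever tree inference and test the user applies to each replicate.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    The same sampled columns are used for every sequence, so each
    replicate column is a real column of the original alignment.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, statistic, n_reps=100, seed=0):
    """Fraction of replicates for which `statistic(replicate)` is true.

    `statistic` is a user-supplied callable (an assumption of this
    sketch), e.g. "the tree inferred from this replicate contains
    clade X".
    """
    rng = random.Random(seed)
    hits = sum(bool(statistic(bootstrap_alignment(alignment, rng)))
               for _ in range(n_reps))
    return hits / n_reps
```

    A double bootstrap would, in its regular form, repeat this resampling a second time inside every replicate, which is the cost the abstract says the speedy variant avoids.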

  15. Aligned 18S for Zoraptera (Insecta): phylogenetic position and molecular evolution.

    PubMed

    Yoshizawa, Kazunori; Johnson, Kevin P

    2005-11-01

    The order Zoraptera (angel insects) is one of the least known insect groups, containing only 32 extant species. The phylogenetic position of Zoraptera is poorly understood, but it is generally thought to be closely related to either Paraneoptera (hemipteroid orders: booklice, lice, thrips, and bugs), Dictyoptera (blattoid orders: cockroaches, termites, and mantises), or Embioptera (web spinners). We inferred the phylogenetic position of Zoraptera by analyzing nuclear 18S rDNA sequences, which we aligned according to a secondary structure model. Maximum likelihood and Bayesian analyses both supported a close relationship between Zoraptera and Dictyoptera with relatively high posterior probability. The 18S sequences of Zoraptera exhibited several unusual properties: (1) a dramatically increased substitution rate, which resulted in very long branches; (2) long insertions at helix E23; and (3) modifications of secondary structures at helices 12 and 18.

  16. Analyzing phylogenetic trees with timed and probabilistic model checking: the lactose persistence case study.

    PubMed

    Requeno, José Ignacio; Colom, José Manuel

    2014-01-01

    Model checking is a generic verification technique that allows the phylogeneticist to focus on models and specifications instead of on implementation issues. Phylogenetic trees are considered as transition systems over which we interrogate phylogenetic questions written as formulas of temporal logic. Nonetheless, standard logics become insufficient for certain practices of phylogenetic analysis since they do not allow the inclusion of explicit time and probabilities. The aim of this paper is to extend the application of model checking techniques beyond qualitative phylogenetic properties and adapt the existing logical extensions and tools to the field of phylogeny. The introduction of time and probabilities in phylogenetic specifications is motivated by the study of a real example: the analysis of the ratio of lactose intolerance in some populations and the date of appearance of this phenotype. PMID:25339082

  17. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees.

    PubMed

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Chen, Wei-Hua; Hu, Songnian

    2016-07-01

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its 'dataset system' contains not only the data to be visualized on the tree, but also 'modifiers' that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new 'Demo' trees to demonstrate the basic functionalities of Evolview, and five new 'Showcase' trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/. PMID:27131786

  18. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees.

    PubMed

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J; Chen, Wei-Hua; Hu, Songnian

    2016-07-01

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its 'dataset system' contains not only the data to be visualized on the tree, but also 'modifiers' that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new 'Demo' trees to demonstrate the basic functionalities of Evolview, and five new 'Showcase' trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/.

  19. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees

    PubMed Central

    He, Zilong; Zhang, Huangkai; Gao, Shenghan; Lercher, Martin J.; Chen, Wei-Hua; Hu, Songnian

    2016-01-01

    Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its ‘dataset system’ contains not only the data to be visualized on the tree, but also ‘modifiers’ that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new ‘Demo’ trees to demonstrate the basic functionalities of Evolview, and five new ‘Showcase’ trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/. PMID:27131786

  20. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    PubMed Central

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
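
    The autocorrelation argument above can be illustrated for the ordinary scalar case. The sketch below is a minimal estimator of ESS for a single numeric parameter trace, using N / (1 + 2 × the sum of positive-lag autocorrelations), truncated at the first non-positive autocorrelation. It is not the topology method of the record above; the function name and the truncation rule are assumptions chosen for illustration.

```python
def effective_sample_size(samples):
    """Rough ESS of a scalar MCMC trace: N / (1 + 2 * sum of autocorrelations).

    Hypothetical helper for illustration; the sum is truncated at the first
    non-positive autocorrelation, which is one simple convention among several.
    """
    x = [float(v) for v in samples]
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    var = sum(v * v for v in x) / n
    if var == 0.0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        # lag-k autocorrelation: pairs x[i] with x[i + lag]
        rho = sum(a * b for a, b in zip(x, x[lag:])) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

    For a strongly autocorrelated chain the estimate falls well below the number of samples; for a first-order autoregressive chain with coefficient 0.9, for instance, the theoretical ratio of ESS to sample count is (1 − 0.9)/(1 + 0.9), roughly 0.05.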

  1. PhyloPen: Phylogenetic Tree Browsing Using a Pen and Touch Interface

    PubMed Central

    Wehrer, Anthony; Yee, Andrew; Lisle, Curtis; Hughes, Charles

    2015-01-01

    Phylogenetic trees are used by researchers across multiple fields of study to display historical relationships between organisms or genes. Trees are used to examine the speciation process in evolutionary biology, to classify families of viruses in epidemiology, to demonstrate co-speciation in host and pathogen studies, and to explore genetic changes occurring during the disease process in cancer, among other applications. Due to their complexity and the amount of data they present in visual form, phylogenetic trees have generally been difficult to render for publication and challenging to directly interact with in digital form. To address these limitations, we developed PhyloPen, an experimental novel multi-touch and pen application that renders a phylogenetic tree and allows users to interactively navigate within the tree, examining nodes, branches, and auxiliary information, and annotate the tree for note-taking and collaboration. We present a discussion of the interactions implemented in PhyloPen and the results of a formative study that examines how the application was received after use by practicing biologists -- faculty members and graduate students in the discipline. These results are to be later used for a fully supported implementation of the software where the community will be welcomed to participate in its development. PMID:26693078

  2. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.

    PubMed

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2015-11-01

    The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).

  3. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis

    PubMed Central

    2012-01-01

    Background: Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. Findings: Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. Conclusions: The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting…

  4. Nearly complete rRNA genes from 371 Animalia: updated structure-based alignment and detailed phylogenetic analysis.

    PubMed

    Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew J

    2012-09-01

    This study presents a manually constructed alignment of nearly complete rRNA genes from most animal clades (371 taxa from ~33 of the ~36 metazoan phyla), expanded from the 197 sequences in a previous study. This thorough, taxon-rich alignment, available at http://www.wsu.edu/~jmallatt/research/rRNAalignment.html and in the Dryad Repository (doi: http://dx.doi.org/10.5061/dryad.1v62kr3q), is based rigidly on the secondary structure of the SSU and LSU rRNA molecules, and is annotated in detail, including labeling of the erroneous sequences (contaminants). The alignment can be used for future studies of the molecular evolution of rRNA. Here, we use it to explore whether the larger number of sequences produces an improved phylogenetic tree of animal relationships. Disappointingly, the resolution did not improve, either when the standard maximum-likelihood method was used or with more sophisticated methods that partitioned the rRNA into paired and unpaired sites (stem, loop, bulge, junction), or accounted for the evolution of the paired sites. For example, no doublet model of paired-site substitutions (16-state, 16A and 16B, 7A-F, or 6A-C models) corrected the placement of any rogue taxa or increased resolution. The following findings are from the simplest, standard, ML analysis. The 371-taxon tree only imperfectly supported the bilaterian clades of Lophotrochozoa and Ecdysozoa, and this problem remained after 17 taxa with unstably positioned sequences were omitted from the analysis. The problem seems to stem from base-compositional heterogeneity across taxa and from an overrepresentation of highly divergent sequences among the newly added taxa (e.g., sequences from Cephalopoda, Rotifera, Acoela, and Myxozoa). The rogue taxa continue to concentrate in two locations in the rRNA tree: near the base of Arthropoda and of Bilateria. The approximately unbiased (AU) test refuted the monophyly of Mollusca and of Chordata, probably due to long-branch attraction of the highly…

  5. Climate Change Impacts on the Tree of Life: Changes in Phylogenetic Diversity Illustrated for Acropora Corals

    PubMed Central

    Faith, Daniel P.; Richards, Zoe T.

    2012-01-01

    The possible loss of whole branches from the tree of life is a dramatic, but under-studied, biological implication of climate change. The tree of life represents an evolutionary heritage providing both present and future benefits to humanity, often in unanticipated ways. Losses in this evolutionary (evo) life-support system represent losses in “evosystem” services, and are quantified using the phylogenetic diversity (PD) measure. High species-level biodiversity losses may or may not correspond to high PD losses. If climate change impacts are clumped on the phylogeny, then loss of deeper phylogenetic branches can mean disproportionately large PD loss for a given degree of species loss. Over time, successive species extinctions within a clade each may imply only a moderate loss of PD, until the last species within that clade goes extinct, and PD drops precipitously. Emerging methods of “phylogenetic risk analysis” address such phylogenetic tipping points by adjusting conservation priorities to better reflect risk of such worst-case losses. We have further developed and explored this approach for one of the most threatened taxonomic groups, corals. Based on a phylogenetic tree for the coral genus Acropora, we identify cases where worst-case PD losses may be avoided by designing risk-averse conservation priorities. We also propose spatial heterogeneity measures to assess possible changes in the geographic distribution of coral PD. PMID:24832524

  6. Building a Phylogenetic Tree of the Human and Ape Superfamily Using DNA-DNA Hybridization Data

    ERIC Educational Resources Information Center

    Maier, Caroline Alexandra

    2004-01-01

    The study describes the process of DNA-DNA hybridization and the history of its use by Sibley and Ahlquist in simple, straightforward, and interesting language that students easily understand, so that they can create their own phylogenetic tree of the hominoid superfamily. They calibrate the DNA clock and use it to estimate the divergence dates of the various…

  7. Model checking software for phylogenetic trees using distribution and database methods.

    PubMed

    Requeno, José Ignacio; Colom, José Manuel

    2013-01-01

    Model checking, a generic and formal paradigm stemming from computer science based on temporal logics, has been proposed for the study of biological properties that emerge from the labeling of the states defined over the phylogenetic tree. This strategy allows us to use generic software tools already present in the industry. However, the performance of traditional model checking is penalized when scaling the system for large phylogenies. To address this, two strategies are presented here. The first one consists of partitioning the phylogenetic tree into a set of subgraphs, each one representing a subproblem to be verified, so as to speed up the computation time and distribute the memory consumption. The second strategy is based on uncoupling the information associated with each state of the phylogenetic tree (mainly, the DNA sequence) and exporting it to an external tool for the management of large information systems. The integration of all these approaches outperforms the results of monolithic model checking and helps us to execute the verification of properties in a real phylogenetic tree. PMID:24231143

  8. Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry.

    ERIC Educational Resources Information Center

    Maier, Caroline Alexandra

    2001-01-01

    Presents an activity in which students seek answers to questions about evolutionary relationships by using genetic databases and bioinformatics software. Students build genetic distance matrices and phylogenetic trees based on molecular sequence data using web-based resources. Provides a flowchart of steps involved in accessing, retrieving, and…

  9. Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics.

    PubMed

    Yang, Ziheng

    2007-08-01

    The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.

  10. Phylogenetic isolation of host trees affects assembly of local Heteroptera communities.

    PubMed

    Vialatte, A; Bailey, R I; Vasseur, C; Matocq, A; Gossner, M M; Everhart, D; Vitrac, X; Belhadj, A; Ernoult, A; Prinzing, A

    2010-07-22

    A host may be physically isolated in space and then may correspond to a geographical island, but it may also be separated from its local neighbours by hundreds of millions of years of evolutionary history, and may form in this case an evolutionarily distinct island. We test how this affects the assembly processes of the host's colonizers, a question that has until now been raised only at the scale of physically distinct islands or patches. We studied the assembly of true bugs in crowns of oaks surrounded by phylogenetically more or less closely related trees. Despite the short distances (less than 150 m) between phylogenetically isolated and non-isolated trees, we found major differences between their Heteroptera faunas. We show that phylogenetically isolated trees support smaller numbers and fewer species of Heteroptera, an increasing proportion of phytophages and a decreasing proportion of omnivores, and proportionally more non-host-specialists. These differences were not due to changes in the nutritional quality of the trees, i.e. species sorting, which we accounted for. Comparison with predictions from meta-community theories suggests that the assembly of local Heteroptera communities may be strongly driven by independent metapopulation processes at the level of the individual species. We conclude that the assembly of communities on hosts separated from their neighbours by long periods of evolutionary history is qualitatively and quantitatively different from that on hosts surrounded by closely related trees. Potentially, the biotic selection pressure on a host might thus change with the evolutionary proximity of the surrounding hosts.

  11. Reversible polymorphism-aware phylogenetic models and their application to tree inference.

    PubMed

    Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin

    2016-10-21

    We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby, naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great apes data show that the runtimes of our approach and standard substitution models are comparable but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. PMID:27480613

  12. Equality of Shapley value and fair proportion index in phylogenetic trees.

    PubMed

    Fuchs, Michael; Jin, Emma Yu

    2015-11-01

    The Shapley value and the fair proportion index of phylogenetic trees have been introduced recently for the purpose of making conservation decisions in genetics. Moreover, also very recently, Hartmann (J Math Biol 67:1163-1170, 2013) has presented data which shows that there is a strong correlation between a slightly modified version of the Shapley value (which we call the modified Shapley value) and the fair proportion index. He gave an explanation of this correlation by showing that the contribution of both indices to an edge of the tree becomes identical as the number of taxa tends to infinity. In this note, we show that the Shapley value and the fair proportion index are in fact the same. Moreover, we also consider the modified Shapley value and show that its covariance with the fair proportion index in random phylogenetic trees under the Yule-Harding model and uniform model is indeed close to one.
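
    For readers unfamiliar with the fair proportion index, a minimal sketch follows. It assumes the usual definition, which the abstract above does not spell out: each leaf receives, for every edge on its path to the root, that edge's length divided by the number of leaves descending from the edge. The tree encoding (two dictionaries, `parent` and `length`) and the function name are hypothetical choices for illustration.

```python
def fair_proportion(parent, length):
    """Fair proportion index of every leaf of a rooted tree.

    parent: dict mapping each node to its parent (the root maps to None).
    length: dict mapping each non-root node to the length of the edge above it.
    Both structures are illustrative assumptions, not a standard format.
    """
    internal = set(parent.values()) - {None}
    leaves = [v for v in parent if v not in internal]

    # Count the leaves descending from each node (equivalently, from the edge above it).
    n_below = {v: 0 for v in parent}
    for leaf in leaves:
        v = leaf
        while v is not None:
            n_below[v] += 1
            v = parent[v]

    # Each edge's length is shared equally among the leaves below it.
    fp = {}
    for leaf in leaves:
        total, v = 0.0, leaf
        while parent[v] is not None:
            total += length[v] / n_below[v]
            v = parent[v]
        fp[leaf] = total
    return fp


# Example tree ((A:1,B:1)X:2,C:3)R
parent = {"A": "X", "B": "X", "X": "R", "C": "R", "R": None}
length = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 3.0}
fp = fair_proportion(parent, length)
```

    By construction the index values sum to the total branch length of the tree, since every edge is divided exactly among the leaves below it.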

  13. Influence of tree shape and evolutionary time-scale on phylogenetic diversity metrics

    PubMed Central

    Mazel, F.; Davies, T.J; Gallien, L.; Renaud, J.; Groussin, M.; Münkemüller, T.; Thuiller, W.

    2016-01-01

    During the last decades, describing, analysing and understanding the phylogenetic structure of species assemblages has been a central theme in both community ecology and macro-ecology. Among the wide variety of phylogenetic structure metrics, three have been predominant in the literature: Faith’s phylogenetic diversity (PDFaith), which represents the sum of the branch lengths of the phylogenetic tree linking all species of a particular assemblage, the mean pairwise distance between all species in an assemblage (MPD) and the pairwise distance between the closest relatives in an assemblage (MNTD). Comparisons between studies using one or several of these metrics are difficult because there has been no comprehensive evaluation of the phylogenetic properties each metric captures. In particular it is unknown how PDFaith relates to MPD and MNTD. Consequently, it is possible that apparently opposing patterns in different studies might simply reflect differences in metric properties. Here, we aim to fill this gap by comparing these metrics using simulations and empirical data. We first used simulation experiments to test the influence of community structure and size on the mismatch between metrics whilst varying the shape and size of the phylogenetic tree of the species pool. Second we investigated the mismatch between metrics for two empirical datasets (gut microbes and global carnivoran assemblages). We show that MNTD and PDFaith provide similar information on phylogenetic structure, and respond similarly to variation in species richness and assemblage structure. However, MPD demonstrates a very different behaviour, and is highly sensitive to deep branching structure. We suggest that by combining complementary metrics that are sensitive to processes operating at different phylogenetic depths (i.e. MPD and MNTD or PDFaith) we can obtain a better understanding of assemblage structure. PMID:27713599
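
    The three metrics defined above can be computed on a toy tree as follows. This is a sketch under stated assumptions: the tree encoding (`parent` and `length` dictionaries) and all function names are hypothetical, PD is taken as the total length of the minimal subtree connecting the assemblage (without the path back to the root, a convention that varies between tools), and MNTD is averaged over species.

```python
from itertools import combinations


def _path_to_root(parent, length, v):
    """Map each ancestor of v (including v) to its distance from v."""
    dist, out = 0.0, {}
    while v is not None:
        out[v] = dist
        if parent[v] is not None:
            dist += length[v]
        v = parent[v]
    return out


def patristic(parent, length, a, b):
    """Path length between leaves a and b (branch lengths assumed non-negative)."""
    up_a = _path_to_root(parent, length, a)
    up_b = _path_to_root(parent, length, b)
    # Among shared ancestors, the lowest common one minimises the summed distance.
    return min(up_a[v] + up_b[v] for v in up_a if v in up_b)


def diversity_metrics(parent, length, assemblage):
    """Return (PD, MPD, MNTD) for an assemblage of at least two leaves."""
    species = list(assemblage)
    k = len(species)
    below = {v: 0 for v in parent}
    for s in species:
        v = s
        while v is not None:
            below[v] += 1
            v = parent[v]
    # An edge belongs to the connecting subtree if members lie on both sides of it.
    pd = sum(length[v] for v in length if 0 < below[v] < k)
    d = {frozenset(p): patristic(parent, length, *p) for p in combinations(species, 2)}
    mpd = sum(d.values()) / len(d)
    mntd = sum(min(d[frozenset((s, t))] for t in species if t != s) for s in species) / k
    return pd, mpd, mntd


# Example tree (((A:1,B:1)X:1,C:2)Y:1,D:3)R with assemblage {A, B, D}
parent = {"A": "X", "B": "X", "X": "Y", "C": "Y", "Y": "R", "D": "R", "R": None}
length = {"A": 1.0, "B": 1.0, "X": 1.0, "C": 2.0, "Y": 1.0, "D": 3.0}
pd, mpd, mntd = diversity_metrics(parent, length, ["A", "B", "D"])
```

    In this example the single deep split between D and the pair A, B dominates MPD, whereas MNTD is pulled down by the close relatives A and B, which is in line with the sensitivity of MPD to deep branching structure reported above.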

  14. Genetic Distances and Reconstruction of Phylogenetic Trees from Microsatellite DNA

    PubMed Central

    Takezaki, N.; Nei, M.

    1996-01-01

    Recently many investigators have used microsatellite DNA loci for studying the evolutionary relationships of closely related populations or species, and some authors proposed new genetic distance measures for this purpose. However, the efficiencies of these distance measures in obtaining the correct tree topology remain unclear. We therefore investigated the probability of obtaining the correct topology (P_C) for these new distances as well as traditional distance measures by using computer simulation. We used both the infinite-allele model (IAM) and the stepwise mutation model (SMM), which seem to be appropriate for classical markers and microsatellite loci, respectively. The results show that in both the IAM and SMM, Cavalli-Sforza and Edwards' chord distance (D_C) and Nei et al.'s D_A distance generally show higher P_C values than other distance measures, whether the bottleneck effect exists or not. For estimating evolutionary times, however, Nei's standard distance and Goldstein et al.'s (δμ)² are more appropriate than other distances. Microsatellite DNA seems to be very useful for clarifying the evolutionary relationships of closely related populations. PMID:8878702
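
    Three of the distance measures named above can be sketched from allele frequencies as follows. The formulas are the commonly cited forms and are stated here as assumptions, since the abstract does not give them: Nei's standard distance D = −ln(J_XY / √(J_X J_Y)), the D_A distance 1 − (1/r) Σ_j Σ_i √(x_ij y_ij), and the chord distance (2/(πr)) Σ_j √(2(1 − Σ_i √(x_ij y_ij))), where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations. Function names and the input layout are hypothetical.

```python
import math


def nei_standard(x, y):
    """Nei's standard distance; x and y are lists of per-locus allele-frequency lists."""
    r = len(x)
    jx = sum(sum(p * p for p in loc) for loc in x) / r
    jy = sum(sum(q * q for q in loc) for loc in y) / r
    jxy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)) / r
    return -math.log(jxy / math.sqrt(jx * jy))


def nei_da(x, y):
    """D_A distance: one minus the mean per-locus sum of sqrt(x_ij * y_ij)."""
    r = len(x)
    shared = sum(sum(math.sqrt(p * q) for p, q in zip(lx, ly)) for lx, ly in zip(x, y))
    return 1.0 - shared / r


def chord_dc(x, y):
    """Chord distance averaged over loci."""
    r = len(x)
    total = 0.0
    for lx, ly in zip(x, y):
        cos_theta = sum(math.sqrt(p * q) for p, q in zip(lx, ly))
        # max() guards against tiny negative values caused by rounding
        total += math.sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return 2.0 * total / (math.pi * r)


# Two loci, two alleles each; frequencies at each locus sum to 1.
pop1 = [[0.5, 0.5], [0.2, 0.8]]
pop2 = [[0.6, 0.4], [0.1, 0.9]]
distances = (nei_standard(pop1, pop2), nei_da(pop1, pop2), chord_dc(pop1, pop2))
```

    All three measures are zero for populations with identical allele frequencies. Nei's standard distance is undefined when the populations share no alleles, because the logarithm's argument is then zero.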

  15. Molecular Phylogenetics and Systematics of the Bivalve Family Ostreidae Based on rRNA Sequence-Structure Models and Multilocus Species Tree

    PubMed Central

    Salvi, Daniele; Macali, Armando; Mariottini, Paolo

    2014-01-01

    The bivalve family Ostreidae has a worldwide distribution and includes species of high economic importance. Phylogenetics and systematics of oysters based on morphology have proved difficult because of their high phenotypic plasticity. In this study we explore the phylogenetic information of the DNA sequence and secondary structure of the nuclear, fast-evolving, ITS2 rRNA and the mitochondrial 16S rRNA genes from the Ostreidae and we implement a multi-locus framework based on four loci for oyster phylogenetics and systematics. Sequence-structure rRNA models aided sequence alignment and improved accuracy and nodal support of phylogenetic trees. In agreement with previous molecular studies, our phylogenetic results indicate that none of the currently recognized subfamilies, Crassostreinae, Ostreinae, and Lophinae, is monophyletic. Single gene trees based on Maximum likelihood (ML) and Bayesian (BA) methods and on sequence-structure ML were congruent with multilocus trees based on concatenated (ML and BA) and coalescent-based (BA) approaches and consistently supported three main clades: (i) Crassostrea, (ii) Saccostrea, and (iii) an Ostreinae-Lophinae lineage. Therefore, the subfamilies Crassostreinae (including Crassostrea), Saccostreinae subfam. nov. (including Saccostrea and tentatively Striostrea) and Ostreinae (including Ostreinae and Lophinae taxa) are recognized. Based on phylogenetic and biogeographical evidence the Asian species of Crassostrea from the Pacific Ocean are assigned to Magallana gen. nov., whereas an integrative taxonomic revision is required for the genera Ostrea and Dendostrea. This study pointed out the suitability of the ITS2 marker for DNA barcoding of oysters and the relevance of using sequence-structure rRNA models and features of the ITS2 folding in molecular phylogenetics and taxonomy. The multilocus approach allowed inferring a robust phylogeny of Ostreidae providing a broad molecular perspective on their systematics. PMID:25250663

  16. An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees

    PubMed Central

    2013-01-01

    A phylogenetic network is a model for reticulate evolution. A hybridization network is one type of phylogenetic network for a set of discordant gene trees and “displays” each gene tree. A central computational problem on hybridization networks is: given a set of gene trees, reconstruct the minimum (i.e., most parsimonious) hybridization network that displays each given gene tree. This problem is known to be NP-hard, and existing approaches for this problem either are heuristics or make simplifying assumptions (e.g., work with only two input trees or assume some topological properties). In this article, we develop an exact algorithm (called PIRNC) for inferring the minimum hybridization networks from multiple gene trees. The PIRNC algorithm does not rely on structural assumptions (e.g., the so-called galled networks). To the best of our knowledge, PIRNC is the first exact algorithm implemented for this formulation. When the number of reticulation events is relatively small (say, four or fewer), PIRNC runs reasonably efficiently even for moderately large datasets. For building more complex networks, we also develop a heuristic version of PIRNC called PIRNCH. Simulation shows that PIRNCH usually produces networks with fewer reticulation events than those produced by an existing method. PIRNC and PIRNCH have been implemented as part of the software package called PIRN and are available online. PMID:24093230

  17. A phylogenetic perspective on the individual species-area relationship in temperate and tropical tree communities.

    PubMed

    Yang, Jie; Swenson, Nathan G; Cao, Min; Chuyong, George B; Ewango, Corneille E N; Howe, Robert; Kenfack, David; Thomas, Duncan; Wolf, Amy; Lin, Luxiang

    2013-01-01

    Ecologists have historically used species-area relationships (SARs) as a tool to understand the spatial distribution of species. Recent work has extended SARs to focus on individual-level distributions to generate individual species area relationships (ISARs). The ISAR approach quantifies whether individuals of a species tend to have more or less species richness surrounding them than expected by chance. By identifying richness 'accumulators' and 'repellers', respectively, the ISAR approach has been used to infer the relative importance of abiotic and biotic interactions and neutrality. A clear limitation of the SAR and ISAR approaches is that all species are treated as evolutionarily independent, even though a large amount of work has now shown that local tree neighborhoods exhibit non-random phylogenetic structure given the species richness. Here, we use nine tropical and temperate forest dynamics plots to ask: (i) do ISARs change predictably across latitude?; (ii) is the phylogenetic diversity in the neighborhood of species accumulators and repellers higher or lower than that expected given the observed species richness?; and (iii) are species accumulators and repellers distributed non-randomly on the community phylogenetic tree? The results indicate no clear trend in ISARs from the temperate zone to the tropics and that the phylogenetic diversity surrounding the individuals of species is generally only non-random on very local scales. Interestingly, the distribution of species accumulators and repellers was non-random on the community phylogenies, suggesting the presence of phylogenetic signal in the ISAR across latitude.

  18. Characterizing the Phylogenetic Tree Community Structure of a Protected Tropical Rain Forest Area in Cameroon

    PubMed Central

    Munoz, François; Couteron, Pierre; Hardy, Olivier J.; Sonké, Bonaventure

    2014-01-01

    Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing species tree assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure combined with an isolation-by-distance pattern of taxonomic diversity suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess any lack of sensitivity to the variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world. PMID:24936786

  19. Characterizing the phylogenetic tree community structure of a protected tropical rain forest area in Cameroon.

    PubMed

    Manel, Stéphanie; Couvreur, Thomas L P; Munoz, François; Couteron, Pierre; Hardy, Olivier J; Sonké, Bonaventure

    2014-01-01

    Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing species tree assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure combined with an isolation-by-distance pattern of taxonomic diversity suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess any lack of sensitivity to the variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world.

  20. Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

    PubMed

    Mirzaei, Sajad; Wu, Yufeng

    2016-01-01

    Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets. PMID:27295640

  1. Local-scale Partitioning of Functional and Phylogenetic Beta Diversity in a Tropical Tree Assemblage.

    PubMed

    Yang, Jie; Swenson, Nathan G; Zhang, Guocheng; Ci, Xiuqin; Cao, Min; Sha, Liqing; Li, Jie; Ferry Slik, J W; Lin, Luxiang

    2015-01-01

    The relative degree to which stochastic and deterministic processes underpin community assembly is a central problem in ecology. Quantifying local-scale phylogenetic and functional beta diversity may shed new light on this problem. We used species distribution, soil, trait and phylogenetic data to quantify whether environmental distance, geographic distance or their combination are the strongest predictors of phylogenetic and functional beta diversity on local scales in a 20-ha tropical seasonal rainforest dynamics plot in southwest China. The patterns of phylogenetic and functional beta diversity were generally consistent. The phylogenetic and functional dissimilarity between subplots (10 × 10 m, 20 × 20 m, 50 × 50 m and 100 × 100 m) was often higher than that expected by chance. The turnover of lineages and species function within habitats was generally slower than that across habitats. Partitioning the variation in phylogenetic and functional beta diversity showed that environmental distance was generally a better predictor of beta diversity than geographic distance thereby lending relatively more support for deterministic environmental filtering over stochastic processes. Overall, our results highlight that deterministic processes play a stronger role than stochastic processes in structuring community composition in this diverse assemblage of tropical trees. PMID:26235237

  2. Local-scale Partitioning of Functional and Phylogenetic Beta Diversity in a Tropical Tree Assemblage

    PubMed Central

    Yang, Jie; Swenson, Nathan G.; Zhang, Guocheng; Ci, Xiuqin; Cao, Min; Sha, Liqing; Li, Jie; Ferry Slik, J. W.; Lin, Luxiang

    2015-01-01

    The relative degree to which stochastic and deterministic processes underpin community assembly is a central problem in ecology. Quantifying local-scale phylogenetic and functional beta diversity may shed new light on this problem. We used species distribution, soil, trait and phylogenetic data to quantify whether environmental distance, geographic distance or their combination are the strongest predictors of phylogenetic and functional beta diversity on local scales in a 20-ha tropical seasonal rainforest dynamics plot in southwest China. The patterns of phylogenetic and functional beta diversity were generally consistent. The phylogenetic and functional dissimilarity between subplots (10 × 10 m, 20 × 20 m, 50 × 50 m and 100 × 100 m) was often higher than that expected by chance. The turnover of lineages and species function within habitats was generally slower than that across habitats. Partitioning the variation in phylogenetic and functional beta diversity showed that environmental distance was generally a better predictor of beta diversity than geographic distance thereby lending relatively more support for deterministic environmental filtering over stochastic processes. Overall, our results highlight that deterministic processes play a stronger role than stochastic processes in structuring community composition in this diverse assemblage of tropical trees. PMID:26235237

  3. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

    PubMed Central

    Kuraku, Shigehiro; Zmasek, Christian M.; Nishimura, Osamu; Katoh, Kazutaka

    2013-01-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables users to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  4. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

    PubMed

    Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

    2013-07-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables users to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  6. Novel information theory-based measures for quantifying incongruence among phylogenetic trees.

    PubMed

    Salichos, Leonidas; Stamatakis, Alexandros; Rokas, Antonis

    2014-05-01

    Phylogenies inferred from different data matrices often conflict with each other, necessitating the development of measures that quantify this incongruence. Here, we introduce novel measures that use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. The first measure, internode certainty (IC), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode (internal branch) in a given set of trees jointly with that of the most prevalent conflicting bipartition in the same tree set. The second measure, IC All (ICA), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode in a given set of trees in conjunction with that of all conflicting bipartitions in the same underlying tree set. Finally, the tree certainty (TC) and TC All (TCA) measures are the sum of IC and ICA values across all internodes of a phylogeny, respectively. IC, ICA, TC, and TCA can be calculated from different types of data that contain nontrivial bipartitions, ranging from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC and ICA values of a given internode reflect its specific degree of incongruence, and the TC and TCA values describe the global degree of incongruence between trees in the set. All four measures are implemented and freely available in version 8.0.0 and subsequent versions of the widely used program RAxML.
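The IC and ICA measures described above can be illustrated with a minimal Python sketch. The abstract does not give the formula, so this assumes the commonly cited normalized-entropy form (one minus the entropy of the bipartition frequencies, with the logarithm taken to the base of the number of bipartitions considered). The function names are hypothetical, and any sign convention a particular program applies when a conflicting bipartition is the more frequent one is not reproduced here.

```python
import math

def internode_certainty(freqs):
    """Normalized-entropy certainty for one internode (assumed formula).

    freqs[0] is the frequency (count or proportion) of the bipartition
    defined by the internode; the remaining entries are frequencies of
    conflicting bipartitions. Pass one conflict for an IC-style value,
    or all conflicts for an ICA-style value.
    """
    n = len(freqs)
    total = sum(freqs)
    if n < 2 or total <= 0:
        raise ValueError("need the focal bipartition plus at least one conflict")
    probs = [f / total for f in freqs]
    # Entropy with log base n, so the value is scaled to the range [0, 1].
    entropy = -sum(p * math.log(p, n) for p in probs if p > 0)
    return 1.0 - entropy

def tree_certainty(per_internode_freqs):
    """Sum of the per-internode values across all internodes of a tree."""
    return sum(internode_certainty(f) for f in per_internode_freqs)
```

For example, `internode_certainty([90, 10])` is close to 0.53, while two equally frequent bipartitions, `internode_certainty([50, 50])`, give a value of zero, meaning the tree set offers no certainty about that internode.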

  8. Phylogenetic Stability, Tree Shape, and Character Compatibility: A Case Study Using Early Tetrapods.

    PubMed

    Bernardi, Massimo; Angielczyk, Kenneth D; Mitchell, Jonathan S; Ruta, Marcello

    2016-09-01

    Phylogenetic tree shape varies as the evolutionary processes affecting a clade change over time. In this study, we examined an empirical phylogeny of fossil tetrapods during several time intervals, and studied how temporal constraints manifested in patterns of tree imbalance and character change. The results indicate that the impact of temporal constraints on tree shape is minimal and highlights the stability through time of the reference tetrapod phylogeny. Unexpected values of imbalance for Mississippian and Pennsylvanian time slices strongly support the hypothesis that the Carboniferous was a period of explosive tetrapod radiation. Several significant diversification shifts take place in the Mississippian and underpin increased terrestrialization among the earliest limbed vertebrates. Character incompatibility is relatively high at the beginning of tetrapod history, but quickly decreases to a relatively stable lower level, relative to a null distribution based on constant rates of character change. This implies that basal tetrapods had high, but declining, rates of homoplasy early in their evolutionary history, although the origin of Lissamphibia is an exception to this trend. The time slice approach is a powerful method of phylogenetic analysis and a useful tool for assessing the impact of combining extinct and extant taxa in phylogenetic analyses of large and speciose clades. PMID:27288479

  9. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees.

    PubMed

    Rabosky, Daniel L

    2014-01-01

    A number of methods have been developed to infer differential rates of species diversification through time and among clades using time-calibrated phylogenetic trees. However, we lack a general framework that can delineate and quantify heterogeneous mixtures of dynamic processes within single phylogenies. I developed a method that can identify arbitrary numbers of time-varying diversification processes on phylogenies without specifying their locations in advance. The method uses reversible-jump Markov Chain Monte Carlo to move between model subspaces that vary in the number of distinct diversification regimes. The model assumes that changes in evolutionary regimes occur across the branches of phylogenetic trees under a compound Poisson process and explicitly accounts for rate variation through time and among lineages. Using simulated datasets, I demonstrate that the method can be used to quantify complex mixtures of time-dependent, diversity-dependent, and constant-rate diversification processes. I compared the performance of the method to the MEDUSA model of rate variation among lineages. As an empirical example, I analyzed the history of speciation and extinction during the radiation of modern whales. The method described here will greatly facilitate the exploration of macroevolutionary dynamics across large phylogenetic trees, which may have been shaped by heterogeneous mixtures of distinct evolutionary processes. PMID:24586858

  10. The algebra of the general Markov model on phylogenetic trees and networks.

    PubMed

    Sumner, J G; Holland, B R; Jarvis, P D

    2012-04-01

    It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.

  11. Assessing confidence in phylogenetic trees : bootstrap versus Markov chain Monte Carlo

    SciTech Connect

    Burr, Tom; Doak, J. E.; Gattiker, J. R.; Stanbro, W. D.

    2002-01-01

    Recent implementations of Bayesian approaches are among the largest advances in phylogenetic tree estimation in the last 10 years. Markov chain Monte Carlo (MCMC) is used in these new approaches to estimate the Bayesian posterior probability for each tree topology of interest. Our goal is to assess the confidence in the estimated tree (particularly in whether prespecified groups are monophyletic) using MCMC and to compare the Bayesian estimate of confidence to a bootstrap-based estimate of confidence. We compare the Bayesian posterior probability to the bootstrap probability for specified groups in two real sets of influenza sequences and two sets of simulated sequences. We conclude that the bootstrap estimate is adequate compared to the MCMC estimate except perhaps if the number of DNA sites is small.
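The bootstrap side of this comparison can be sketched in a few lines of Python. This is only an illustration of the general nonparametric bootstrap (resampling alignment columns with replacement), not the authors' code. The callback `is_group_supported` is a hypothetical placeholder for whatever tree-inference and monophyly check is applied to each replicate.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so results are reproducible.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, is_group_supported, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which the group is recovered.

    is_group_supported is a hypothetical user-supplied function that infers
    a tree from a replicate alignment and returns True if the prespecified
    group is monophyletic in that tree.
    """
    rng = random.Random(seed)
    hits = sum(bool(is_group_supported(bootstrap_alignment(alignment, rng)))
               for _ in range(replicates))
    return hits / replicates
```

With few DNA sites, each replicate reuses a small pool of columns, which is one intuition for why the abstract flags small alignments as the case where bootstrap and MCMC estimates may diverge.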

  12. Genetic distances and phylogenetic trees of different Awassi sheep populations based on DNA sequencing.

    PubMed

    Al-Atiyat, R M; Aljumaah, R S

    2014-01-01

    This study aimed to estimate evolutionary distances and to reconstruct phylogenetic trees between different Awassi sheep populations. Thirty-two individual sheep from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best general time reversible distance model. Three methods of distance estimation were then used. The maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was shown between the South sheep and North sheep. On the other hand, the KSA sheep as an outgroup showed a shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results support geographical information and ecological types of the sheep populations studied. In summary, the resulting phylogenetic trees may add to the limited information about the genetic relatedness and phylogeny of Awassi sheep in nearby Arab countries.

  13. Use of Alignment-Free Phylogenetics for Rapid Genome Sequence-Based Typing of Helicobacter pylori Virulence Markers and Antibiotic Susceptibility

    PubMed Central

    Kusters, Johannes G.

    2015-01-01

    Whole-genome sequencing is becoming a leading technology in the typing and epidemiology of microbial pathogens, but the increase in genomic information necessitates significant investment in bioinformatic resources and expertise, and currently used methodologies struggle with genetically heterogeneous bacteria such as the human gastric pathogen Helicobacter pylori. Here we demonstrate that the alignment-free analysis method feature frequency profiling (FFP) can be used to rapidly construct phylogenetic trees of draft bacterial genome sequences on a standard desktop computer and that coupling with in silico genotyping methods gives useful information for comparative and clinical genomic and molecular epidemiology applications. FFP-based phylogenetic trees of seven gastric Helicobacter species matched those obtained by analysis of 16S rRNA genes and ribosomal proteins, and FFP- and core genome single nucleotide polymorphism-based analysis of 63 H. pylori genomes again showed comparable phylogenetic clustering, consistent with genomotypes assigned by using multilocus sequence typing (MLST). Analysis of 377 H. pylori genomes highlighted the conservation of genomotypes and linkage with phylogeographic characteristics and predicted the presence of an incomplete or nonfunctional cag pathogenicity island in 18/276 genomes. In silico analysis of antibiotic susceptibility markers suggests that most H. pylori hspAmerind and hspEAsia isolates are predicted to carry the T2812C mutation potentially conferring low-level clarithromycin resistance, while levels of metronidazole resistance were similar in all multilocus sequence types. In conclusion, the use of FFP phylogenetic clustering and in silico genotyping allows determination of genome evolution and phylogeographic clustering and can contribute to clinical microbiology by genomotyping for outbreak management and the prediction of pathogenic potential and antibiotic susceptibility. PMID:26135867
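The idea behind alignment-free feature frequency profiling can be sketched as follows: count short words in each sequence, normalize the counts to frequencies, and compare the profiles with a divergence measure. This is a minimal illustration and not the FFP software itself. The word length `k`, the use of Jensen-Shannon divergence, and the function names are assumptions made for this example.

```python
from collections import Counter
import math

def kmer_profile(seq, k):
    """Relative frequencies of all overlapping words of length k in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than the word length")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    divergence = 0.0
    for word in set(p) | set(q):
        pi = p.get(word, 0.0)
        qi = q.get(word, 0.0)
        mi = 0.5 * (pi + qi)  # frequency in the mixture of both profiles
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence
```

A pairwise matrix of such divergences could then be passed to any distance-based tree method. Because no alignment is computed, the cost grows with sequence length rather than with alignment complexity, which is consistent with the abstract's point about running on a standard desktop computer.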

  14. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...

  15. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees.

    PubMed

    Letunic, Ivica; Bork, Peer

    2016-07-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. The current version was completely redesigned and rewritten, utilizing current web technologies for speedy and streamlined processing. Numerous new features were introduced and several new data types are now supported. Trees with up to 100,000 leaves can now be efficiently displayed. Full interactive control over precise positioning of various annotation features and an unlimited number of datasets allow the easy creation of complex tree visualizations. iTOL 3 is the first tool that supports direct visualization of the recently proposed phylogenetic placements format. Finally, iTOL's account system has been redesigned to simplify the management of trees in user-defined workspaces and projects, as it is heavily used and already handles more than 500,000 trees from more than 10,000 individual users.

  16. [Research on constructing phylogenetics trees of ruminants basing on the database of milk protein gene sequences].

    PubMed

    Fan, B L; Li, N; Wu, C X

    2000-01-01

    Primers designed from the sequences of four milk protein genes of the cow Bos taurus (alpha-lactalbumin, beta-lactoglobulin, beta-casein and kappa-casein) were used to amplify the full-length alpha-lactalbumin gene in yak Bos grunniens (2999 bp) and water buffalo Bubalus arnee bubalis (278 bp), a partial sequence of this gene in red deer Cervus elaphus xanthopygus (1582 bp), the 5' and 3' flanking regions of the beta-lactoglobulin gene (2167 bp and 1096 bp in length, respectively), the 5' flanking region and exon VIII to exon IX of the beta-casein gene (987 bp and 1096 bp in length, respectively), and exon IV of the kappa-casein gene (780 bp). All the amplified DNA fragments were cloned and their nucleotide sequences were determined. A phylogenetic tree containing 20 species (or subspecies) of the suborder Ruminantia was constructed from the partial sequence of kappa-casein gene exon IV (363 bp in length), and it shows good monophyly of the Bovidae. Trees constructed from the other milk protein genes indicate that all the milk protein genes are well suited to building phylogenetic trees, at least among species belonging to different subfamilies.

  17. Algorithms for efficient near-perfect phylogenetic tree reconstruction in theory and practice.

    PubMed

    Sridhar, Srinath; Dhamdhere, Kedar; Blelloch, Guy; Halperin, Eran; Ravi, R; Schwartz, Russell

    2007-01-01

    We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
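
    The "four gametes" condition mentioned above can be checked directly. The sketch below is a minimal illustration, not the authors' algorithms: for binary characters, two columns conflict when all four state combinations (00, 01, 10, 11) occur among the taxa, and a matrix with no conflicting pair admits a perfect phylogeny. The function names and the row-per-taxon input layout are assumptions.

```python
from itertools import combinations


def has_four_gametes(col_a, col_b) -> bool:
    """True if the two binary characters show all of 00, 01, 10 and 11."""
    return len(set(zip(col_a, col_b))) == 4


def conflicting_pairs(matrix):
    """Return index pairs of characters that fail the four-gamete test.

    matrix: list of rows (one per taxon), each a list of 0/1 states.
    """
    columns = list(zip(*matrix))
    return [(i, j)
            for i, j in combinations(range(len(columns)), 2)
            if has_four_gametes(columns[i], columns[j])]


def admits_perfect_phylogeny(matrix) -> bool:
    """Binary characters fit a perfect phylogeny iff no pair conflicts."""
    return not conflicting_pairs(matrix)
```

    Characters that appear in at least one conflicting pair are the ones a near-perfect method has to explain with additional mutations.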

  18. Hypothesis tests for phylogenetic quartets, with applications to coalescent-based species tree inference.

    PubMed

    Gaither, Jeff; Kubatko, Laura

    2016-11-01

    Numerous statistical methods have been developed to estimate evolutionary relationships among a collection of present-day species, typically represented by a phylogenetic tree, using the information contained in the DNA sequences sampled from representatives of each species. In the current era of high-throughput genome sequencing, the models underlying such methods have become increasingly sophisticated, and the resulting computations are often prohibitive. Here we consider the problem of rigorously testing the phylogenetic relationships among collections of four species under the multispecies coalescent model that accommodates both multi-locus datasets and SNP data. Our test employs a new statistic - the summed absolute differences between certain columns in flattened phylogenetic matrices - as well as a previously used statistic that measures the distance of a flattened matrix from the space of rank-10 matrices. We derive distributional results for both statistics and study the performance of the corresponding hypothesis tests using both simulated and empirical data. We discuss how these tests may be used to improve inference of phylogenetic relationships for larger samples of species under the multispecies coalescent model, a problem that has until recently been computationally intractable. PMID:27521524

  19. SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees

    PubMed Central

    Wisecaver, Jennifer H.

    2016-01-01

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/. PMID:27635331
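
    The idea of extracting the sister clade of a taxon of interest can be sketched in a few lines. This is not SICLE's implementation or interface: it assumes a rooted tree written as nested Python tuples with string leaf names, ignores support values, and uses hypothetical function and taxon names.

```python
def leaves(node) -> set:
    """Return the set of leaf names under a node (a string or a tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def sister_leaves(tree, target):
    """Return the leaf names in the clade(s) sister to leaf `target`.

    Returns None when `target` does not occur below `tree`.
    """
    if isinstance(tree, str):
        return None
    for child in tree:
        if child == target:
            # The siblings of the target under this parent form its sister group.
            out = set()
            for sibling in tree:
                if sibling is not child:
                    out |= leaves(sibling)
            return out
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None
```

    Applied to many gene trees, the returned leaf sets could be mapped to taxonomic groups and tallied to summarize which groups most often appear as sister to the taxon of interest.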

  1. SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees.

    PubMed

    DeBlasio, Dan F; Wisecaver, Jennifer H

    2016-01-01

    We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/. PMID:27635331

  2. Mapping the Shapes of Phylogenetic Trees from Human and Zoonotic RNA Viruses

    PubMed Central

    Poon, Art F. Y.; Walker, Lorne W.; Murray, Heather; McCloskey, Rosemary M.; Harrigan, P. Richard; Liang, Richard H.

    2013-01-01

    A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the ‘kernel trick’, a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly ‘star-like’ shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses. PMID:24223766

  3. CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting.

    PubMed

    Berezikov, Eugene; Guryev, Victor; Plasterk, Ronald H A; Cuppen, Edwin

    2004-01-01

    Prediction of transcription-factor target sites in promoters remains difficult due to the short length and degeneracy of the target sequences. Although the use of orthologous sequences and phylogenetic footprinting approaches may help in the recognition of conserved and potentially functional sequences, correct alignment of the short transcription-factor binding sites can be problematic for established algorithms, especially when aligning more divergent species. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, that is, potential transcription-factor binding sites as represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. Comparison of the performance of CONREAL with the global alignment programs LAGAN and AVID using a reference data set, shows that CONREAL performs equally well for closely related species like rodents and human, and has a clear added value for aligning promoter elements of more divergent species like human and fish, as it identifies conserved transcription-factor binding sites that are not found by other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.

  4. Trends over time in tree and seedling phylogenetic diversity indicate regional differences in forest biodiversity change.

    PubMed

    Potter, Kevin M; Woodall, Christopher W

    2012-03-01

    Changing climate conditions may impact the short-term ability of forest tree species to regenerate in many locations. In the longer term, tree species may be unable to persist in some locations while they become established in new places. Over both time frames, forest tree biodiversity may change in unexpected ways. Using repeated inventory measurements five years apart from more than 7000 forested plots in the eastern United States, we tested three hypotheses: phylogenetic diversity is substantially different from species richness as a measure of biodiversity; forest communities have undergone recent changes in phylogenetic diversity that differ by size class, region, and seed dispersal strategy; and these patterns are consistent with expected early effects of climate change. Specifically, the magnitude of diversity change across broad regions should be greater among seedlings than in trees, should be associated with latitude and elevation, and should be greater among species with high dispersal capacity. Our analyses demonstrated that phylogenetic diversity and species richness are decoupled at small and medium scales and are imperfectly associated at large scales. This suggests that it is appropriate to apply indicators of biodiversity change based on phylogenetic diversity, which account for evolutionary relationships among species and may better represent community functional diversity. Our results also detected broadscale patterns of forest biodiversity change that are consistent with expected early effects of climate change. First, the statistically significant increase over time in seedling diversity in the South suggests that conditions there have become more favorable for the reproduction and dispersal of a wider variety of species, whereas the significant decrease in northern seedling diversity indicates that northern conditions have become less favorable. Second, we found weak correlations between seedling diversity change and latitude in both zones

  5. New approach for phylogenetic tree recovery based on genome-scale metabolic networks.

    PubMed

    Gamermann, Daniel; Montagud, Arnaud; Conejero, J Alberto; Urchueguía, Javier F; de Córdoba, Pedro Fernández

    2014-07-01

    A wide range of applications and research has been done with genome-scale metabolic models. In this work, we describe an innovative methodology for comparing metabolic networks constructed from genome-scale metabolic models and how to apply this comparison in order to infer evolutionary distances between different organisms. Our methodology allows a quantification of the metabolic differences between different species from a broad range of families and even kingdoms. This quantification is then applied in order to reconstruct phylogenetic trees for sets of various organisms.

  6. Building large trees by combining phylogenetic information: a complete phylogeny of the extant Carnivora (Mammalia).

    PubMed

    Bininda-Emonds, O R; Gittleman, J L; Purvis, A

    1999-05-01

    One way to build larger, more comprehensive phylogenies is to combine the vast amount of phylogenetic information already available. We review the two main strategies for accomplishing this (combining raw data versus combining trees), but employ a relatively new variant of the latter: supertree construction. The utility of one supertree technique, matrix representation using parsimony analysis (MRP), is demonstrated by deriving a complete phylogeny for all 271 extant species of the Carnivora from 177 literature sources. Beyond providing a 'consensus' estimate of carnivore phylogeny, the tree also indicates taxa for which the relationships remain controversial (e.g. the red panda; within canids, felids, and hyaenids) or have not been studied in any great detail (e.g. herpestids, viverrids, and intrageneric relationships in the procyonids). Times of divergence throughout the tree were also estimated from 74 literature sources based on both fossil and molecular data. We use the phylogeny to show that some lineages within the Mustelinae and Canidae contain significantly more species than expected for their age, illustrating the tree's utility for studies of macroevolution. It will also provide a useful foundation for comparative and conservational studies involving the carnivores.
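
    The matrix representation step of MRP can be illustrated as follows. This is a minimal sketch of the coding scheme as it is commonly described, not the authors' workflow: each non-root clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa in that tree, and ? for taxa absent from that tree. Source trees are assumed to be rooted nested tuples of taxon names, and the function names are hypothetical. The subsequent parsimony analysis is not shown.

```python
def collect_clades(node, out) -> frozenset:
    """Return the leaf set under node, appending each internal clade to out."""
    if isinstance(node, str):
        return frozenset([node])
    members = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(members)
    return members


def mrp_matrix(source_trees) -> dict:
    """Encode source trees as a taxon -> character-string matrix."""
    all_taxa = sorted(set().union(*(collect_clades(t, []) for t in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        found = []
        tree_taxa = collect_clades(tree, found)
        for clade in found:
            if clade == tree_taxa:
                continue  # the root clade carries no grouping information
            for taxon in all_taxa:
                if taxon not in tree_taxa:
                    rows[taxon].append("?")  # taxon missing from this source tree
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(states) for taxon, states in rows.items()}
```

    In practice an all-zero outgroup row is usually added before the matrix is analysed with a parsimony program, which then yields the supertree.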

  7. Bayesian Inference of the Evolution of a Phenotype Distribution on a Phylogenetic Tree

    PubMed Central

    Ansari, M. Azim; Didelot, Xavier

    2016-01-01

    The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications. PMID:27412711

  8. primers4clades: a web server that uses phylogenetic trees to design lineage-specific PCR primers for metagenomic and diversity studies

    PubMed Central

    Contreras-Moreira, Bruno; Sachman-Ruiz, Bernardo; Figueroa-Palacios, Iraís; Vinuesa, Pablo

    2009-01-01

    Primers4clades is an easy-to-use web server that implements a fully automatic PCR primer design pipeline for cross-species amplification of novel sequences from metagenomic DNA, or from uncharacterized organisms, belonging to user-specified phylogenetic clades or taxa. The server takes a set of non-aligned protein coding genes, with or without introns, aligns them and computes a neighbor-joining tree, which is displayed on screen for easy selection of species or sequence clusters to design lineage-specific PCR primers. Primers4clades implements an extended CODEHOP primer design strategy based on both DNA and protein multiple sequence alignments. It evaluates several thermodynamic properties of the oligonucleotide pairs, and computes the phylogenetic information content of the predicted amplicon sets from Shimodaira–Hasegawa-like branch support values of maximum likelihood phylogenies. A non-redundant set of primer formulations is returned, ranked according to their thermodynamic properties. An amplicon distribution map provides a convenient overview of the coverage of the target locus. Altogether these features greatly help the user in making an informed choice between alternative primer pair formulations. Primers4clades is available at two mirror sites: http://maya.ccg.unam.mx/primers4clades/ and http://floresta.eead.csic.es/primers4clades/. Three demo data sets and a comprehensive documentation/tutorial page are provided for easy testing of the server's capabilities and interface. PMID:19465390

  9. primers4clades: a web server that uses phylogenetic trees to design lineage-specific PCR primers for metagenomic and diversity studies.

    PubMed

    Contreras-Moreira, Bruno; Sachman-Ruiz, Bernardo; Figueroa-Palacios, Iraís; Vinuesa, Pablo

    2009-07-01

    Primers4clades is an easy-to-use web server that implements a fully automatic PCR primer design pipeline for cross-species amplification of novel sequences from metagenomic DNA, or from uncharacterized organisms, belonging to user-specified phylogenetic clades or taxa. The server takes a set of non-aligned protein coding genes, with or without introns, aligns them and computes a neighbor-joining tree, which is displayed on screen for easy selection of species or sequence clusters to design lineage-specific PCR primers. Primers4clades implements an extended CODEHOP primer design strategy based on both DNA and protein multiple sequence alignments. It evaluates several thermodynamic properties of the oligonucleotide pairs, and computes the phylogenetic information content of the predicted amplicon sets from Shimodaira-Hasegawa-like branch support values of maximum likelihood phylogenies. A non-redundant set of primer formulations is returned, ranked according to their thermodynamic properties. An amplicon distribution map provides a convenient overview of the coverage of the target locus. Altogether these features greatly help the user in making an informed choice between alternative primer pair formulations. Primers4clades is available at two mirror sites: http://maya.ccg.unam.mx/primers4clades/ and http://floresta.eead.csic.es/primers4clades/. Three demo data sets and a comprehensive documentation/tutorial page are provided for easy testing of the server's capabilities and interface.

  10. Patterns of thinking about phylogenetic trees: A study of student learning and the potential of tree thinking to improve comprehension of biological concepts

    NASA Astrophysics Data System (ADS)

    Naegle, Erin

    Evolution education is a critical yet challenging component of teaching and learning biology. There is frequently an emphasis on natural selection when teaching about evolution and conducting educational research. A full understanding of evolution, however, integrates evolutionary processes, such as natural selection, with the resulting evolutionary patterns, such as species divergence. Phylogenetic trees are models of evolutionary patterns. The perspective gained from understanding biology through phylogenetic analyses is referred to as tree thinking. Due to the increasing prevalence of tree thinking in biology, understanding how to read phylogenetic trees is an important skill for students to learn. Interpreting graphics is not an intuitive process, as graphical representations are semiotic objects. This is certainly true concerning phylogenetic tree interpretation. Previous research and anecdotal evidence report that students struggle to correctly interpret trees. The objective of this research was to describe and investigate the rationale underpinning the prior knowledge of introductory biology students' tree thinking. Understanding prior knowledge is valuable as prior knowledge influences future learning. In Chapter 1, qualitative methods such as semi-structured interviews were used to explore patterns of student rationale in regard to tree thinking. Seven common tree thinking misconceptions are described: (1) Equating the degree of trait similarity with the extent of relatedness, (2) Environmental change is a necessary prerequisite to evolution, (3) Essentialism of species, (4) Evolution is inherently progressive, (5) Evolution is a linear process, (6) Not all species are related, and (7) Trees portray evolution through the hybridization of species. These misconceptions are based in students' incomplete or incorrect understanding of evolution. These misconceptions are often reinforced by the misapplication of cultural conventions to make sense of trees

  11. A rank-based sequence aligner with applications in phylogenetic analysis.

    PubMed

    Dinu, Liviu P; Ionescu, Radu Tudor; Tomescu, Alexandru I

    2014-01-01

    Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work favors correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. One set of experiments is conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.
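
    The two speed-ups described above (a hash table of k-mer positions per read, and restricting attention to promising reference positions) can be sketched with a simple seed-voting scheme. This is an assumed illustration, not the LRD aligner: Local Rank Distance itself is not computed, and the function names and the voting rule are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_positions(seq: str, k: int) -> dict:
    """Hash table mapping each k-mer of seq to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def candidate_offsets(reference: str, read: str, k: int, top: int = 3) -> list:
    """Rank reference start positions by the number of k-mers they share with the read."""
    table = kmer_positions(read, k)
    votes = Counter()
    for j in range(len(reference) - k + 1):
        # A k-mer at reference position j and read position i votes for start j - i.
        for i in table.get(reference[j:j + k], ()):
            if j - i >= 0:
                votes[j - i] += 1
    return [offset for offset, _ in votes.most_common(top)]
```

    A full aligner would then score only the returned offsets with its distance measure instead of scanning every position in the reference.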

  12. Minimizing the Average Distance to a Closest Leaf in a Phylogenetic Tree

    PubMed Central

    Matsen, Frederick A.; Gallagher, Aaron; McCoy, Connor O.

    2013-01-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized “reference sequences” to a smaller subset that is as close as possible on average to a collection of “query sequences” of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting “reference tree” sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software. [Mass transport; phylogenetic diversity; sequence selection.] PMID:23843314
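
    The ADCL criterion can be written down compactly. The sketch below is a brute-force illustration under stated assumptions, not the authors' dynamic program or PAM variant: it takes a precomputed table of leaf-to-leaf tree distances (a nested dict, hypothetical layout), scores a chosen subset by the mean distance from every leaf to its closest chosen leaf, and tries all subsets of a given size.

```python
from itertools import combinations


def adcl(dist: dict, leaves: list, chosen) -> float:
    """Average, over all leaves, of the distance to the closest chosen leaf."""
    return sum(min(dist[a][b] for b in chosen) for a in leaves) / len(leaves)


def best_subset(dist: dict, k: int) -> tuple:
    """Exhaustively find a size-k subset of leaves with minimum ADCL.

    Only feasible for small inputs; ties go to the first subset in sorted order.
    """
    leaves = sorted(dist)
    return min(combinations(leaves, k),
               key=lambda chosen: adcl(dist, leaves, chosen))
```

    With two tight pairs of leaves separated by a long branch, for example, the best subset of size two takes one leaf from each pair.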

  13. Construction and bootstrap analysis of DNA fingerprinting-based phylogenetic trees with the freeware program FreeTree: application to trichomonad parasites.

    PubMed

    Hampl, V; Pavlícek, A; Flegr, J

    2001-05-01

    The Win95/98/NT program FreeTree for computation of distance matrices and construction of phylogenetic or phenetic trees on the basis of random amplified polymorphic DNA (RAPD), RFLP and allozyme data is presented. In contrast to other similar software, the program FreeTree (available at http://www.natur.cuni.cz/~flegr/programs/freetree or http://ijs.sgmjournals.org/content/vol51/issue3/) can also assess the robustness of the tree topology by bootstrap, jackknife or operational taxonomic unit-jackknife analysis. Moreover, the program can be also used for the analysis of data obtained in several independent experiments performed with non-identical subsets of taxa. The function of the program was demonstrated by an analysis of RAPD data from 42 strains of 10 species of trichomonads. On the phylogenetic tree constructed using FreeTree, the high bootstrap values and short terminal branches for the Tritrichomonas foetus/suis 14-strain branch suggested relatively recent and probably clonal radiation of this species. At the same time, the relatively lower bootstrap values and long terminal branches for the Trichomonas vaginalis 20-strain branch suggested more ancient radiation of this species and the possible existence of genetic recombination (sexual reproduction) in this human pathogen. The low bootstrap values and the star-like topology of the whole Trichomonadidae tree confirm that the RAPD method is not suitable for phylogenetic analysis of protozoa at the level of higher taxa. It is proposed that the repeated bootstrap analysis should be an obligatory part of any RAPD study. It makes it possible to assess the reliability of the tree obtained and to adjust the amount of collected data (the number of random primers) to the amount of phylogenetic signals in the RAPD data of the taxon analysed. The FreeTree program makes such analysis possible. PMID:11411692
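
    The bootstrap step described above amounts to resampling markers with replacement and recomputing the distance matrix for each replicate. The sketch below is a minimal illustration, not FreeTree's code: band profiles are assumed to be equal-length lists of 0/1 values per strain, the Jaccard distance is an assumed choice of measure, and the function names are hypothetical. Tree building and the counting of recovered branches are not shown.

```python
import random


def jaccard_distance(a, b) -> float:
    """1 minus shared bands over bands present in either profile."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def distance_matrix(profiles: dict) -> dict:
    """Pairwise Jaccard distances between named band profiles."""
    names = sorted(profiles)
    return {(p, q): jaccard_distance(profiles[p], profiles[q])
            for p in names for q in names}


def bootstrap_matrices(profiles: dict, replicates: int, seed: int = 0) -> list:
    """Resample marker columns with replacement; return one matrix per replicate."""
    rng = random.Random(seed)
    n_markers = len(next(iter(profiles.values())))
    out = []
    for _ in range(replicates):
        cols = [rng.randrange(n_markers) for _ in range(n_markers)]
        resampled = {name: [bands[c] for c in cols]
                     for name, bands in profiles.items()}
        out.append(distance_matrix(resampled))
    return out
```

    Building a tree from each replicate matrix and counting how often a branch of the original tree reappears gives the bootstrap value for that branch.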

  14. Minimizing the average distance to a closest leaf in a phylogenetic tree.

    PubMed

    Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

    2013-11-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software. PMID:23843314

  15. TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information

    PubMed Central

    Struck, Torsten H

    2014-01-01

    Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, for phylogenetic reconstructions one must refer to artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118
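
    The patristic distance (PD) mentioned in this abstract is the sum of branch lengths along the path joining two tips. The following minimal sketch computes it on a toy rooted tree stored as a child-to-parent map. The tree, the function names, and the mean-PD summary are hypothetical illustrations; they do not reproduce TreSpEx's Perl implementation or its statistical tests.

```python
# Toy rooted tree: child -> (parent, branch length). Values are made up.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("n2", 1.0),
    "D": ("n2", 3.0),
    "n2": ("root", 2.0),
}
TIPS = ["A", "B", "C", "D"]

def path_to_root(parent, node):
    """Map each ancestor of node (and node itself) to its distance from node."""
    dist, out = 0.0, {node: 0.0}
    while node in parent:
        node, length = parent[node]
        dist += length
        out[node] = dist
    return out

def patristic_distance(parent, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    up_a = path_to_root(parent, a)
    up_b = path_to_root(parent, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimises the summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

def mean_pd(parent, tips, tip):
    """Mean PD from one tip to all others (an illustrative summary only)."""
    others = [t for t in tips if t != tip]
    return sum(patristic_distance(parent, tip, t) for t in others) / len(others)

longest = max(TIPS, key=lambda t: mean_pd(PARENT, TIPS, t))
```

    Here the tip on the longest branch (D) has the largest mean PD. A summary of this kind is one plausible way to flag taxa for closer inspection.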

  16. Conserving the functional and phylogenetic trees of life of European tetrapods.

    PubMed

    Thuiller, Wilfried; Maiorano, Luigi; Mazel, Florent; Guilhaumon, François; Ficetola, Gentile Francesco; Lavergne, Sébastien; Renaud, Julien; Roquet, Cristina; Mouillot, David

    2015-02-19

    Protected areas (PAs) are pivotal tools for biodiversity conservation on the Earth. Europe has had an extensive protection system since Natura 2000 areas were created in parallel with traditional parks and reserves. However, the extent to which this system covers not only taxonomic diversity but also other biodiversity facets, such as evolutionary history and functional diversity, has never been evaluated. Using high-resolution distribution data of all European tetrapods together with dated molecular phylogenies and detailed trait information, we first tested whether the existing European protection system effectively covers all species and in particular, those with the highest evolutionary or functional distinctiveness. We then tested the ability of PAs to protect the entire tetrapod phylogenetic and functional trees of life by mapping species' target achievements along the internal branches of these two trees. We found that the current system is adequately representative in terms of the evolutionary history of amphibians while it fails for the rest. However, the most functionally distinct species were better represented than they would be under random conservation efforts. These results imply better protection of the tetrapod functional tree of life, which could help to ensure long-term functioning of the ecosystem, potentially at the expense of conserving evolutionary history.

  17. Conserving the functional and phylogenetic trees of life of European tetrapods

    PubMed Central

    Thuiller, Wilfried; Maiorano, Luigi; Mazel, Florent; Guilhaumon, François; Ficetola, Gentile Francesco; Lavergne, Sébastien; Renaud, Julien; Roquet, Cristina; Mouillot, David

    2015-01-01

    Protected areas (PAs) are pivotal tools for biodiversity conservation on the Earth. Europe has had an extensive protection system since Natura 2000 areas were created in parallel with traditional parks and reserves. However, the extent to which this system covers not only taxonomic diversity but also other biodiversity facets, such as evolutionary history and functional diversity, has never been evaluated. Using high-resolution distribution data of all European tetrapods together with dated molecular phylogenies and detailed trait information, we first tested whether the existing European protection system effectively covers all species and in particular, those with the highest evolutionary or functional distinctiveness. We then tested the ability of PAs to protect the entire tetrapod phylogenetic and functional trees of life by mapping species' target achievements along the internal branches of these two trees. We found that the current system is adequately representative in terms of the evolutionary history of amphibians while it fails for the rest. However, the most functionally distinct species were better represented than they would be under random conservation efforts. These results imply better protection of the tetrapod functional tree of life, which could help to ensure long-term functioning of the ecosystem, potentially at the expense of conserving evolutionary history. PMID:25561666

  18. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    PubMed

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-word frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We present an improvement to word-counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
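
    The distance step in this abstract can be sketched as follows: count every word of length L in each sequence, turn the counts into a frequency profile, and take pairwise Euclidean distances. This sketch swaps the generalized suffix tree for a plain sliding-window counter, which is simpler but does not share the linear-time construction. The normalisation to relative frequencies, the DNA alphabet, and all names are assumptions; the authors' released program is not reproduced here.

```python
from collections import Counter
from itertools import product
from math import sqrt

def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word, in a fixed word order."""
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts["".join(w)] / total for w in product(alphabet, repeat=L)]

def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise profile distances, rows in sorted-name order."""
    names = sorted(seqs)
    profiles = {n: lword_profile(seqs[n], L) for n in names}
    matrix = [[euclidean(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix

# Made-up toy sequences, not real mitochondrial genomes.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGT", "s3": "AAAAAAAA"}
names, matrix = distance_matrix(toy, L=2)
```

    The resulting matrix could then be passed to any neighbor-joining or multidimensional scaling routine, as the abstract describes.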

  19. MAVID multiple alignment server.

    PubMed

    Bray, Nicolas; Pachter, Lior

    2003-07-01

    MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID web server allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to subsequently analyse the alignments for conserved regions. MAVID has been successfully used for the alignment of closely related species such as primates and also for the alignment of more distant organisms such as human and fugu. The server is fast, capable of aligning hundreds of kilobases in less than a minute. The multiple alignment is used to build a phylogenetic tree for the sequences, which is subsequently used as a basis for identifying conserved regions in the alignment. The server can be accessed at http://baboon.math.berkeley.edu/mavid/.

  20. Fourteen nuclear genes provide phylogenetic resolution for difficult nodes in the turtle tree of life.

    PubMed

    Barley, Anthony J; Spinks, Phillip Q; Thomson, Robert C; Shaffer, H Bradley

    2010-06-01

    Advances in molecular biology have expanded our understanding of patterns of evolution and our ability to infer phylogenetic relationships. Despite many applications of molecular methods in attempts at resolving the evolutionary relationships among the major clades of turtles, some nodes in the tree have proved to be extremely problematic and have remained unresolved. In this study, we use 14 nuclear loci to provide an in depth look at several of these troublesome nodes and infer the systematic relationships among 11 of the 14 turtle families. We find strong support for two of the most problematic nodes in the deep phylogeny of turtles that have traditionally defied resolution. In particular, we recover strong support for a sister relationship between the Emydidae and the monotypic bigheaded-turtle, Platysternon megacephalum. We also find strong support for a clade consisting of sea turtles, mud and musk turtles, and snapping turtles. Within this clade, snapping turtles (Chelydridae) and mud/musk turtles (Kinosternidae) are sister taxa, again with strong support. Our results emphasize the utility of multi-locus datasets in phylogenetic analyses of difficult problems. PMID:19913628

  1. Supermatrix and species tree methods resolve phylogenetic relationships within the big cats, Panthera (Carnivora: Felidae).

    PubMed

    Davis, Brian W; Li, Gang; Murphy, William J

    2010-07-01

    The pantherine lineage of cats diverged from the remainder of modern Felidae less than 11 million years ago and consists of the five big cats of the genus Panthera, the lion, tiger, jaguar, leopard, and snow leopard, as well as the closely related clouded leopard. A significant problem exists with respect to the precise phylogeny of these highly threatened great cats. Despite multiple publications on the subject, no two molecular studies have reconstructed Panthera with the same topology. These evolutionary relationships remain unresolved partially due to the recent and rapid radiation of pantherines in the Pliocene, individual speciation events occurring within less than 1 million years, and probable introgression between lineages following their divergence. We provide an alternative, highly supported interpretation of the evolutionary history of the pantherine lineage using novel and published DNA sequence data from the autosomes, both sex chromosomes and the mitochondrial genome. New sequences were generated for 39 single-copy regions of the felid Y chromosome, as well as four mitochondrial and four autosomal gene segments, totaling 28.7 kb. Phylogenetic analysis of these new data, combined with all published data in GenBank, highlighted the prevalence of phylogenetic disparities stemming either from the amplification of a mitochondrial to nuclear translocation event (numt), or errors in species identification. Our 47.6 kb combined dataset was analyzed as a supermatrix and with respect to individual partitions using maximum likelihood and Bayesian phylogenetic inference, in conjunction with Bayesian Estimation of Species Trees (BEST) which accounts for heterogeneous gene histories. Our results yield a robust consensus topology supporting the monophyly of lion and leopard, with jaguar sister to these species, as well as a sister species relationship of tiger and snow leopard. These results highlight new avenues for the study of speciation genomics and

  3. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    PubMed

    Wang, Fen; Ye, Bin

    2016-09-01

    Cystic echinococcosis, caused by the metacestode larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cystic echinococcosis is still difficult since surgery cannot fit the needs of all patients, and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequence and structural properties of E. granulosus aquaporins (EgAQPs) and constructed a phylogenetic tree using bioinformatics methods. The MIP family signature and protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The number of transmembrane regions ranged from three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and further research on the biological function of E. granulosus. PMID:27164831

  4. Wood nitrogen concentrations in tropical trees: phylogenetic patterns and ecological correlates.

    PubMed

    Martin, Adam R; Erickson, David L; Kress, W John; Thomas, Sean C

    2014-11-01

    In tropical and temperate trees, wood chemical traits are hypothesized to covary with species' life-history strategy along a 'wood economics spectrum' (WES), but evidence supporting these expected patterns remains scarce. Due to its role in nutrient storage, we hypothesize that wood nitrogen (N) concentration will covary along the WES, being higher in slow-growing species with high wood density (WD), and lower in fast-growing species with low WD. In order to test this hypothesis we quantified wood N concentrations in 59 Panamanian hardwood species, and used this dataset to examine ecological correlates and phylogenetic patterns of wood N. Wood N varied > 14-fold among species between 0.04 and 0.59%; closely related species were more similar in wood N than expected by chance. Wood N was positively correlated with WD, and negatively correlated with log-transformed relative growth rates, although these relationships were relatively weak. We found evidence for co-evolution between wood N and both WD and log-transformed mortality rates. Our study provides evidence that wood N covaries with tree life-history parameters, and that these patterns consistently co-evolve in tropical hardwoods. These results provide some support for the hypothesized WES, and suggest that wood is an increasingly important N pool through tropical forest succession.

  5. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    PubMed

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal.

  6. Bears in a Forest of Gene Trees: Phylogenetic Inference Is Complicated by Incomplete Lineage Sorting and Gene Flow

    PubMed Central

    Kutschera, Verena E.; Bidon, Tobias; Hailer, Frank; Rodi, Julia L.; Fain, Steven R.; Janke, Axel

    2014-01-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. PMID:24903145

  8. Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences.

    PubMed

    Zheng, Xiaoyan; Cai, Danying; Potter, Daniel; Postman, Joseph; Liu, Jing; Teng, Yuanwen

    2014-11-01

    Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence datasets. Phylogenetic trees based on both cpDNA and nuclear LFY2int2-N (LN) data resulted in poor resolution; in particular, only five primary species were monophyletic in the LN tree. A phylogenetic network of LN suggested that reticulation caused by hybridization is one of the major evolutionary processes for Pyrus species. Polytomies of the gene trees and the star-like structure of cpDNA networks suggested that rapid radiation is another major evolutionary process, especially for the occidental species. Pyrus calleryana and P. regelii were the earliest diverged Pyrus species. Two North African species, P. cordata, P. spinosa and P. betulaefolia were descendants of primitive stock Pyrus species and still share some common molecular characters. Southwestern China, where a large number of P. pashia populations are found, is probably the most important diversification center of Pyrus. More accessions and nuclear genes are needed to further understand the evolutionary histories of Pyrus.

  9. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees

    PubMed Central

    Letunic, Ivica; Bork, Peer

    2016-01-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. The current version was completely redesigned and rewritten, utilizing current web technologies for speedy and streamlined processing. Numerous new features were introduced and several new data types are now supported. Trees with up to 100,000 leaves can now be efficiently displayed. Full interactive control over precise positioning of various annotation features and an unlimited number of datasets allow the easy creation of complex tree visualizations. iTOL 3 is the first tool which supports direct visualization of the recently proposed phylogenetic placements format. Finally, iTOL's account system has been redesigned to simplify the management of trees in user-defined workspaces and projects, as it is heavily used and currently handles already more than 500,000 trees from more than 10,000 individual users. PMID:27095192

  10. Rooting the tree of life: the phylogenetic jury is still out

    PubMed Central

    Gouy, Richard; Baurain, Denis; Philippe, Hervé

    2015-01-01

    This article aims to shed light on difficulties in rooting the tree of life (ToL) and to explore the (sociological) reasons underlying the limited interest in accurately addressing this fundamental issue. First, we briefly review the difficulties plaguing phylogenetic inference and the ways to improve the modelling of the substitution process, which is highly heterogeneous, both across sites and over time. We further observe that enriched taxon samplings, better gene samplings and clever data removal strategies have led to numerous revisions of the ToL, and that these improved shallow phylogenies nearly always relocate simple organisms higher in the ToL provided that long-branch attraction artefacts are kept at bay. Then, we note that, despite the flood of genomic data available since 2000, there has been a surprisingly low interest in inferring the root of the ToL. Furthermore, the rare studies dealing with this question were almost always based on methods dating from the 1990s that have been shown to be inaccurate for much more shallow issues! This leads us to argue that the current consensus about a bacterial root for the ToL can be traced back to the prejudice of Aristotle's Great Chain of Being, in which simple organisms are ancestors of more complex life forms. Finally, we demonstrate that even the best models cannot yet handle the complexity of the evolutionary process encountered both at shallow depth, when the outgroup is too distant, and at the level of the inter-domain relationships. Altogether, we conclude that the commonly accepted bacterial root is still unproven and that the root of the ToL should be revisited using phylogenomic supermatrices to ensure that new evidence for eukaryogenesis, such as the recently described Lokiarchaeota, is interpreted in a sound phylogenetic framework. PMID:26323760

  11. Molecular phylogenetics reveal multiple tertiary vicariance origins of the African rain forest trees

    PubMed Central

    Couvreur, Thomas LP; Chatrou, Lars W; Sosef, Marc SM; Richardson, James E

    2008-01-01

    Background: Tropical rain forests are the most diverse terrestrial ecosystems on the planet. How this diversity evolved remains largely unexplained. In Africa, rain forests are situated in two geographically isolated regions: the West-Central Guineo-Congolian region and the coastal and montane regions of East Africa. These regions have strong floristic affinities with each other, suggesting a former connection via an Eocene pan-African rain forest. High levels of endemism observed in both regions have been hypothesized to be the result of either 1) a single break-up followed by a long isolation or 2) multiple fragmentation and reconnection since the Oligocene. To test these hypotheses the evolutionary history of endemic taxa within a rain forest restricted African lineage of the plant family Annonaceae was studied. Molecular phylogenies and divergence dates were estimated using a Bayesian relaxed uncorrelated molecular clock assumption accounting for both calibration and phylogenetic uncertainties. Results: Our results provide strong evidence that East African endemic lineages of Annonaceae have multiple origins dated to significantly different times spanning the Oligocene and Miocene epochs. Moreover, these successive origins (c. 33, 16 and 8 million years – Myr) coincide with known periods of aridification and geological activity in Africa that would have recurrently isolated the Guineo-Congolian rain forest from the East African one. All East African taxa were found to have diversified prior to Pleistocene times. Conclusion: Molecular phylogenetic dating analyses of this large pan-African clade of Annonaceae unravel an interesting pattern of diversification for rain forest restricted trees co-occurring in West/Central and East African rain forests. Our results suggest that repeated reconnections between the West/Central and East African rain forest blocks allowed for biotic exchange while the break-ups induced speciation via vicariance, enhancing the levels of

  12. Phylogenetic Analysis of Local-Scale Tree Soil Associations in a Lowland Moist Tropical Forest

    PubMed Central

    Schreeg, Laura A.; Kress, W. John; Erickson, David L.; Swenson, Nathan G.

    2010-01-01

    Background: Local plant-soil associations are commonly studied at the species-level, while associations at the level of nodes within a phylogeny have been less well explored. Understanding associations within a phylogenetic context, however, can improve our ability to make predictions across systems and can advance our understanding of the role of evolutionary history in structuring communities. Methodology/Principal Findings: Here we quantified evolutionary signal in plant-soil associations using a DNA sequence-based community phylogeny and several soil variables (e.g., extractable phosphorus, aluminum and manganese, pH, and slope as a proxy for soil water). We used published plant distributional data from the 50-ha plot on Barro Colorado Island (BCI), Republic of Panamá. Our results suggest some groups of closely related species do share similar soil associations. Most notably, the node shared by Myrtaceae and Vochysiaceae was associated with high levels of aluminum, a potentially toxic element. The node shared by Apocynaceae was associated with high extractable phosphorus, a nutrient that could be limiting on a taxon specific level. The node shared by the large group of Laurales and Magnoliales was associated with both low extractable phosphorus and with steeper slope. Despite significant node-specific associations, this study detected little to no phylogeny-wide signal. We consider the majority of the ‘traits’ (i.e., soil variables) evaluated to fall within the category of ecological traits. We suggest that, given this category of traits, phylogeny-wide signal might not be expected while node-specific signals can still indicate phylogenetic structure with respect to the variable of interest. Conclusions: Within the BCI forest dynamics plot, distributions of some plant taxa are associated with local-scale differences in soil variables when evaluated at individual nodes within the phylogenetic tree, but they are not detectable by phylogeny-wide signal. Trends

  13. Analyses of the radiation of birnaviruses from diverse host phyla and of their evolutionary affinities with other double-stranded RNA and positive strand RNA viruses using robust structure-based multiple sequence alignments and advanced phylogenetic methods

    PubMed Central

    2013-01-01

    Background: Birnaviruses form a distinct family of double-stranded RNA viruses infecting animals as different as vertebrates, mollusks, insects and rotifers. With such a wide host range, they constitute a good model for studying the adaptation to the host. Additionally, several lines of evidence link birnaviruses to positive strand RNA viruses and suggest that phylogenetic analyses may provide clues about transition. Results: We characterized the genome of a birnavirus from the rotifer Brachionus plicatilis. We used X-ray structures of RNA-dependent RNA polymerases and capsid proteins to obtain multiple structure alignments that allowed us to obtain reliable multiple sequence alignments and we employed “advanced” phylogenetic methods to study the evolutionary relationships between some positive strand and double-stranded RNA viruses. We showed that the rotifer birnavirus genome exhibited an organization remarkably similar to other birnaviruses. As this host was phylogenetically very distant from the other known species targeted by birnaviruses, we revisited the evolutionary pathways within the Birnaviridae family using phylogenetic reconstruction methods. We also applied a number of phylogenetic approaches based on structurally conserved domains/regions of the capsid and RNA-dependent RNA polymerase proteins to study the evolutionary relationships between birnaviruses, other double-stranded RNA viruses and positive strand RNA viruses. Conclusions: We show that there is a good correlation between the phylogeny of the birnaviruses and that of their hosts at the phylum level using the RNA-dependent RNA polymerase (genomic segment B) on the one hand and a concatenation of the capsid protein, protease and ribonucleoprotein (genomic segment A) on the other hand. This correlation tends to vanish within phyla. The use of advanced phylogenetic methods and robust structure-based multiple sequence alignments allowed us to obtain a more accurate picture (in terms of

  14. Phylogenetic diversity of endophytic leaf fungus isolates from the medicinal tree Trichilia elegans (Meliaceae).

    PubMed

    Rhoden, S A; Garcia, A; Rubin Filho, C J; Azevedo, J L; Pamphile, J A

    2012-01-01

    Various types of organisms, mainly fungi and bacteria, live within vegetal organs and tissues, without causing damage to the plant. These microorganisms, which are called endophytes, can be useful for biological control and plant growth promotion; bioactive compounds from these organisms may have medical and pharmaceutical applications. Trichilia elegans (Meliaceae) is a native tree that grows abundantly in several regions of Brazil. Preparations using the leaves, seeds, bark, and roots of many species of the Meliaceae family have been widely used in traditional medicine, and some members of the Trichilia genus are used in Brazilian popular medicine. We assessed the diversity of endophytic fungi from two wild specimens of T. elegans, collected from a forest remnant, by sequencing ITS1-5.8S-ITS2 of rDNA of the isolates. The fungi were isolated and purified; 97 endophytic fungi were found; they were separated into 17 morpho-groups. Of the 97 endophytic fungi, four genera (Phomopsis, Diaporthe, Dothideomycete, and Cordyceps) with 11 morpho-groups were identified. Phomopsis was the most frequent genus among the identified endophytes. Phylogenetic analysis showed two major clades: Sordariomycetes, which includes three genera, Phomopsis, Diaporthe, and Cordyceps, and the clade Dothideomycetes, which was represented by the order Pleosporales. PMID:22782630

  15. The Hymenopteran Tree of Life: Evidence from Protein-Coding Genes and Objectively Aligned Ribosomal Data

    PubMed Central

    Klopfstein, Seraina; Vilhelmsen, Lars; Heraty, John M.; Sharkey, Michael; Ronquist, Fredrik

    2013-01-01

    Previous molecular analyses of higher hymenopteran relationships have largely been based on subjectively aligned ribosomal sequences (18S and 28S). Here, we reanalyze the 18S and 28S data (unaligned about 4.4 kb) using an objective and a semi-objective alignment approach, based on MAFFT and BAli-Phy, respectively. Furthermore, we present the first analyses of a substantial protein-coding data set (4.6 kb from one mitochondrial and four nuclear genes). Our results indicate that previous studies may have suffered from inflated support values due to subjective alignment of the ribosomal sequences, but apparently not from significant biases. The protein data provide independent confirmation of several earlier results, including the monophyly of non-xyelid hymenopterans, Pamphilioidea + Unicalcarida, Unicalcarida, Vespina, Apocrita, Proctotrupomorpha and core Proctotrupomorpha. The protein data confirm that Aculeata are nested within a paraphyletic Evaniomorpha, but cast doubt on the monophyly of Evanioidea. Combining the available morphological, ribosomal and protein-coding data, we examine the total-evidence signal as well as congruence and conflict among the three data sources. Despite an emerging consensus on many higher-level hymenopteran relationships, several problems remain unresolved or contentious, including rooting of the hymenopteran tree, relationships of the woodwasps, placement of Stephanoidea and Ceraphronoidea, and the sister group of Aculeata. PMID:23936325

  16. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  17. PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees.

    PubMed

    Whelan, Simon; de Bakker, Paul I W; Quevillon, Emmanuel; Rodriguez, Nicolas; Goldman, Nick

    2006-01-01

    PANDIT is a database of homologous sequence alignments accompanied by estimates of their corresponding phylogenetic trees. It provides a valuable resource to those studying phylogenetic methodology and the evolution of coding-DNA and protein sequences. Currently in version 17.0, PANDIT comprises 7738 families of homologous protein domains; for each family, DNA and corresponding amino acid sequence multiple alignments are available together with high quality phylogenetic tree estimates. Recent improvements include expanded methods for phylogenetic tree inference, assessment of alignment quality and a redesigned web interface, available at the URL http://www.ebi.ac.uk/goldman-srv/pandit.

  18. Variance to mean ratio, R(t), for Poisson processes on phylogenetic trees.

    PubMed

    Goldman, N

    1994-09-01

    The ratio of expected variance to mean, R(t), of numbers of DNA base substitutions for contemporary sequences related by a "star" phylogeny is widely seen as a measure of the adherence of the sequences' evolution to a Poisson process with a molecular clock, as predicted by the "neutral theory" of molecular evolution under certain conditions. A number of estimators of R(t) have been proposed, all predicted to have mean 1 and distributions based on the χ². Various genes have previously been analyzed and found to have values of R(t) far in excess of 1, calling into question important aspects of the neutral theory. In this paper, I use Monte Carlo simulation to show that the previously suggested means and distributions of estimators of R(t) are highly inaccurate. The analysis is applied to star phylogenies and to general phylogenetic trees, and well-known gene sequences are reanalyzed. For star phylogenies the results show that Kimura's estimators ("The Neutral Theory of Molecular Evolution," Cambridge Univ. Press, Cambridge, 1983) are unsatisfactory for statistical testing of R(t), but confirm the accuracy of Bulmer's correction factor (Genetics 123: 615-619, 1989). For all three nonstar phylogenies studied, attained values of all three estimators of R(t), although larger than 1, are within their true confidence limits under simple Poisson process models. This shows that lineage effects can be responsible for high estimates of R(t), restoring some limited confidence in the molecular clock and showing that the distinction between lineage and molecular clock effects is vital. (ABSTRACT TRUNCATED AT 250 WORDS)
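
    The variance-to-mean idea in the record above can be illustrated with a small Monte Carlo sketch. It covers only the simplest case: independent lineages of a star phylogeny, each accumulating substitutions as a Poisson process with the same rate. The function names, the number of lineages, the expected number of substitutions and the replicate count are illustrative assumptions; this is not the estimators or the simulation design of the paper.

```python
import math
import random
import statistics


def poisson_sample(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_index(counts):
    """Sample variance divided by sample mean of substitution counts."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate_star_r(n_lineages=10, expected_subs=20.0, replicates=2000, seed=1):
    """Simulate the dispersion index for a star phylogeny under a Poisson clock.

    All parameter values are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        # Each lineage evolves independently from the common ancestor.
        counts = [poisson_sample(rng, expected_subs) for _ in range(n_lineages)]
        values.append(dispersion_index(counts))
    return values


if __name__ == "__main__":
    ratios = simulate_star_r()
    print("mean dispersion index:", statistics.mean(ratios))
```

    Under these idealized assumptions the simulated ratios scatter around 1. Shared branches on a non-star tree make the counts dependent, which is the situation the abstract says requires separate treatment.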

  19. Finding the tree of life: matching phylogenetic trees to the fossil record through the 20th century.

    PubMed Central

    Benton, M. J.

    2001-01-01

    Phylogenies, or evolutionary trees, are fundamental to biology. Systematists have laboured since the time of Darwin to discover the tree of life. Recent developments in systematics, such as cladistics and molecular sequencing, have led practitioners to believe that their phylogenies are more testable now than equivalent efforts from the 1960s or earlier. Whole trees, and nodes within trees, may be assessed for their robustness. However, these quantitative approaches cannot be used to demonstrate that one tree is more likely to be correct than another. Congruence assessments may help. Comparison of a sample of 1000 published trees with an essentially independent standard (dates of origin of groups in geological time) shows that the order of branching has improved slightly, but the disparity between estimated times of origination from phylogeny and stratigraphy has, if anything, become worse. Controlled comparisons of phylogenies of four major groups (Agnatha, Sarcopterygii, Sauria and Mammalia) do not show uniform improvement, or decline, of fit to stratigraphy through the twentieth century. Nor do morphological or molecular trees differ uniformly in their performance. PMID:11600076

  20. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains.

    PubMed

    Yarza, Pablo; Richter, Michael; Peplies, Jörg; Euzeby, Jean; Amann, Rudolf; Schleifer, Karl-Heinz; Ludwig, Wolfgang; Glöckner, Frank Oliver; Rosselló-Móra, Ramon

    2008-09-01

    The signing authors together with the journal Systematic and Applied Microbiology (SAM) have started an ambitious project that has been conceived to provide a useful tool especially for the scientific microbial taxonomist community. The aim of what we have called "The All-Species Living Tree" is to reconstruct a single 16S rRNA tree harboring all sequenced type strains of the hitherto classified species of Archaea and Bacteria. This tree is to be regularly updated by adding the species with validly published names that appear monthly in the Validation and Notification lists of the International Journal of Systematic and Evolutionary Microbiology. For this purpose, the SAM executive editors, together with the responsible teams of the ARB, SILVA, and LPSN projects (www.arb-home.de, www.arb-silva.de, and www.bacterio.cict.fr, respectively), have prepared a 16S rRNA database containing over 6700 sequences, each of which represents a single type strain of a classified species up to 31 December 2007. The selection of sequences had to be undertaken manually due to a high error rate in the names and information fields provided for the publicly deposited entries. In addition, from among the often occurring multiple entries for a single type strain, the best-quality sequence was selected for the project. The living tree database that SAM now provides contains corrected entries and the best-quality sequences with a manually checked alignment. The tree reconstruction has been performed by using the maximum likelihood algorithm RAxML. The tree provided in the first release is a result of the calculation of a single dataset containing 9975 single entries, 6728 corresponding to type strain gene sequences, as well as 3247 additional high-quality sequences to give robustness to the reconstruction. Trees are dynamic structures that change on the basis of the quality and availability of the data used for their calculation. Therefore, the addition of new type strain sequences in

  1. An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree

    PubMed Central

    Mendez, Fernando L.; Krahn, Thomas; Schrack, Bonnie; Krahn, Astrid-Maria; Veeramah, Krishna R.; Woerner, August E.; Fomine, Forka Leypey Mathew; Bradman, Neil; Thomas, Mark G.; Karafet, Tatiana M.; Hammer, Michael F.

    2013-01-01

    We report the discovery of an African American Y chromosome that carries the ancestral state of all SNPs that defined the basal portion of the Y chromosome phylogenetic tree. We sequenced ∼240 kb of this chromosome to identify private, derived mutations on this lineage, which we named A00. We then estimated the time to the most recent common ancestor (TMRCA) for the Y tree as 338 thousand years ago (kya) (95% confidence interval = 237–581 kya). Remarkably, this exceeds current estimates of the mtDNA TMRCA, as well as those of the age of the oldest anatomically modern human fossils. The extremely ancient age combined with the rarity of the A00 lineage, which we also find at very low frequency in central Africa, point to the importance of considering more complex models for the origin of Y chromosome diversity. These models include ancient population structure and the possibility of archaic introgression of Y chromosomes into anatomically modern humans. The A00 lineage was discovered in a large database of consumer samples of African Americans and has not been identified in traditional hunter-gatherer populations from sub-Saharan Africa. This underscores how the stochastic nature of the genealogical process can affect inference from a single locus and warrants caution during the interpretation of the geographic location of divergent branches of the Y chromosome phylogenetic tree for the elucidation of human origins. PMID:23453668

  2. An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree.

    PubMed

    Mendez, Fernando L; Krahn, Thomas; Schrack, Bonnie; Krahn, Astrid-Maria; Veeramah, Krishna R; Woerner, August E; Fomine, Forka Leypey Mathew; Bradman, Neil; Thomas, Mark G; Karafet, Tatiana M; Hammer, Michael F

    2013-03-01

    We report the discovery of an African American Y chromosome that carries the ancestral state of all SNPs that defined the basal portion of the Y chromosome phylogenetic tree. We sequenced ∼240 kb of this chromosome to identify private, derived mutations on this lineage, which we named A00. We then estimated the time to the most recent common ancestor (TMRCA) for the Y tree as 338 thousand years ago (kya) (95% confidence interval = 237-581 kya). Remarkably, this exceeds current estimates of the mtDNA TMRCA, as well as those of the age of the oldest anatomically modern human fossils. The extremely ancient age combined with the rarity of the A00 lineage, which we also find at very low frequency in central Africa, point to the importance of considering more complex models for the origin of Y chromosome diversity. These models include ancient population structure and the possibility of archaic introgression of Y chromosomes into anatomically modern humans. The A00 lineage was discovered in a large database of consumer samples of African Americans and has not been identified in traditional hunter-gatherer populations from sub-Saharan Africa. This underscores how the stochastic nature of the genealogical process can affect inference from a single locus and warrants caution during the interpretation of the geographic location of divergent branches of the Y chromosome phylogenetic tree for the elucidation of human origins.

  3. SoRT2: a tool for sorting genomes and reconstructing phylogenetic trees by reversals, generalized transpositions and translocations.

    PubMed

    Huang, Yen-Lin; Huang, Chen-Cheng; Tang, Chuan Yi; Lu, Chin Lung

    2010-07-01

    SoRT(2) is a web server that allows the user to perform genome rearrangement analysis involving reversals, generalized transpositions and translocations (including fusions and fissions), and infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. It takes as input two or more linear/circular multi-chromosomal gene (or synteny block) orders in FASTA-like format. When the input is two genomes, SoRT(2) will quickly calculate their rearrangement distance, as well as a corresponding optimal scenario by highlighting the genes involved in each rearrangement operation. In the case of multiple genomes, SoRT(2) will also construct phylogenetic trees of these genomes based on a matrix of their pairwise rearrangement distances using distance-based approaches, such as neighbor-joining (NJ), unweighted pair group method with arithmetic mean (UPGMA) and Fitch-Margoliash (FM) methods. In addition, if the function of computing jackknife support values is selected, SoRT(2) will further perform the jackknife analysis to evaluate statistical reliability of the constructed NJ, UPGMA and FM trees. SoRT(2) is available online at http://bioalgorithm.life.nctu.edu.tw/SORT2/.
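
    As a rough illustration of how a distance-based method turns a matrix of pairwise distances into a tree, the following sketch implements a minimal UPGMA clustering that returns a Newick topology without branch lengths. It is not the SoRT(2) implementation; the function name, the input layout and the example distances are hypothetical.

```python
from itertools import combinations


def upgma_topology(labels, matrix):
    """Cluster taxa by UPGMA and return a Newick string (topology only).

    labels: list of unique taxon names.
    matrix: symmetric list of lists; matrix[i][j] is the distance between
            labels[i] and labels[j].
    """
    sizes = {label: 1 for label in labels}  # cluster name -> number of taxa
    dist = {
        frozenset((a, b)): matrix[i][j]
        for (i, a), (j, b) in combinations(enumerate(labels), 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    # Hypothetical pairwise rearrangement distances among four genomes.
    names = ["A", "B", "C", "D"]
    distances = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma_topology(names, distances))
```

    UPGMA assumes a constant rate across lineages; neighbor-joining and Fitch-Margoliash, also named in the abstract, relax that assumption and are not covered by this sketch.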

  4. Comprehensive phylogenetic reconstruction of relationships in Octocorallia (Cnidaria: Anthozoa) from the Atlantic ocean using mtMutS and nad2 genes tree reconstructions

    NASA Astrophysics Data System (ADS)

    Morris, K. J.; Herrera, S.; Gubili, C.; Tyler, P. A.; Rogers, A.; Hauton, C.

    2012-12-01

    Despite being an abundant group of significant ecological importance, the phylogenetic relationships of the Octocorallia remain poorly understood and very much understudied. We used 1132 bp of two mitochondrial protein-coding genes, nad2 and mtMutS (previously referred to as msh1), to construct a phylogeny for 161 octocoral specimens from the Atlantic, including both Isididae and non-Isididae species. We found that four clades were supported using a concatenated alignment. Two of these (A and B) were in general agreement with the Holaxonia-Alcyoniina and Anthomastus-Corallium clades identified by previous work. The third and fourth clades represent a split of the Calcaxonia-Pennatulacea clade, resulting in a clade containing the Pennatulacea and a small number of Isididae specimens and a second clade containing the remaining Calcaxonia. When individual genes were considered, nad2 largely agreed with previous work, with mtMutS also producing a fourth clade corresponding to a split of Isididae species from the Calcaxonia-Pennatulacea clade. It is expected that these differences are a consequence of the inclusion of Isididae species that have undergone a gene inversion in the mtMutS gene, causing their separation in the mtMutS-only tree. The fourth clade in the concatenated tree is also suspected to be a result of this gene inversion, as very few Isididae species were included in the trees of previous work and thus this separation would not have been clearly resolved. A larger phylogeny including both Isididae and non-Isididae species is required to further resolve these clades.

  5. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequence used to generate 3D structure of proteins and the template accession via SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4]. PMID:27672674

  7. Phylogenetic analysis with the iPlant discovery environment.

    PubMed

    Matasci, Naim; McKay, Sheldon

    2013-06-01

    The iPlant Collaborative's Discovery Environment is a unified Web portal to many bioinformatics applications and analytical workflows, including various methods of phylogenetic analysis. This unit describes example protocols for phylogenetic analyses, starting at sequence retrieval from the GenBank sequence database, through to multiple sequence alignment inference and visualization of phylogenetic trees. Methods for extracting smaller sub-trees from very large phylogenies, and the comparative method of continuous ancestral character state reconstruction based on observed morphology of extant species related to their phylogenetic relationships, are also presented.

  8. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Thomas, Paul D

    2013-01-01

    The data and tools in PANTHER, a comprehensive, curated database of protein families, trees, subfamilies and functions available at http://pantherdb.org, have undergone continual, extensive improvement for over a decade. Here, we describe the current PANTHER process as a whole, as well as the website tools for analysis of user-uploaded data. The main goals of PANTHER remain essentially unchanged: the accurate inference (and practical application) of gene and protein function over large sequence databases, using phylogenetic trees to extrapolate from the relatively sparse experimental information from a few model organisms. Yet the focus of PANTHER has continually shifted toward more accurate and detailed representations of evolutionary events in gene family histories. The trees are now designed to represent gene family evolution, including inference of evolutionary events, such as speciation and gene duplication. Subfamilies are still curated and used to define HMMs, but gene ontology functional annotations can now be made at any node in the tree, and are designed to represent gain and loss of function by ancestral genes during evolution. Finally, PANTHER now includes stable database identifiers for inferred ancestral genes, which are used to associate inferred gene attributes with particular genes in the common ancestral genomes of extant species.

  9. Phylogenetic trait conservation in the partner choice of a group of ectomycorrhizal trees.

    PubMed

    Hayward, Jeremy; Horton, Thomas R

    2014-10-01

    Ecological interactions are frequently conserved across evolutionary time. In the case of mutualisms, these conserved interactions may play a large role in structuring mutualist communities. We hypothesized that phylogenetic trait conservation could play a key role in determining patterns of association in the ectomycorrhizal symbiosis, a globally important trophic mutualism. We used the association between members of the pantropical plant tribe Pisonieae and its fungal mutualist partners as a model system to test the prediction that Pisonieae-associating ectomycorrhizal fungi will be more closely related than expected by chance, reflecting a conserved trait. We tested this prediction using previously published and newly generated sequences in a Bayesian framework incorporating phylogenetic uncertainty. We report that phylogenetic trait conservation does exist in this association. We generated a five-marker phylogeny of members of the Pisonieae and used this phylogeny in a Bayesian relaxed molecular clock analysis. We established that the most recent common ancestors of Pisonieae species and Pisonieae-associating fungi sharing phylogenetic conservation of their patterns of ectomycorrhizal association occurred no more recently than 14.2 Ma. We therefore suggest that phylogenetic trait conservation in the Pisonieae ectomycorrhizal mutualism association represents an inherited syndrome which has existed for at least 14 Myr. PMID:25169622

  10. Determining the Position of Storks on the Phylogenetic Tree of Waterbirds by Retroposon Insertion Analysis

    PubMed Central

    Kuramoto, Tae; Nishihara, Hidenori; Watanabe, Maiko; Okada, Norihiro

    2015-01-01

    Despite many studies on avian phylogenetics in recent decades that used morphology, mitochondrial genomes, and/or nuclear genes, the phylogenetic positions of several birds (e.g., storks) remain unsettled. In addition to the aforementioned approaches, analysis of retroposon insertions, which are nearly homoplasy-free phylogenetic markers, has also been used in avian phylogenetics. However, the first step in the analysis of retroposon insertions, that is, isolation of retroposons from genomic libraries, is a costly and time-consuming procedure. Therefore, we developed a high-throughput and cost-effective protocol to collect retroposon insertion information based on next-generation sequencing technology, which we call here the STRONG (Screening of Transposons Obtained by Next Generation Sequencing) method, and applied it to 3 waterbird species, for which we identified 35,470 loci containing chicken repeat 1 retroposons (CR1). Our analysis of the presence/absence of 30 CR1 insertions demonstrated the intra- and interordinal phylogenetic relationships in the waterbird assemblage, namely 1) Loons diverged first among the waterbirds, 2) penguins (Sphenisciformes) and petrels (Procellariiformes) diverged next, and 3) among the remaining families of waterbirds traditionally classified in Ciconiiformes/Pelecaniformes, storks (Ciconiidae) diverged first. Furthermore, our genome-scale, in silico retroposon analysis based on published genome data uncovered a complex divergence history among pelican, heron, and ibis lineages, presumably involving ancient interspecies hybridization between the heron and ibis lineages. Thus, our retroposon-based waterbird phylogeny and the established phylogenetic position of storks will help to understand the evolutionary processes of aquatic adaptation and related morphological convergent evolution. PMID:26527652

  11. [Estimating genetic distance and phylogenetic tree of HPA-1-3, 5, and 15 in different populations].

    PubMed

    Feng, Ming-Liang; Huang, Hui; Shen, Tong; Zhang, Xi; Yin, Biao; Yang, Jian-Hao; Liu, Da-Zhuang

    2008-07-01

    To study human platelet alloantigen (HPA) polymorphisms in five systems, the distributions of the HPA-1 to -3, -5, and -15 systems were determined in 1,000 Chinese donors using polymerase chain reaction with sequence-specific primers (PCR-SSP). Genetic distances between Chinese Han and other populations were estimated, and a phylogenetic tree was built, using the DISPAN and PHYLIP software. In the phylogenetic tree, Asian populations first converged with European populations and then grouped with African populations. The Beninese population, from Africa, was at the top of the dendrogram. The Indian population was located between the Asian and European populations. The Brazilian population clustered with the European populations. The Polynesian population from Oceania clustered specifically with the Asian populations. These results support the "out of Africa" theory from one perspective, and also indicate that the early migration of Asians proceeded from the south to Southeast and East Asia; it is thus probable that Europeans migrated from south to north and to Western Europe. Because genetic distance was estimated effectively from the HPA systems, they could serve as genetic markers in research on human migration and evolution. PMID:18779125
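
    A genetic distance between two populations can be computed directly from allele frequencies. The sketch below uses Nei's standard genetic distance, D = -ln(J_XY / sqrt(J_X J_Y)), as one common choice; the abstract does not state which measure the authors used, and the allele frequencies shown are hypothetical values for illustration, not data from the study.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists of loci; each locus is a list of allele frequencies
    given in the same allele order in both populations.
    """
    n_loci = len(pop_x)
    # Average homozygosity-like sums within each population.
    j_x = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    # Average identity between the two populations.
    j_xy = sum(
        sum(p * q for p, q in zip(locus_x, locus_y))
        for locus_x, locus_y in zip(pop_x, pop_y)
    ) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))


if __name__ == "__main__":
    # Hypothetical frequencies of the "a" and "b" alleles at five biallelic loci.
    population_1 = [[0.99, 0.01], [0.95, 0.05], [0.60, 0.40], [0.98, 0.02], [0.55, 0.45]]
    population_2 = [[0.85, 0.15], [0.92, 0.08], [0.62, 0.38], [0.90, 0.10], [0.50, 0.50]]
    print(nei_standard_distance(population_1, population_2))
```

    A matrix of such pairwise values is the kind of input that distance-based tree programs take to build a dendrogram.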

  12. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments.

    PubMed

    Jansen, M Andrew; Franz, Nico M

    2015-01-01

    This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O'Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O'Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O'Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O'Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O'Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O'Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of Minyomerus [JF2015], and

  13. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments

    PubMed Central

    Jansen, M. Andrew; Franz, Nico M.

    2015-01-01

    This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O’Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O’Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O’Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O’Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O’Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O’Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of

  14. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments.

    PubMed

    Jansen, M Andrew; Franz, Nico M

    2015-01-01

    This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O'Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O'Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O'Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O'Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O'Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O'Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of Minyomerus [JF2015], and

  15. Structural diversity of eukaryotic 18S rRNA and its impact on alignment and phylogenetic reconstruction.

    PubMed

    Xie, Qiang; Lin, Jinzhong; Qin, Yan; Zhou, Jianfu; Bu, Wenjun

    2011-02-01

    Ribosomal RNAs are important because they catalyze the synthesis of peptides and proteins. Comparative studies of the secondary structure of 18S rRNA have revealed the basic locations of its many length-conserved and length-variable regions. In recent years, many more sequences of 18S rDNA with unusual lengths have been documented in GenBank. These data make it possible to recognize the diversity of the secondary and tertiary structures of 18S rRNAs and to identify the length-conserved parts of 18S rDNAs. The longest 18S rDNA sequences of almost every known eukaryotic phylum were included in this study. We illustrated the bioinformatics-based structure to show that the regions that are more length-variable, the regions that are less length-variable, the splicing sites for introns, and the sites of A-minor interactions are mostly distributed in different parts of the 18S rRNA. Additionally, this study revealed that some length-variable regions or insertion positions could be quite close to the functional part of the 18S rRNA of Foraminifera organisms. The tertiary structure as well as the secondary structure of 18S rRNA can be more diverse than was previously supposed. Besides revealing how this interesting gene evolves, these findings can help to remove ambiguity from the alignment of eukaryotic 18S rDNAs and to improve the performance of 18S rDNA in phylogenetic reconstruction. Six nucleotides shared by Archaea and Eukaryota but rarely by Bacteria are also reported here for the first time, which might further support the supposed origin of eukaryotes from archaeans.

  16. A chloroplast tree for Viburnum (Adoxaceae) and its implications for phylogenetic classification and character evolution.

    PubMed

    Clement, Wendy L; Arakaki, Mónica; Sweeney, Patrick W; Edwards, Erika J; Donoghue, Michael J

    2014-06-13

    • Premise of the study: Despite recent progress, significant uncertainties remain concerning relationships among early-branching lineages within Viburnum (Adoxaceae), prohibiting a new classification and hindering studies of character evolution and the increasing use of Viburnum in addressing a wide range of ecological and evolutionary questions. We hoped to resolve these issues by sequencing whole plastid genomes for representative species and combining these with molecular data previously obtained from an expanded taxon sample. • Methods: We performed paired-end Illumina sequencing of plastid genomes of 22 Viburnum species and combined these data with a 10-gene data set to infer phylogenetic relationships for 113 species. We used the results to devise a comprehensive phylogenetic classification and to analyze the evolution of eight morphological characters that vary among early-branching lineages. • Key results: With greatly increased levels of confidence in most of the early branches, we propose a phylogenetic classification of Viburnum, providing formal phylogenetic definitions for 30 clades, including 13 with names recognized under the International Code of Nomenclature for Algae, Fungi, and Plants, eight with previously proposed informal names, and nine newly proposed names for major branches. Our parsimony reconstructions of bud structure, leaf margins, inflorescence form, ruminate endosperm, extrafloral nectaries, glandular trichomes, palisade anatomy, and pollen exine showed varying levels of homoplasy, but collectively provided morphological support for some, though not all, of the major clades. • Conclusions: Our study demonstrates the value of next-generation plastid sequencing, the ease of creating a formal phylogenetic classification, and the utility of such a system in describing patterns of character evolution. PMID:24928633

  17. Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

    PubMed Central

    Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka

    2016-01-01

    Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296
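
    A chained guide tree, as evaluated in method 2, is a fully unbalanced ("caterpillar") tree in which sequences join the growing alignment one at a time. The Python sketch below only illustrates that topology in Newick format. It is not MAFFT code, and the function name chained_guide_tree and the sequence labels are hypothetical.

```python
def chained_guide_tree(names):
    """Return a Newick string for a chained (caterpillar) guide tree.

    Hypothetical helper for illustration; not part of MAFFT.
    Sequences are joined in the given order, one at a time.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    tree = f"({names[0]},{names[1]})"
    for name in names[2:]:
        # Each new sequence is attached above the subtree built so far.
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seq1", "seq2", "seq3", "seq4"]))
# prints (((seq1,seq2),seq3),seq4);
```

    In a tree of this shape the order of the input list fully determines the order in which sequences are aligned.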

  18. A Revised Root for the Human Y Chromosomal Phylogenetic Tree: The Origin of Patrilineal Diversity in Africa

    PubMed Central

    Cruciani, Fulvio; Trombetta, Beniamino; Massaia, Andrea; Destro-Bisol, Giovanni; Sellitto, Daniele; Scozzari, Rosaria

    2011-01-01

    To shed light on the structure of the basal backbone of the human Y chromosome phylogeny, we sequenced about 200 kb of the male-specific region of the human Y chromosome (MSY) from each of seven Y chromosomes belonging to clades A1, A2, A3, and BT. We detected 146 biallelic variant sites through this analysis. We used these variants to construct a patrilineal tree, without taking into account any previously reported information regarding the phylogenetic relationships among the seven Y chromosomes here analyzed. There are several key changes at the basal nodes as compared with the most recent reference Y chromosome tree. A different position of the root was determined, with important implications for the origin of human Y chromosome diversity. An estimate of 142 KY was obtained for the coalescence time of the revised MSY tree, which is earlier than that obtained in previous studies and easier to reconcile with plausible scenarios of modern human origin. The number of deep branchings leading to African-specific clades has doubled, further strengthening the MSY-based evidence for a modern human origin in the African continent. An analysis of 2204 African DNA samples showed that the deepest clades of the revised MSY phylogeny are currently found in central and northwest Africa, opening new perspectives on early human presence in the continent. PMID:21601174

  19. Biological pattern and transcriptomic exploration and phylogenetic analysis in the odd floral architecture tree: Helwingia willd

    PubMed Central

    2014-01-01

    Background: Odd traits found in a few plant species often point to potential biological significance in plant evolution. The genus Helwingia Willd, a dioecious medicinal shrub in the order Aquifoliales, has an odd floral architecture: an epiphyllous inflorescence. The potential significance and possible evolutionary origin of this species are not well understood because few biological and genetic data are available. In addition, the advent of genomics-based technologies has revolutionized the study of plant species with unknown genomic information. Results: Morphological and biological patterns were detailed via anatomical and pollination analyses. An RNA-sequencing-based transcriptomic analysis was undertaken, and a high-resolution phylogenetic analysis was conducted based on single-copy genes in more than 80 species of seed plants, including H. japonica. It is verified that a potential fusion of the rachis to the leaf midvein facilitates insect pollination. RNA sequencing yielded a total of 111450 unigenes; half of them had significant similarity with proteins in the public database, and 20281 unigenes were mapped to 119 pathways. Deduced from the phylogenetic analysis based on single-copy genes, the group of Helwingia is closer with Euasterids II and rather than Euasterids, congruent with previous reports using plastid sequences. Conclusions: The odd flower architecture adapts H. Willd to insect pollination by hosting insects larger than the flower on the leaf, a character shared with few other insect-pollinated plants. Further, the present transcriptome greatly enriches the genomic information available for Helwingia species, and the phylogenetic analysis based on nuclear genes greatly improves the resolution and robustness of phylogenetic reconstruction in H. japonica. PMID:24969969

  20. Extreme convergence in stick insect evolution: phylogenetic placement of the Lord Howe Island tree lobster

    PubMed Central

    Buckley, Thomas R.; Attanayake, Dilini; Bradler, Sven

    2008-01-01

    The ‘tree lobsters’ are an enigmatic group of robust, ground-dwelling stick insects (order Phasmatodea) from the subfamily Eurycanthinae, distributed in New Guinea, New Caledonia and associated islands. Its most famous member is the Lord Howe Island stick insect Dryococelus australis (Montrouzier), which was believed to have become extinct but was rediscovered in 2001 and is considered to be one of the rarest insects in the world. To resolve the evolutionary position of Dryococelus, we constructed a phylogeny from approximately 2.4 kb of mitochondrial and nuclear sequence data from representatives of all major phasmatodean lineages. Our data placed Dryococelus and the New Caledonian tree lobsters outside the New Guinean Eurycanthinae as members of an unrelated Australasian stick insect clade, the Lanceocercata. These results suggest a convergent origin of the ‘tree lobster’ body form. Our reanalysis of tree lobster characters provides additional support for our hypothesis of convergent evolution. We conclude that the phenotypic traits leading to the traditional classification are convergent adaptations to ground-living behaviour. Our molecular dating analyses indicate an ancient divergence (more than 22 Myr ago) between Dryococelus and its Australian relatives. Hence, Dryococelus represents a long-standing separate evolutionary lineage within the stick insects and must be regarded as a key taxon to protect with respect to phasmatodean diversity. PMID:19129110

  1. An Efficient Independence Sampler for Updating Branches in Bayesian Markov chain Monte Carlo Sampling of Phylogenetic Trees.

    PubMed

    Aberer, Andre J; Stamatakis, Alexandros; Ronquist, Fredrik

    2016-01-01

    Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes. PMID:26231183
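
    The core of an independence sampler is a Metropolis-Hastings step whose proposal does not depend on the current branch length. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the proposal family is elided in the abstract above, so a gamma density is assumed here purely for illustration, and the names gamma_logpdf, independence_sampler and toy_target are hypothetical.

```python
import math
import random


def gamma_logpdf(x, shape, rate):
    """Log density of a gamma distribution (shape/rate parameterisation)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def independence_sampler(log_target, shape, rate, n_steps, start, rng):
    """Metropolis-Hastings with a proposal that ignores the current state.

    log_target: unnormalised log posterior density of one branch length.
    shape, rate: parameters of the ASSUMED gamma proposal.
    Returns the list of sampled branch lengths and the acceptance rate.
    """
    current = start
    samples, accepted = [], 0
    for _ in range(n_steps):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        proposal = rng.gammavariate(shape, 1.0 / rate)
        # Hastings ratio for an independence proposal q:
        # [p(new) / q(new)] / [p(old) / q(old)]
        log_ratio = ((log_target(proposal) - gamma_logpdf(proposal, shape, rate))
                     - (log_target(current) - gamma_logpdf(current, shape, rate)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
            accepted += 1
        samples.append(current)
    return samples, accepted / n_steps


# Toy "posterior" for one branch length; the numbers are made up.
def toy_target(t):
    return gamma_logpdf(t, 3.0, 30.0)


rng = random.Random(1)
samples, acceptance = independence_sampler(toy_target, 2.5, 25.0, 5000, 0.1, rng)
print(f"acceptance rate: {acceptance:.2f}")
```

    The closer the proposal density is to the conditional posterior, the closer the acceptance rate gets to 1, which is why an accurate approximation of the branch posterior matters for this kind of proposal.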

  2. [Phylogeny of genus Spermophilus and position of Alashan ground squirrel (Spermophilus alashanicus, Buchner, 1888) on phylogenetic tree of Paleartic short-tailed ground squirrels].

    PubMed

    Kapustina, S Yu; Brandler, O V; Adiya, Ya

    2015-01-01

    Phylogenetic relationships within a group of Palearctic short-tailed ground squirrels (Spermophilus), recently defined as a genus, are not sufficiently clear and need critical revision. Interspecies hybridization, found in Eurasian Spermophilus, can affect the results of molecular phylogeny reconstruction. The position of the Alashan ground squirrel on the phylogenetic tree needs clarification. We analyzed eight nucleotide sequences of the cytb gene of S. alashanicus and 127 sequences of other Spermophilus species from GenBank. A close phylogenetic relationship between S. alashanicus and S. dauricus, and their affinity to ancestral forms of the group, are revealed. Monophyly of the subgenus Colobotis was confirmed. Paraphyly of eastern and western forms of S. relictus was shown.

  3. PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees.

    PubMed

    Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P; Vaz, Cátia; Ramirez, Mário; Carriço, João André

    2016-07-01

    High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy accessing and sharing of data and analyses results from any Internet enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net. PMID:27131357
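
    Minimum spanning tree analysis of allelic data can be sketched as follows: treat each strain's allelic profile as a node, use the number of differing loci as the edge weight, and keep the cheapest set of edges that connects all strains. The Python example below uses plain Kruskal's algorithm with Hamming distances. It is a sketch under those assumptions and does not reproduce PHYLOViZ's own algorithms or tie-breaking rules; the function names and the seven-locus profiles are made up for illustration.

```python
def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over all pairwise Hamming distances.

    profiles: dict mapping strain name -> tuple of allele numbers.
    Returns a list of (distance, strain_a, strain_b) edges.
    """
    names = sorted(profiles)
    edges = sorted(
        (hamming(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components: keep it
            parent[root_a] = root_b
            tree.append((dist, a, b))
    return tree


# Made-up seven-locus profiles, for illustration only.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 1, 1, 1, 1, 1),
}
for edge in minimum_spanning_tree(profiles):
    print(edge)
```

    Ties between equal-weight edges are broken here by strain name only, so several equally short trees may exist for real data.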

  4. PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees

    PubMed Central

    Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P.; Vaz, Cátia; Ramirez, Mário; Carriço, João André

    2016-01-01

    High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy accessing and sharing of data and analyses results from any Internet enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net. PMID:27131357

  5. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction

    NASA Technical Reports Server (NTRS)

    Weisburg, W. G.; Giovannoni, S. J.; Woese, C. R.

    1989-01-01

    Through comparative analysis of 16S ribosomal RNA sequences, it can be shown that two seemingly dissimilar types of eubacteria, Deinococcus and the ubiquitous hot spring organism Thermus, are distantly but specifically related to one another. This confirms an earlier report based upon 16S rRNA oligonucleotide cataloging studies (Hensel et al., 1986). Their two lineages form a distinctive grouping within the eubacteria that deserves the taxonomic status of a phylum. The (partial) sequence of T. aquaticus rRNA appears relatively close to those of other thermophilic eubacteria, e.g. Thermotoga maritima and Thermomicrobium roseum. However, this closeness does not reflect a true evolutionary closeness; rather it is due to a "thermophilic convergence", the result of unusually high G+C composition in the rRNAs of thermophilic bacteria. Unless such compositional biases are taken into account, the branching order and root of phylogenetic trees can be incorrectly inferred.
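
    A first check for the compositional bias described above is to compare the G+C fraction of each sequence before building a tree. The Python sketch below is only an illustration: the function name gc_content and the two short fragments are made up, and real analyses would use full aligned rRNA sequences.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA input
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases in sequence")
    return sum(base in "GC" for base in counted) / len(counted)


# Made-up fragments, for illustration only.
fragments = {"taxon_A": "GGCGCCGGAUCC", "taxon_B": "AUUAGCAUUAAU"}
for name, seq in fragments.items():
    print(name, round(gc_content(seq), 2))
```

    Large differences in G+C fraction among taxa are a warning that similar composition, rather than shared ancestry, may be pulling sequences together.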

  6. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity.

    PubMed

    Magallón, Susana; Gómez-Acevedo, Sandra; Sánchez-Reyes, Luna L; Hernández-Hernández, Tania

    2015-07-01

    The establishment of modern terrestrial life is indissociable from angiosperm evolution. While available molecular clock estimates of angiosperm age range from the Paleozoic to the Late Cretaceous, the fossil record is consistent with angiosperm diversification in the Early Cretaceous. The time-frame of angiosperm evolution is here estimated using a sample representing 87% of families and sequences of five plastid and nuclear markers, implementing penalized likelihood and Bayesian relaxed clocks. A literature-based review of the palaeontological record yielded calibrations for 137 phylogenetic nodes. The angiosperm crown age was bound within a confidence interval calculated with a method that considers the fossil record of the group. An Early Cretaceous crown angiosperm age was estimated with high confidence. Magnoliidae, Monocotyledoneae and Eudicotyledoneae diversified synchronously 135-130 million yr ago (Ma); Pentapetalae is 126-121 Ma; and Rosidae (123-115 Ma) preceded Asteridae (119-110 Ma). Family stem ages are continuously distributed between c. 140 and 20 Ma. This time-frame documents an early phylogenetic proliferation that led to the establishment of major angiosperm lineages, and the origin of over half of extant families, in the Cretaceous. While substantial amounts of angiosperm morphological and functional diversity have deep evolutionary roots, extant species richness was probably acquired later. PMID:25615647

  7. Internal Transcribed Spacer rRNA Gene-Based Phylogenetic Reconstruction Using Algorithms with Local and Global Sequence Alignment for Black Yeasts and Their Relatives

    PubMed Central

    Caligiorne, R. B.; Licinio, P.; Dupont, J.; de Hoog, G. S.

    2005-01-01

    Sequences of the rRNA gene internal transcribed spacer (ITS) of a standard set of black yeast-like fungal pathogens were compared using two methods: local and global alignments. The latter is based on DNA-walk divergence analysis. This method has recently become available as an algorithm (DNAWD program) which converts sequences into three-dimensional walks. The walks are compared with, or fit to, each other, generating global alignments. The DNA-walk geometry defines a proper metric used to create a distance matrix appropriate for phylogenetic reconstruction. In this work, the analyses were carried out for species currently classified in Capronia, Cladophialophora, Exophiala, Fonsecaea, Phialophora, and Ramichloridium. Main groups were verified by small-subunit rRNA gene data. DNAWD applied to ITS2 alone enabled species recognition as well as phylogenetic reconstruction reflecting clades discriminated in small-subunit rRNA gene phylogeny, which was not possible with any other algorithm using local alignment for the same data set. It is concluded that DNAWD provides rapid insight into broader relationships between groups using genes that otherwise would be hardly usable for this purpose. PMID:15956403

  8. Molecular Dissection of the Basal Clades in the Human Y Chromosome Phylogenetic Tree

    PubMed Central

    Scozzari, Rosaria; Massaia, Andrea; D’Atanasio, Eugenia; Myres, Natalie M.; Perego, Ugo A.; Trombetta, Beniamino; Cruciani, Fulvio

    2012-01-01

    One hundred and forty-six previously detected mutations were more precisely positioned in the human Y chromosome phylogeny by the analysis of 51 representative Y chromosome haplogroups and the use of 59 mutations from literature. Twenty-two new mutations were also described and incorporated in the revised phylogeny. This analysis made it possible to identify new haplogroups and to resolve a deep trifurcation within haplogroup B2. Our data provide a highly resolved branching in the African-specific portion of the Y tree and support the hypothesis of an origin in the north-western quadrant of the African continent for the human MSY diversity. PMID:23145109

  9. Molecular dissection of the basal clades in the human Y chromosome phylogenetic tree.

    PubMed

    Scozzari, Rosaria; Massaia, Andrea; D'Atanasio, Eugenia; Myres, Natalie M; Perego, Ugo A; Trombetta, Beniamino; Cruciani, Fulvio

    2012-01-01

    One hundred and forty-six previously detected mutations were more precisely positioned in the human Y chromosome phylogeny by the analysis of 51 representative Y chromosome haplogroups and the use of 59 mutations from literature. Twenty-two new mutations were also described and incorporated in the revised phylogeny. This analysis made it possible to identify new haplogroups and to resolve a deep trifurcation within haplogroup B2. Our data provide a highly resolved branching in the African-specific portion of the Y tree and support the hypothesis of an origin in the north-western quadrant of the African continent for the human MSY diversity.

  10. Phylogenetic assemblage structure of North American trees is more strongly shaped by glacial-interglacial climate variability in gymnosperms than in angiosperms.

    PubMed

    Ma, Ziyu; Sandel, Brody; Svenning, Jens-Christian

    2016-05-01

    How fast does biodiversity respond to climate change? The relationship of past and current climate with phylogenetic assemblage structure helps us to understand this question. Studies of angiosperm tree diversity in North America have already suggested effects of current water-energy balance and tropical niche conservatism. However, the role of glacial-interglacial climate variability remains to be determined, and little is known about any of these relationships for gymnosperms. Moreover, phylogenetic endemism, the concentration of unique lineages in restricted ranges, may also be related to glacial-interglacial climate variability and needs more attention. We used a refined phylogeny of both angiosperms and gymnosperms to map phylogenetic diversity, clustering and endemism of North American trees in 100-km grid cells, and climate change velocity since Last Glacial Maximum together with postglacial accessibility to recolonization to quantify glacial-interglacial climate variability. We found: (1) Current climate is the dominant factor explaining the overall patterns, with more clustered angiosperm assemblages toward lower temperature, consistent with tropical niche conservatism. (2) Long-term climate stability is associated with higher angiosperm endemism, while higher postglacial accessibility is linked to more phylogenetic clustering and endemism in gymnosperms. (3) Factors linked to glacial-interglacial climate change have stronger effects on gymnosperms than on angiosperms. These results suggest that paleoclimate legacies supplement current climate in shaping phylogenetic patterns in North American trees, and especially so for gymnosperms.

  11. Sampling strategies for improving tree accuracy and phylogenetic analyses: a case study in ciliate protists, with notes on the genus Paramecium.

    PubMed

    Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo

    2014-02-01

    In order to assess how dataset-selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported.

  12. Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus).

    PubMed

    Moody, Michael L; Rieseberg, Loren H

    2012-07-01

    The annual sunflowers (Helianthus sect. Helianthus) present a formidable challenge for phylogenetic inference because of ancient hybrid speciation, recent introgression, and suspected issues with deep coalescence. Here we analyze sequence data from 11 nuclear DNA (nDNA) genes for multiple genotypes of species within the section to (1) reconstruct the phylogeny of this group, (2) explore the utility of nDNA gene trees for detecting hybrid speciation and introgression; and (3) test an empirical method of hybrid identification based on the phylogenetic congruence of nDNA gene trees from tightly linked genes. We uncovered considerable topological heterogeneity among gene trees with or without three previously identified hybrid species included in the analyses, as well as a general lack of reciprocal monophyly of species. Nonetheless, partitioned Bayesian analyses provided strong support for the reciprocal monophyly of all species except H. annuus (0.89 PP), the most widespread and abundant annual sunflower. Previous hypotheses of relationships among taxa were generally strongly supported (1.0 PP), except among taxa typically associated with H. annuus, apparently due to the paraphyly of the latter in all gene trees. While the individual nDNA gene trees provided a useful means for detecting recent hybridization, identification of ancient hybridization was problematic for all ancient hybrid species, even when linkage was considered. We discuss biological factors that affect the efficacy of phylogenetic methods for hybrid identification.

  13. A cholinesterase genes server (ESTHER): a database of cholinesterase-related sequences for multiple alignments, phylogenetic relationships, mutations and structural data retrieval.

    PubMed Central

    Cousin, X; Hotelier, T; Liévin, P; Toutant, J P; Chatonnet, A

    1996-01-01

    We have built a database of sequences phylogenetically related to cholinesterases (ESTHER, for esterases, alpha/beta hydrolase enzymes and relatives). These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) with some related proteins devoid of enzymatic activity. The purpose of ESTHER is to help comparison and alignment of any new sequence appearing in the field, to favour mutation analysis of structure-function relationships and to allow structural data recovery. ESTHER is a World Wide Web server with the URL http://www.montpellier.inra.fr:70/cholinesterase. PMID:8594562

  14. A cholinesterase genes server (ESTHER): a database of cholinesterase-related sequences for multiple alignments, phylogenetic relationships, mutations and structural data retrieval.

    PubMed

    Cousin, X; Hotelier, T; Liévin, P; Toutant, J P; Chatonnet, A

    1996-01-01

    We have built a database of sequences phylogenetically related to cholinesterases (ESTHER, for esterases, alpha/beta hydrolase enzymes and relatives). These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) with some related proteins devoid of enzymatic activity. The purpose of ESTHER is to help comparison and alignment of any new sequence appearing in the field, to favour mutation analysis of structure-function relationships and to allow structural data recovery. ESTHER is a World Wide Web server with the URL http://www.montpellier.inra.fr:70/cholinesterase.

  15. Ultrafast approximation for phylogenetic bootstrap.

    PubMed

    Minh, Bui Quang; Nguyen, Minh Anh Thi; von Haeseler, Arndt

    2013-05-01

    Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
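
    The resampling estimated log-likelihood (RELL) step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes per-site log-likelihoods are already available for a fixed set of candidate trees; the function name `rell_support` and the input layout are hypothetical, and the sketch reports how often each whole tree wins rather than per-branch support, so it is not the UFBoot implementation.

```python
import random

def rell_support(site_loglik, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each candidate tree scores best.

    site_loglik maps a tree name to its list of per-site log-likelihoods
    (all lists must have the same length). Hypothetical input layout.
    """
    rng = random.Random(seed)
    names = list(site_loglik)
    n_sites = len(site_loglik[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(n_replicates):
        # Resample site indices with replacement: one bootstrap pseudo-alignment.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        # Re-use the stored per-site values instead of re-optimising each tree.
        totals = {name: sum(site_loglik[name][i] for i in idx) for name in names}
        best = max(names, key=lambda name: totals[name])
        wins[best] += 1
    return {name: wins[name] / n_replicates for name in names}

# Toy input: tree "A" fits every site better than tree "B".
example = {"A": [-1.0, -1.2, -0.9], "B": [-1.5, -1.3, -1.1]}
support = rell_support(example, n_replicates=200)
```

    The point of the design is that each replicate only sums stored numbers, which is why RELL-style resampling is much cheaper than re-estimating a tree for every bootstrap sample.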

  16. Predicting MicroRNA Biomarkers for Cancer Using Phylogenetic Tree and Microarray Analysis.

    PubMed

    Wang, Hsiuying

    2016-01-01

    MicroRNAs (miRNAs) are shown to be involved in the initiation and progression of cancers in the literature, and the expression of miRNAs is used as an important cancer prognostic tool. The aim of this study is to predict high-confidence miRNA biomarkers for cancer. We adopt a method that combines miRNA phylogenetic structure and miRNA microarray data analysis to discover high-confidence miRNA biomarkers for colon, prostate, pancreatic, lung, breast, bladder and kidney cancers. There are 53 miRNAs selected through this method that either have potential to involve a single cancer's development or to involve several cancers' development. These miRNAs can be used as high-confidence miRNA biomarkers of these seven investigated cancers for further experiment validation. miR-17, miR-20, miR-106a, miR-106b, miR-92, miR-25, miR-16, miR-195 and miR-143 are selected to involve a single cancer's development in these seven cancers. They have the potential to be useful miRNA biomarkers when the result can be confirmed by experiments. PMID:27213352

  17. Predicting MicroRNA Biomarkers for Cancer Using Phylogenetic Tree and Microarray Analysis

    PubMed Central

    Wang, Hsiuying

    2016-01-01

    MicroRNAs (miRNAs) are shown to be involved in the initiation and progression of cancers in the literature, and the expression of miRNAs is used as an important cancer prognostic tool. The aim of this study is to predict high-confidence miRNA biomarkers for cancer. We adopt a method that combines miRNA phylogenetic structure and miRNA microarray data analysis to discover high-confidence miRNA biomarkers for colon, prostate, pancreatic, lung, breast, bladder and kidney cancers. There are 53 miRNAs selected through this method that either have potential to involve a single cancer’s development or to involve several cancers’ development. These miRNAs can be used as high-confidence miRNA biomarkers of these seven investigated cancers for further experiment validation. miR-17, miR-20, miR-106a, miR-106b, miR-92, miR-25, miR-16, miR-195 and miR-143 are selected to involve a single cancer’s development in these seven cancers. They have the potential to be useful miRNA biomarkers when the result can be confirmed by experiments. PMID:27213352

  18. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans.

    PubMed

    Rodríguez-Ezpeleta, Naiara; Brinkmann, Henner; Burger, Gertraud; Roger, Andrew J; Gray, Michael W; Philippe, Hervé; Lang, B Franz

    2007-08-21

    Resolving the global phylogeny of eukaryotes has proven to be challenging. Among the eukaryotic groups of uncertain phylogenetic position are jakobids, a group of bacterivorous flagellates that possess the most bacteria-like mitochondrial genomes known. Jakobids share several ultrastructural features with malawimonads and an assemblage of anaerobic protists (e.g., diplomonads and oxymonads). These lineages together with Euglenozoa and Heterolobosea have collectively been designated "excavates". However, published molecular phylogenies based on the sequences of nuclear rRNAs and up to six nucleus-encoded proteins do not provide convincing support for the monophyly of excavates, nor do they uncover their relationship to other major eukaryotic groups. Here, we report the first large-scale eukaryotic phylogeny, inferred from 143 nucleus-encoded proteins comprising 31,604 amino acid positions, that includes jakobids, malawimonads and cercozoans. We obtain compelling support for the monophyly of jakobids, Euglenozoa plus Heterolobosea (JEH group), and for the association of cercozoans with stramenopiles plus alveolates. Furthermore, we observe a sister-group relationship between the JEH group and malawimonads after removing fast-evolving species from the dataset. We discuss the implications of these results for the concept of "excavates" and for the elucidation of eukaryotic phylogeny in general.

  19. Optimal Network Alignment with Graphlet Degree Vectors

    PubMed Central

    Milenković, Tijana; Ng, Weng Leong; Hayes, Wayne; Pržulj, Nataša

    2010-01-01

    Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology. PMID:20628593

  20. Probabilistic phylogenetic inference with insertions and deletions.

    PubMed

    Rivas, Elena; Eddy, Sean R

    2008-01-01

    A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm. PMID:18787703
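
    The Felsenstein peeling (pruning) algorithm named in the abstract above computes the likelihood of one alignment column on a tree by passing conditional likelihoods from the tips to the root. The sketch below is a minimal illustration under the plain Jukes-Cantor substitution model with four states; it does not include the gap-augmented rate matrix or the birth-death indel model described in the abstract, and the nested-tuple tree encoding is an assumption made for the example.

```python
import math

STATES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of state i changing to state j along branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node):
    """Conditional likelihoods P(observed tips below node | state at node).

    A tip is a single-character string; an internal node is the tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    l_left, l_right = partial(left), partial(right)
    out = []
    for s in STATES:
        # Sum over the possible child states on each side, then multiply.
        a = sum(jc69_prob(s, x, t_left) * l_left[k] for k, x in enumerate(STATES))
        b = sum(jc69_prob(s, x, t_right) * l_right[k] for k, x in enumerate(STATES))
        out.append(a * b)
    return out

def site_likelihood(tree):
    """Likelihood of one column, with uniform base frequencies at the root."""
    return sum(0.25 * p for p in partial(tree))

# One column on the tree ((A, C), G) with illustrative branch lengths.
tree = (("A", 0.1, "C", 0.1), 0.05, "G", 0.2)
likelihood = site_likelihood(tree)
```

    Each internal node is visited once and does a fixed amount of work per state pair, which is the source of the computational efficiency the abstract refers to.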

  1. Amino acid sequence of myoglobin from the chiton Liolophura japonica and a phylogenetic tree for molluscan globins.

    PubMed

    Suzuki, T; Furukohri, T; Okamoto, S

    1993-02-01

    Myoglobin was isolated from the radular muscle of the chiton Liolophura japonica, a primitive archigastropodic mollusc. Liolophura contains three monomeric myoglobins (I, II, and III), and the complete amino acid sequence of myoglobin I has been determined. It is composed of 145 amino acid residues, and the molecular mass was calculated to be 16,070 D. The E7 distal histidine, which is replaced by valine or glutamine in several molluscan globins, is conserved in Liolophura myoglobin. The autoxidation rate at physiological conditions indicated that Liolophura oxymyoglobin is fairly stable when compared with other molluscan myoglobins. The amino acid sequence of Liolophura myoglobin shows low homology (11-21%) with molluscan dimeric myoglobins and hemoglobins, but shows higher homology (26-29%) with monomeric myoglobins from the gastropodic molluscs Aplysia, Dolabella, and Bursatella. A phylogenetic tree was constructed from 19 molluscan globin sequences. The tree separated them into two distinct clusters, a cluster for muscle myoglobins and a cluster for erythrocyte or gill hemoglobins. The myoglobin cluster is divided further into two subclusters, corresponding to monomeric and dimeric myoglobins, respectively. Liolophura myoglobin was placed on the branch of monomeric myoglobin lineage, showing that it diverged earlier from other monomeric myoglobins. The hemoglobin cluster is also divided into two subclusters. One cluster contains homodimeric, heterodimeric, tetrameric, and didomain chains of erythrocyte hemoglobins of the blood clams Anadara, Scapharca, and Barbatia. Of special interest is the other subcluster. It consists of three hemoglobin chains derived from the bacterial symbiont-harboring clams Calyptogena and Lucina, in which hemoglobins are supposed to play an important role in maintaining the symbiosis with sulfide bacteria.

  2. Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools

    PubMed Central

    Huang, Weichun; Nevins, Joseph R; Ohler, Uwe

    2007-01-01

    Background: The phenomenon of functional site turnover has important implications for the study of regulatory region evolution, such as for promoter sequence alignments and transcription factor binding site (TFBS) identification. At present, it remains difficult to estimate TFBS turnover rates on real genomic sequences, as reliable mappings of functional sites across related species are often not available. As an alternative, we introduce a flexible new simulation system, Phylogenetic Simulation of Promoter Evolution (PSPE), designed to study functional site turnovers in regulatory sequences. Results: Using PSPE, we study replacement turnover rates of different individual TFBSs and simple modules of two sites under neutral evolutionary functional constraints. We find that TFBS replacement turnover can happen rapidly in promoters, and turnover rates vary significantly among different TFBSs and modules. We assess the influence of different constraints such as insertion/deletion rate and translocation distances. Complementing the simulations, we give simple but effective mathematical models for TFBS turnover rate prediction. As one important application of PSPE, we also present a first systematic evaluation of multiple sequence aligners regarding their capability of detecting TFBSs in promoters with site turnovers. Conclusion: PSPE allows researchers for the first time to investigate TFBS replacement turnovers in promoters systematically. The assessment of alignment tools points out the limitations of current approaches to identify TFBSs in non-coding sequences, where turnover events of functional sites may happen frequently, and where we are interested in assessing the similarity on the functional level. PSPE is freely available at the authors' website. PMID:17956628

  3. GeneTrees: a phylogenomics resource for prokaryotes.

    PubMed

    Tian, Yuying; Dickerman, Allan W

    2007-01-01

    The GeneTrees phylogenomics system pursues comparative genomic analyses from the perspective of gene phylogenies for individual genes. The GeneTrees project has the goal of providing detailed evolutionary models for all protein-coding gene components of the fully sequenced genomes. Currently, a database of alignments and trees for all protein sequences for 325 fully sequenced and annotated prokaryote genomes is available. The prokaryote database contains 890,000 protein sequences organized into over 100,000 alignments, each described by a phylogenetic tree. An original homology group discovery tool assembles sets of related proteins from all versus all pairwise alignments. Multiple alignments for each homology group are stored and subjected to phylogenetic tree inference. A graphical web interface provides visual exploration of the GeneTrees database. Homology groups can be queried by sequence identifiers or annotation terms. Genomes can be browsed visually on a gene map of each chromosome or plasmid. Phylogenetic trees with support values are displayed in conjunction with the associated sequence alignment. A variety of classes of information can be selected to label the tree tips to aid in visual evaluation of annotation and gene function. This web interface is available at http://genetrees.vbi.vt.edu.

  4. Phylogenetic analysis of otospiralin protein

    PubMed Central

    Torktaz, Ibrahim; Behjati, Mohaddeseh; Rostami, Amin

    2016-01-01

    Background: Fibrocyte-specific protein, otospiralin, is a small protein, widely expressed in the central nervous system in neuronal cell bodies and glia. The increased expression of otospiralin in reactive astrocytes implicates its role in signaling pathways and reparative mechanisms subsequent to injury. Indeed, otospiralin is considered to be essential for the survival of fibrocytes of the mesenchymal nonsensory regions of the cochlea. It seems that other functions of this protein are not yet completely understood. Materials and Methods: Amino acid sequences of otospiralin from 12 vertebrates were derived from the National Center for Biotechnology Information database. Phylogenetic analysis and phylogeny estimation were performed using the MEGA 5.0.5 program, and a neighbor-joining tree was constructed with this software. Results: In this computational study, the phylogenetic tree of otospiralin has been investigated. Therefore, dendrograms of otospiralin were depicted. Alignment was performed with the MUSCLE method using the UPGMB algorithm. An entropy plot was also computed to better illustrate amino acid variation in this protein. Conclusion: In the present study, we used otospiralin sequences from 12 different species and, by constructing a phylogenetic tree, suggested an outgroup for some related species. PMID:27099854
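
    An entropy plot of the kind mentioned in the abstract above is usually built from the Shannon entropy of each alignment column. The abstract does not say how the plot was computed, so the sketch below is only one common definition: entropy in bits, with gap characters ignored. The function name and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def column_entropy(alignment):
    """Shannon entropy (bits) of each column in a list of equal-length aligned sequences."""
    n_cols = len(alignment[0])
    entropies = []
    for c in range(n_cols):
        # Count residues in this column, skipping gap characters.
        counts = Counter(seq[c] for seq in alignment if seq[c] != "-")
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies

# Toy alignment: the first column is conserved, the last is fully variable.
toy = ["MKLV", "MKIA", "MRLG", "MK-S"]
profile = column_entropy(toy)
```

    Conserved columns score 0 and the score rises with the number and evenness of residues observed, so plotting `profile` against column index highlights variable positions.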

  5. POWER: PhylOgenetic WEb Repeater--an integrated and user-optimized framework for biomolecular phylogenetic analysis.

    PubMed

    Lin, Chung-Yen; Lin, Fan-Kai; Lin, Chieh Hua; Lai, Li-Wei; Hsu, Hsiu-Jun; Chen, Shu-Hwa; Hsiung, Chao A

    2005-07-01

    POWER, the PhylOgenetic WEb Repeater, is a web-based service designed to perform user-friendly pipeline phylogenetic analysis. POWER uses an open-source LAMP structure and infers genetic distances and phylogenetic relationships using well-established algorithms (ClustalW and PHYLIP). POWER incorporates a novel tree builder based on the GD library to generate a high-quality tree topology according to the calculated result. POWER accepts either raw sequences in FASTA format or user-uploaded alignment output files. Through a user-friendly web interface, users can sketch a tree effortlessly in multiple steps. After a tree has been generated, users can freely set and modify parameters, select tree building algorithms, refine sequence alignments or edit the tree topology. All the information related to input sequences and the processing history is logged and downloadable for the user's reference. Furthermore, iterative tree construction can be performed by adding sequences to, or removing them from, a previously submitted job. POWER is accessible at http://power.nhri.org.tw.

  6. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  7. Were the first springtails semi-aquatic? A phylogenetic approach by means of 28S rDNA and optimization alignment.

    PubMed Central

    D'Haese, Cyrille A

    2002-01-01

    Emergence from an aquatic environment to the land is one of the major evolutionary transitions within the arthropods. It is often considered that the first hexapods, and in particular the first springtails, were semi-aquatic and this assumption drives evolutionary models towards particular conclusions. To address the question of the ecological origin of the springtails, phylogenetic analyses by optimization alignment were performed on D1 and D2 regions of the 28S rDNA for 55 collembolan exemplars and eight outgroups. Relationships among the orders Symphypleona, Entomobryomorpha and Poduromorpha are inferred. More specifically, a robust hypothesis is provided for the subfamilial relationships within the order Poduromorpha. Contrary to previous statements, the semi-aquatic species Podura aquatica is not basal or 'primitive', but well nested in the Poduromorpha. The analyses performed for the 24 different weighting schemes yielded the same conclusion: semi-aquatic ecology is not ancestral for the springtails. It is a derived condition that evolved independently several times. The adaptation for semi-aquatic life is better interpreted as a step towards independence from land, rather than indication of an aquatic origin. PMID:12061958

  8. On comparing two structured RNA multiple alignments.

    PubMed

    Patel, Vandanaben; Wang, Jason T L; Setia, Shefali; Verma, Anurag; Warden, Charles D; Zhang, Kaizhong

    2010-12-01

    We present a method, called BlockMatch, for aligning two blocks, where a block is an RNA multiple sequence alignment with the consensus secondary structure of the alignment in Stockholm format. The method employs a quadratic-time dynamic programming algorithm for aligning columns and column pairs of the multiple alignments in the blocks. Unlike many other tools that can perform pairwise alignment of either single sequences or structures only, BlockMatch takes into account the characteristics of all the sequences in the blocks along with their consensus structures during the alignment process, thus being able to achieve a high-quality alignment result. We apply BlockMatch to phylogeny reconstruction on a set of 5S rRNA sequences taken from fifteen bacteria species. Experimental results showed that the phylogenetic tree generated by our method is more accurate than the tree constructed based on the widely used ClustalW tool. The BlockMatch algorithm is implemented into a web server, accessible at http://bioinformatics.njit.edu/blockmatch. A jar file of the program is also available for download from the web server. PMID:21121021
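
    A quadratic-time dynamic program over alignment columns, as described in the abstract above, can be sketched with a Needleman-Wunsch-style recurrence. This is a generic illustration, not the BlockMatch algorithm: it aligns single columns only and ignores the column pairs of the consensus secondary structure, and the scoring function `shares_residue` and the gap penalty are placeholder assumptions.

```python
def align_columns(cols_a, cols_b, score, gap=-1.0):
    """Globally align two lists of alignment columns; returns (best score, matched index pairs)."""
    n, m = len(cols_a), len(cols_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    # Fill the table: O(n * m) cells, constant work per cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(cols_a[i - 1], cols_b[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover the matched columns.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(cols_a[i - 1], cols_b[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

def shares_residue(col_a, col_b):
    """Placeholder column score: +1 if the two columns share any residue, else -1."""
    return 1.0 if set(col_a) & set(col_b) else -1.0

# Each string is one column of a block (residues read down the column).
best, matched = align_columns(["AA", "CC", "GG"], ["AA", "GG"], shares_residue)
```

    A realistic column score would compare residue frequency profiles rather than test for a shared residue; the recurrence itself would stay the same.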

  9. On comparing two structured RNA multiple alignments.

    PubMed

    Patel, Vandanaben; Wang, Jason T L; Setia, Shefali; Verma, Anurag; Warden, Charles D; Zhang, Kaizhong

    2010-12-01

    We present a method, called BlockMatch, for aligning two blocks, where a block is an RNA multiple sequence alignment with the consensus secondary structure of the alignment in Stockholm format. The method employs a quadratic-time dynamic programming algorithm for aligning columns and column pairs of the multiple alignments in the blocks. Unlike many other tools that can perform pairwise alignment of either single sequences or structures only, BlockMatch takes into account the characteristics of all the sequences in the blocks along with their consensus structures during the alignment process, thus being able to achieve a high-quality alignment result. We apply BlockMatch to phylogeny reconstruction on a set of 5S rRNA sequences taken from fifteen bacteria species. Experimental results showed that the phylogenetic tree generated by our method is more accurate than the tree constructed based on the widely used ClustalW tool. The BlockMatch algorithm is implemented into a web server, accessible at http://bioinformatics.njit.edu/blockmatch. A jar file of the program is also available for download from the web server.

  10. TreeParser-Aided Klee Diagrams Display Taxonomic Clusters in DNA Barcode and Nuclear Gene Datasets

    PubMed Central

    Stoeckle, Mark Y.; Coffran, Cameron

    2013-01-01

    Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has examined only mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To further explore, we developed TreeParser, a freely-available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering which appears widespread but not well understood. PMID:24022383
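
    Sorting an alignment according to a phylogenetic tree, the operation the abstract above attributes to TreeParser, amounts to reordering the sequences by the left-to-right order of the tree tips. The sketch below assumes a simple Newick string with unquoted labels and a dictionary of named sequences; it is an illustration of the idea, not TreeParser's code, and both function names are hypothetical.

```python
import re

def newick_tip_order(newick):
    """Tip labels in left-to-right order from a simple Newick string (unquoted labels only)."""
    # A tip label directly follows '(' or ','. Labels after ')' are internal
    # node names or support values, and text after ':' is a branch length.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)

def sort_alignment_by_tree(alignment, newick):
    """Return (name, sequence) pairs ordered as the tips appear in the tree."""
    order = newick_tip_order(newick)
    return [(name, alignment[name]) for name in order if name in alignment]

# Toy data: four sequences stored in arbitrary order.
tree = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.4):0.2);"
aln = {"D": "ACGT", "A": "ACGA", "C": "ACGC", "B": "ACGG"}
ordered = sort_alignment_by_tree(aln, tree)
```

    After this reordering, closely related sequences sit in adjacent rows, which is what lets blocks corresponding to taxonomic groups show up in a heat map built from the alignment.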

  11. TreeParser-aided Klee diagrams display taxonomic clusters in DNA barcode and nuclear gene datasets.

    PubMed

    Stoeckle, Mark Y; Coffran, Cameron

    2013-01-01

    Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has examined only mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To further explore, we developed TreeParser, a freely-available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering which appears widespread but not well understood.

  12. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  13. Open reading frame phylogenetic analysis on the cloud.

    PubMed

    Hung, Che-Lun; Lin, Chun-Yuan

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  14. Explaining forest productivity using tree functional traits and phylogenetic information: two sides of the same coin over evolutionary scale?

    PubMed Central

    Paquette, Alain; Joly, Simon; Messier, Christian

    2015-01-01

    Given evidence that diverse ecosystems provide more services than depauperate ones, much attention has now turned toward finding meaningful and operational diversity indices. We ask two questions: (1) Does phylogenetic diversity contain additional information not explained by functional traits? And (2) What are the strength and nature of the correlation between phylogeny and functional traits according to the evolutionary scale considered? We used data from permanent forest plots of northeastern Canada for which these links have been demonstrated and important functional traits identified. We show that the nature of the relationship between traits and phylogeny varies dramatically among traits, but also according to the evolutionary distance considered. The demonstration that different characters show phylogenetic autocorrelation at different evolutionary depths suggests that phylogenetic content of traits may be too crude to determine whether phylogenies contain relevant information. However, our study provides support for the use of phylogenies to assess ecosystem functioning when key functional traits are unavailable. We also highlight a potentially important contribution of phylogenetics for conservation and the study of the impact of biodiversity loss on ecosystem functioning and the provision of services, given the accumulating evidence that mechanisms promoting diversity effects shift over time to involve different traits. PMID:26140194

  15. Explaining forest productivity using tree functional traits and phylogenetic information: two sides of the same coin over evolutionary scale?

    PubMed

    Paquette, Alain; Joly, Simon; Messier, Christian

    2015-05-01

    Given evidence that diverse ecosystems provide more services than depauperate ones, much attention has now turned toward finding meaningful and operational diversity indices. We ask two questions: (1) Does phylogenetic diversity contain additional information not explained by functional traits? And (2) What are the strength and nature of the correlation between phylogeny and functional traits according to the evolutionary scale considered? We used data from permanent forest plots of northeastern Canada for which these links have been demonstrated and important functional traits identified. We show that the nature of the relationship between traits and phylogeny varies dramatically among traits, but also according to the evolutionary distance considered. The demonstration that different characters show phylogenetic autocorrelation at different evolutionary depths suggests that phylogenetic content of traits may be too crude to determine whether phylogenies contain relevant information. However, our study provides support for the use of phylogenies to assess ecosystem functioning when key functional traits are unavailable. We also highlight a potentially important contribution of phylogenetics for conservation and the study of the impact of biodiversity loss on ecosystem functioning and the provision of services, given the accumulating evidence that mechanisms promoting diversity effects shift over time to involve different traits.

  16. Computer System for Analysis of Molecular Evolution Modes (SAMEM): analysis of molecular evolution modes at deep inner branches of the phylogenetic tree.

    PubMed

    Gunbin, Konstantin V; Suslov, Valentin V; Genaev, Mikhail A; Afonnikov, Dmitry A

    SAMEM (System for Analysis of Molecular Evolution Modes), a web-based pipeline system for inferring modes of molecular evolution in genes and proteins (http://pixie.bionet.nsc.ru/samem/), is presented. Pipeline 1 performs analyses of protein-coding gene evolution; pipeline 2 performs analyses of protein evolution; pipeline 3 prepares datasets of genes and/or proteins, performs their primary analysis, and builds BLOSUM matrices; pipeline 4 checks if these genes really are protein-coding. Pipeline 1 has an all-new feature, which allows the user to obtain K(R)/K(C) estimates using several different methods. An important feature of pipeline 2 is an original method for analyzing the rates of amino acid substitutions at the branches of a phylogenetic tree. The method is based on Markov modeling and a non-parametric permutation test, which compares expected and observed frequencies of amino acid substitutions, and infers the modes of molecular evolution at deep inner branches.

  17. Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life

    PubMed Central

    Buchheim, Mark A.; Keller, Alexander; Koetschan, Christian; Förster, Frank; Merget, Benjamin; Wolf, Matthias

    2011-01-01

    Background: Chloroplast-encoded genes (matK and rbcL) have been formally proposed for use in DNA barcoding efforts targeting embryophytes. Extending such a protocol to chlorophytan green algae, though, is fraught with problems including non-homology (matK) and heterogeneity that prevents the creation of a universal PCR toolkit (rbcL). Some have advocated the use of the nuclear-encoded, internal transcribed spacer two (ITS2) as an alternative to the traditional chloroplast markers. However, the ITS2 is broadly perceived to be insufficiently conserved or to be confounded by introgression or biparental inheritance patterns, precluding its broad use in phylogenetic reconstruction or as a DNA barcode. A growing body of evidence has shown that simultaneous analysis of nucleotide data with secondary structure information can overcome at least some of the limitations of ITS2. The goal of this investigation was to assess the feasibility of an automated, sequence-structure approach for analysis of ITS2 data from a large sampling of phylum Chlorophyta. Methodology/Principal Findings: Sequences and secondary structures from 591 chlorophycean, 741 trebouxiophycean and 938 ulvophycean algae, all obtained from the ITS2 Database, were aligned using a sequence structure-specific scoring matrix. Phylogenetic relationships were reconstructed by Profile Neighbor-Joining coupled with a sequence structure-specific, general time reversible substitution model. Results from analyses of the ITS2 data were robust at multiple nodes and showed considerable congruence with results from published phylogenetic analyses. Conclusions/Significance: Our observations on the power of automated, sequence-structure analyses of ITS2 to reconstruct phylum-level phylogenies of the green algae validate this approach to assessing diversity for large sets of chlorophytan taxa. Moreover, our results indicate that objections to the use of ITS2 for DNA barcoding should be weighed against the utility of an automated

  18. Evolutionary history of the Afro-Madagascan Ixora species (Rubiaceae): species diversification and distribution of key morphological traits inferred from dated molecular phylogenetic trees

    PubMed Central

    Tosh, J.; Dessein, S.; Buerki, S.; Groeninckx, I.; Mouly, A.; Bremer, B.; Smets, E. F.; De Block, P.

    2013-01-01

    Background and Aims Previous work on the pantropical genus Ixora has revealed an Afro-Madagascan clade, but as yet no study has focused in detail on the evolutionary history and morphological trends in this group. Here the evolutionary history of Afro-Madagascan Ixora spp. (a clade of approx. 80 taxa) is investigated and the phylogenetic trees compared with several key morphological traits in taxa occurring in Madagascar. Methods Phylogenetic relationships of Afro-Madagascan Ixora are assessed using sequence data from four plastid regions (petD, rps16, rpoB-trnC and trnL-trnF) and nuclear ribosomal external transcribed spacer (ETS) and internal transcribed spacer (ITS) regions. The phylogenetic distribution of key morphological characters is assessed. Bayesian inference (implemented in BEAST) is used to estimate the temporal origin of Ixora based on fossil evidence. Key Results Two separate lineages of Madagascan taxa are recovered, one of which is nested in a group of East African taxa. Divergence in Ixora is estimated to have commenced during the mid Miocene, with extensive cladogenesis occurring in the Afro-Madagascan clade during the Pliocene onwards. Conclusions Both lineages of Madagascan Ixora exhibit morphological innovations that are rare throughout the rest of the genus, including a trend towards pauciflorous inflorescences and a trend towards extreme corolla tube length, suggesting that the same ecological and selective pressures are acting upon taxa from both Madagascan lineages. Novel ecological opportunities resulting from climate-induced habitat fragmentation and corolla tube length diversification are likely to have facilitated species radiation on Madagascar. PMID:24142919

  19. Relating belowground microbial composition to the taxonomic, phylogenetic, and functional trait distributions of trees in a tropical forest.

    PubMed

    Barberán, Albert; McGuire, Krista L; Wolf, Jeffrey A; Jones, F Andrew; Wright, Stuart Joseph; Turner, Benjamin L; Essene, Adam; Hubbell, Stephen P; Faircloth, Brant C; Fierer, Noah

    2015-12-01

    The complexities of the relationships between plant and soil microbial communities remain unresolved. We determined the associations between plant aboveground and belowground (root) distributions and the communities of soil fungi and bacteria found across a diverse tropical forest plot. Soil microbial community composition was correlated with the taxonomic and phylogenetic structure of the aboveground plant assemblages even after controlling for differences in soil characteristics, but these relationships were stronger for fungi than for bacteria. In contrast to expectations, the species composition of roots in our soil core samples was a poor predictor of microbial community composition perhaps due to the patchy, ephemeral, and highly overlapping nature of fine root distributions. Our ability to predict soil microbial composition was not improved by incorporating information on plant functional traits suggesting that the most commonly measured plant traits are not particularly useful for predicting the plot-level variability in belowground microbial communities. PMID:26472095

  1. Three phylogenetic groups of nodA and nifH genes in Sinorhizobium and Mesorhizobium isolates from leguminous trees growing in Africa and Latin America.

    PubMed

    Haukka, K; Lindström, K; Young, J P

    1998-02-01

    The diversity and phylogeny of nodA and nifH genes were studied by using 52 rhizobial isolates from Acacia senegal, Prosopis chilensis, and related leguminous trees growing in Africa and Latin America. All of the strains had similar host ranges and belonged to the genera Sinorhizobium and Mesorhizobium, as previously determined by 16S rRNA gene sequence analysis. The restriction patterns and a sequence analysis of the nodA and nifH genes divided the strains into the following three distinct groups: sinorhizobia from Africa, sinorhizobia from Latin America, and mesorhizobia from both regions. In a phylogenetic tree also containing previously published sequences, the nodA genes of our rhizobia formed a branch of their own, but within the branch no correlation between symbiotic genes and host trees was apparent. Within the large group of African sinorhizobia, similar symbiotic gene types were found in different chromosomal backgrounds, suggesting that transfer of symbiotic genes has occurred across species boundaries. Most strains had plasmids, and the presence of plasmid-borne nifH was demonstrated by hybridization for some examples. The nodA and nifH genes of Sinorhizobium teranga ORS1009T grouped with the nodA and nifH genes of the other African sinorhizobia, but Sinorhizobium saheli ORS609T had a totally different nodA sequence, although it was closely related based on the 16S rRNA gene and nifH data. This might be because this S. saheli strain was originally isolated from Sesbania sp., which belongs to a different cross-nodulation group than Acacia and Prosopis spp. The factors that appear to have influenced the evolution of rhizobial symbiotic genes vary in importance at different taxonomic levels.

  2. Three Phylogenetic Groups of nodA and nifH Genes in Sinorhizobium and Mesorhizobium Isolates from Leguminous Trees Growing in Africa and Latin America

    PubMed Central

    Haukka, Kaisa; Lindström, Kristina; Young, J. Peter W.

    1998-01-01

    The diversity and phylogeny of nodA and nifH genes were studied by using 52 rhizobial isolates from Acacia senegal, Prosopis chilensis, and related leguminous trees growing in Africa and Latin America. All of the strains had similar host ranges and belonged to the genera Sinorhizobium and Mesorhizobium, as previously determined by 16S rRNA gene sequence analysis. The restriction patterns and a sequence analysis of the nodA and nifH genes divided the strains into the following three distinct groups: sinorhizobia from Africa, sinorhizobia from Latin America, and mesorhizobia from both regions. In a phylogenetic tree also containing previously published sequences, the nodA genes of our rhizobia formed a branch of their own, but within the branch no correlation between symbiotic genes and host trees was apparent. Within the large group of African sinorhizobia, similar symbiotic gene types were found in different chromosomal backgrounds, suggesting that transfer of symbiotic genes has occurred across species boundaries. Most strains had plasmids, and the presence of plasmid-borne nifH was demonstrated by hybridization for some examples. The nodA and nifH genes of Sinorhizobium teranga ORS1009T grouped with the nodA and nifH genes of the other African sinorhizobia, but Sinorhizobium saheli ORS609T had a totally different nodA sequence, although it was closely related based on the 16S rRNA gene and nifH data. This might be because this S. saheli strain was originally isolated from Sesbania sp., which belongs to a different cross-nodulation group than Acacia and Prosopis spp. The factors that appear to have influenced the evolution of rhizobial symbiotic genes vary in importance at different taxonomic levels. PMID:9464375

  3. The Tree versus the Forest: The Fungal Tree of Life and the Topological Diversity within the Yeast Phylome

    PubMed Central

    Marcet-Houben, Marina; Gabaldón, Toni

    2009-01-01

    A recurrent topic in phylogenomics is the combination of various sequence alignments to reconstruct a tree that describes the evolutionary relationships within a group of species. However, such an approach has been criticized for not being able to properly represent the topological diversity found among gene trees. To evaluate the representativeness of species trees based on concatenated alignments, we reconstruct several fungal species trees and compare them with the complete collection of phylogenies of genes encoded in the Saccharomyces cerevisiae genome. We found that, despite high levels of among-gene topological variation, the species trees do represent widely supported phylogenetic relationships. Most topological discrepancies between gene and species trees are concentrated in certain conflicting nodes. We propose to map such information on the species tree so that it accounts for the levels of congruence across the genome. We identified the insufficient accuracy of current alignment and phylogenetic methods as an important source of the topological diversity encountered among gene trees. Finally, we discuss the implications of the high levels of topological variation for phylogeny-based orthology prediction strategies. PMID:19190756

  4. ImOSM: intermittent evolution and robustness of phylogenetic methods.

    PubMed

    Thi Nguyen, Minh Anh; Gesell, Tanja; von Haeseler, Arndt

    2012-02-01

    Among the criteria to evaluate the performance of a phylogenetic method, robustness to model violation is of particular practical importance as complete a priori knowledge of evolutionary processes is typically unavailable. For studies of robustness in phylogenetic inference, a utility to add well-defined model violations to the simulated data would be helpful. We therefore introduce ImOSM, a tool to embed intermittent evolution as model violation into an alignment. Intermittent evolution refers to extra substitutions occurring randomly on branches of a tree, thus changing alignment site patterns. This means that the extra substitutions are placed on the tree after the typical process of sequence evolution is completed. We then study the robustness of widely used phylogenetic methods: maximum likelihood (ML), maximum parsimony (MP), and a distance-based method (BIONJ) to various scenarios of model violation. Violation of rates across sites (RaS) heterogeneity and simultaneous violation of RaS and the transition/transversion ratio on two nonadjacent external branches hinder all the methods' recovery of the true topology for a four-taxon tree. For an eight-taxon balanced tree, the violations cause each of the three methods to infer a different topology. Both ML and MP fail, whereas BIONJ, which calculates the distances based on the ML estimated parameters, reconstructs the true tree. Finally, we report that a test of model homogeneity and goodness of fit tests have enough power to detect such model violations. The outcome of the tests can help to actually gain confidence in the inferred trees. Therefore, we recommend using these tests in practical phylogenetic analyses.

  5. Phylogenetic relationships within the lizard clade Xantusiidae: using trees and divergence times to address evolutionary questions at multiple levels.

    PubMed

    Noonan, Brice P; Pramuk, Jennifer B; Bezy, Robert L; Sinclair, Elizabeth A; de Queiroz, Kevin; Sites, Jack W

    2013-10-01

    Xantusiidae (night lizards) is a clade of small-bodied, cryptic lizards endemic to the New World. The clade is characterized by several features that would benefit from interpretation in a phylogenetic context, including: (1) monophyletic status of extant taxa Cricosaura, Lepidophyma, and Xantusia; (2) a species endemic to Cuba (Cricosaura typica) of disputed age; (3) origins of the parthenogenetic species of Lepidophyma; (4) pronounced micro-habitat differences accompanied by distinct morphologies in both Xantusia and Lepidophyma; and (5) placement of Xantusia riversiana, the only vertebrate species endemic to the California Channel Islands, which is highly divergent from its mainland relatives. This study incorporates extensive new character data from multiple gene regions to investigate the phylogeny of Xantusiidae using the most comprehensive taxonomic sampling available to date. Parsimony and partitioned Bayesian analyses of more than 7 kb of mitochondrial and nuclear sequence data from 11 loci all confirm that Xantusiidae is monophyletic, and comprises three well-supported clades: Cricosaura, Xantusia, and Lepidophyma. The Cuban endemic Cricosaura typica is well supported as the sister to all other xantusiids. Estimates of divergence time indicate that Cricosaura diverged from the (Lepidophyma+Xantusia) clade ≈ 81 million years ago (Ma), a time frame consistent with the separation of the Antilles from North America. Our results also confirm and extend an earlier study suggesting that parthenogenesis has arisen at least twice within Lepidophyma without hybridization, that rock-crevice ecomorphs evolved numerous times (>9) within Xantusia and Lepidophyma, and that the large-bodied Channel Island endemic X. riversiana is a distinct, early lineage that may form the sister group to the small-bodied congeners of the mainland.

  6. The Eukaryotic Tree of Life from a Global Phylogenomic Perspective

    PubMed Central

    Burki, Fabien

    2014-01-01

    Molecular phylogenetics has revolutionized our knowledge of the eukaryotic tree of life. With the advent of genomics, a new discipline of phylogenetics has emerged: phylogenomics. This method uses large alignments of tens to hundreds of genes to reconstruct evolutionary histories. This approach has led to the resolution of ancient and contentious relationships, notably between the building blocks of the tree (the supergroups), and has allowed enigmatic protist lineages that are important for understanding eukaryote evolution to be placed in the tree. Here, I discuss the pros and cons of phylogenomics and review the eukaryotic supergroups in light of earlier work that laid the foundation for the current view of the tree, including the position of the root. I conclude by presenting a picture of eukaryote evolution, summarizing the most recent progress in assembling the global tree. PMID:24789819

  7. Improved description of the bipolar ciliate, Euplotes petzi, and definition of its basal position in the Euplotes phylogenetic tree.

    PubMed

    Di Giuseppe, Graziano; Erra, Fabrizio; Paolo Frontini, Francesco; Dini, Fernando; Vallesi, Adriana; Luporini, Pierangelo

    2014-08-01

    Data improving the characterization of the marine Euplotes species, E. petzi Wilbert and Song, 2008, were obtained from morphological, ecological and genetic analyses of Antarctic and Arctic wild-type strains. This species is identified by a minute (mean size, 46 μm × 32 μm) and ellipsoidal cell body which is dorsally decorated with an argyrome of the double-patella type, five dorsal kineties (of which the median one contains 8-10 dikinetids), five sharp-edged longitudinal ridges, and a right anterior spur. Ventrally, it bears 10 fronto-ventral, five transverse, two caudal and two marginal cirri, 30-35 adoral membranelles, and three inconspicuous ridges. Euplotes petzi grows well at 4 °C on green algae, does not produce cysts, undergoes mating under the genetic control of a multiple mating-type system, constitutively secretes water-borne pheromones, and behaves as a psychrophilic microorganism unable to survive at >15 °C. While the α-tubulin gene sequence determination did not provide useful information on the E. petzi molecular phylogeny, the small subunit rRNA (SSU rRNA) gene sequence determination provided solid evidence that E. petzi clusters with E. sinicus Jiang et al., 2010a, into a clade which represents the deepest branch at the base of the Euplotes phylogenetic tree. PMID:25051516

  9. AGP: a multimethods web server for alignment-free genome phylogeny.

    PubMed

    Cheng, Jinkui; Cao, Fuliang; Liu, Zhihua

    2013-05-01

    Alignment-based phylogenetic analysis faces major challenges when dealing with whole-genome sequences, for example, recombination, shuffling, and rearrangement of sequences. Thus, various alignment-free methods for phylogeny construction have been proposed. However, most of these methods have not been implemented as tools or web servers, so researchers cannot easily apply them to their own data sets. To facilitate the usage of various alignment-free methods, we implemented most of the popular alignment-free methods and constructed a user-friendly web server for alignment-free genome phylogeny (AGP). AGP integrates phylogenetic tree construction, visualization, and comparison functions. Both AGP and all source code of the methods are available at http://www.herbbol.org:8000/agp (last accessed February 26, 2013). AGP will facilitate research in the field of whole-genome phylogeny and comparison.
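
    To make the idea of an alignment-free comparison concrete, the following is a minimal Python sketch of one widely known approach: comparing k-mer frequency profiles. It is not taken from AGP; the function names, the default k, and the choice of Euclidean distance are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq, k=3):
    """Return the relative frequency of every k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs, k=3):
    """Pairwise distances keyed by sorted name pairs; no alignment is computed."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }
```

    The resulting pairwise distances could then be passed to any distance-based tree builder, such as neighbor joining.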

  10. Comparing alignment methods for inferring the history of the new world lizard genus Mabuya (Squamata: Scincidae).

    PubMed

    Whiting, Alison S; Sites, Jack W; Pellegrino, Katia C M; Rodrigues, Miguel T

    2006-03-01

    The rapid increase in the ability to generate molecular data, and the focus on model-based methods for tree reconstruction have greatly advanced the use of phylogenetics in many fields. The recent flurry of new analytical techniques has focused almost solely on tree reconstruction, whereas alignment issues have received far less attention. In this paper, we use a diverse sampling of gene regions from lizards of the genus Mabuya to compare the impact, on phylogeny estimation, of new maximum likelihood alignment algorithms with more widely used methods. Sequences aligned under different optimality criteria are analyzed using partitioned Bayesian analysis with independent models and parameter settings for each gene region, and the most strongly supported phylogenetic hypothesis is then used to test the hypothesis of two colonizations of the New World by African scincid lizards. Our results show that the consistent use of model-based methods in both alignment and tree reconstruction leads to trees with more optimal likelihood scores than the use of independent criteria in alignment and tree reconstruction. We corroborate and extend earlier evidence for two independent colonizations of South America by scincid lizards. Relationships within South American Mabuya are found to be in need of taxonomic revision, specifically complexes under the names M. heathi, M. agilis, and M. bistriata (sensu, M.T. Rodrigues, Papeis Avulsos de Zoologia 41 (2000) 313).

  11. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies.

    PubMed

    Leaché, Adam D; Banbury, Barbara L; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-11-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the

  12. Phylogenetic comparison of local length plasticity of the small subunit of nuclear rDNAs among all Hexapoda orders and the impact of hyper-length-variation on alignment.

    PubMed

    Xie, Qiang; Tian, Xiaoxuan; Qin, Yan; Bu, Wenjun

    2009-02-01

    The SSU nrDNA (18S) is one of the most frequently sequenced molecular markers in phylogenetic studies. However, hyper-variation in length at multiple positions of this gene can greatly affect alignment accuracy, and this length variation makes alignment across arthropod orders a serious problem. The analysis of Hexapoda phylogeny is such a case. A clearer recognition of the distribution of the length-variable regions is needed. In this study, the secondary structure of some length-variable regions in the SSU nrRNA of Arthropoda was adjusted by the principle of co-variation. We found that the plasticity of some length-variable regions can exceed 600 bases in hexapods, and that the number of hyper-length-variable regions is largest in Strepsiptera and Sternorrhyncha (Hemiptera). Our study shows that some length-variable regions can serve as synapomorphies for some groups. The phylogenetic comparison also suggested that the expansion of a lateral bulge could be the origin of a helix. PMID:19027081

  13. The impact of single substitutions on multiple sequence alignments.

    PubMed

    Klaere, Steffen; Gesell, Tanja; von Haeseler, Arndt

    2008-12-27

    We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
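
    The two-step process described above can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, and setting the Poisson mean equal to the total tree length is an assumption the abstract does not state.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def allocate_substitutions(branch_lengths, rng):
    """Step 1: take scaled branch lengths as given.
    Step 2: draw a Poisson number of substitutions (mean = total tree
    length, an assumption) and place each on a branch with probability
    proportional to that branch's relative length."""
    total = sum(branch_lengths.values())
    n_subs = sample_poisson(total, rng)
    names = list(branch_lengths)
    weights = [branch_lengths[b] / total for b in names]
    counts = dict.fromkeys(names, 0)
    for branch in rng.choices(names, weights=weights, k=n_subs):
        counts[branch] += 1
    return counts
```

    For example, `allocate_substitutions({"a": 0.1, "b": 0.3, "c": 0.6}, random.Random(0))` returns a dictionary of per-branch substitution counts; over many draws, branch "c" should receive about 60% of the substitutions.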

  14. Multiple sequence alignment with the Clustal series of programs.

    PubMed

    Chenna, Ramu; Sugawara, Hideaki; Koike, Tadashi; Lopez, Rodrigo; Gibson, Toby J; Higgins, Desmond G; Thompson, Julie D

    2003-07-01

    The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).

  15. Comments on the gonotyl of Proctocaecum macroclemidis (Tkach and Snyder, 2003) n. comb. (Digenea: Acanthostomidae: Acanthostominae), with a key to the genera of acanthostominae and new phylogenetic tree for Proctocaecum Baugh, 1957.

    PubMed

    Brooks, Daniel R

    2004-06-01

    The species recently described as Acanthostomum macroclemidis possesses the gonotyl in the form of a solid muscular pad uniquely diagnostic for species of Proctocaecum and is accordingly transferred to that genus. An artificial key to the 5 acanthostomine genera, as well as an updated phylogenetic hypothesis for the 10 known species of Proctocaecum, based on 11 characters and including 2 species described since the last phylogenetic analysis, are presented. The single most parsimonious phylogenetic tree with a consistency index of 87.5% suggests that Proctocaecum originated in Africa and spread to North America and South America before the breakup of Pangaea. As a result, the 2 North American and 1 South American species are most closely related to different African members of the genus. African and Indo-Pacific species inhabit crocodylids; hence, the occurrence of North American species in alligatorids and chelonians and a South American species in alligatorids are the result of host switches.

  16. Verification of phylogenetic inference programs using metamorphic testing.

    PubMed

    Sadi, Md Shaik; Kuo, Fei-Ching; Ho, Joshua W K; Charleston, Michael A; Chen, T Y

    2011-12-01

    Many phylogenetic inference programs are available to infer evolutionary relationships among taxa using aligned sequences of characters, typically DNA or amino acids. These programs are often used to infer the evolutionary history of species. However, in most cases it is impossible to systematically verify the correctness of the tree returned by these programs, as the correct evolutionary history is generally unknown and unknowable. In addition, it is nearly impossible to verify whether any non-trivial tree is correct in accordance with the specification of the often complicated search and scoring algorithms. This difficulty is known as the oracle problem of software testing: there is no oracle that we can use to verify the correctness of the returned tree. This makes it very challenging to test the correctness of any phylogenetic inference program. Here, we demonstrate how to apply a simple software testing technique, called Metamorphic Testing, to alleviate the oracle problem in testing phylogenetic inference programs. We have used both real and randomly generated test inputs to evaluate the effectiveness of metamorphic testing, and found that metamorphic testing can detect failures effectively in faulty phylogenetic inference programs with both types of test inputs.
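
    A minimal Python sketch of the metamorphic testing idea follows. Instead of checking an output against a known correct tree, it checks that outputs stay consistent when the input is transformed in ways that should not matter. The toy program under test (`closest_pair`) and the two relations used here (reordering taxa, permuting alignment columns) are illustrative assumptions, not the relations or programs evaluated in the paper.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Toy 'program under test': the pair of taxa with the smallest
    p-distance (ties broken by name so the answer is deterministic)."""
    return min(
        (p_distance(alignment[s], alignment[t]), tuple(sorted((s, t))))
        for s, t in combinations(alignment, 2)
    )[1]


def shuffle_taxa(alignment, rng):
    """Follow-up input: same sequences, different input order."""
    names = list(alignment)
    rng.shuffle(names)
    return {n: alignment[n] for n in names}


def shuffle_columns(alignment, rng):
    """Follow-up input: same columns, applied in a different order."""
    order = list(range(len(next(iter(alignment.values())))))
    rng.shuffle(order)
    return {n: "".join(s[i] for i in order) for n, s in alignment.items()}


def check_metamorphic_relations(program, alignment, trials=20, seed=0):
    """Return False as soon as a transformed input changes the output."""
    rng = random.Random(seed)
    expected = program(alignment)
    for _ in range(trials):
        if program(shuffle_taxa(alignment, rng)) != expected:
            return False
        if program(shuffle_columns(alignment, rng)) != expected:
            return False
    return True
```

    A program whose answer depends on input order, for example one that always reports the first two taxa it reads, would fail this check even though no reference tree is available.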

  17. Diversity of a ribonucleoprotein family in tobacco chloroplasts: two new chloroplast ribonucleoproteins and a phylogenetic tree of ten chloroplast RNA-binding domains.

    PubMed Central

    Ye, L H; Li, Y Q; Fukami-Kobayashi, K; Go, M; Konishi, T; Watanabe, A; Sugiura, M

    1991-01-01

    Two new ribonucleoproteins (RNPs) have been identified from a tobacco chloroplast lysate. These two proteins (cp29A and cp29B) are nuclear-encoded and have a lower affinity for single-stranded DNA than three other chloroplast RNPs (cp28, cp31 and cp33) previously isolated. DNA sequencing revealed that both contain two consensus sequence-type homologous RNA-binding domains (CS-RBDs) and a very acidic amino-terminal domain that is shorter than that of cp28, cp31 and cp33. Comparison of cp29A and cp29B showed a 19 amino acid insertion in the region separating the two CS-RBDs in cp29B. This insertion results in three tandem repeats of a glycine-rich sequence of 10 amino acids, which is a novel feature in RNPs. The two proteins are encoded by different single nuclear genes and no alternatively spliced transcripts could be identified. We constructed a phylogenetic tree for the ten chloroplast CS-RBDs. These results suggest that there is a sizable RNP family in chloroplasts and the diversity was mainly generated through a series of gene duplications rather than through alternative pre-mRNA splicing. The gene for cp29B contains three introns. The first and second introns interrupt the first CS-RBD and the third intron interrupts the second CS-RBD. The position of the first intron site is the same as that in the human hnRNP A1 protein gene. PMID:1721701

  18. Aligning Biomolecular Networks Using Modular Graph Kernels

    NASA Astrophysics Data System (ADS)

    Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant

    Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.

  19. Phylogenetic mixture models for proteins.

    PubMed

    Le, Si Quang; Lartillot, Nicolas; Gascuel, Olivier

    2008-12-27

    Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test the performance using independent alignments from TREEBASE. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where in estimations we use the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TREEBASE test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (among 57), and significantly worse for 1 alignment only. 
Moreover

  20. Make Your Own Phylogenetic Tree

    ERIC Educational Resources Information Center

    Rau, Gerald

    2012-01-01

    Molecular similarity is one of the strongest lines of evidence for evolution--and one of the most difficult for students to grasp. That is because the underlying observations--that identical mutations are found in closely related species and the degree of similarity decreases with evolutionary distance--are not visible to the human eye. And it's…

  1. k-merSNP discovery: Software for alignment-and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    SciTech Connect

    2014-11-18

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
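
    The abstract says only that the algorithm is based on k-mer analysis. One common way to find SNPs without an alignment or a reference is to group odd-length k-mers by their flanking bases and report flanks that occur with more than one central base. The sketch below illustrates that general idea; it is an assumption about the approach, not the software's actual implementation, and it ignores reverse complements and repeated flanks within one genome.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def candidate_snps(genomes, k=5):
    """Group k-mers by their flanks; keep flanks seen with more than one central base.

    genomes: dict mapping genome name -> DNA string (hypothetical input format).
    Returns a dict: (left_flank, right_flank) -> {genome name: central base}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that there is a central base")
    mid = k // 2
    seen = defaultdict(dict)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            key = (kmer[:mid], kmer[mid + 1:])
            # Simplification: a later occurrence of the same flank overwrites an earlier one.
            seen[key][name] = kmer[mid]
    return {key: alleles for key, alleles in seen.items()
            if len(set(alleles.values())) > 1}

# Toy genomes (hypothetical) that differ at a single position.
genomes = {
    "g1": "ACGTACGGA",
    "g2": "ACGTTCGGA",
}
snps = candidate_snps(genomes, k=5)
```

    Each reported locus gives one allele per genome, so the loci can be concatenated into a SNP matrix from which a tree is built.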

  2. k-merSNP discovery: Software for alignment-and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    2014-11-18

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.

  3. TreeDomViewer: a tool for the visualization of phylogeny and protein domain structure.

    PubMed

    Alako, Blaise T F; Rainey, Daphne; Nijveen, Harm; Leunissen, Jack A M

    2006-07-01

    Phylogenetic analysis and examination of protein domains allow accurate genome annotation and are invaluable to study proteins and protein complex evolution. However, two sequences can be homologous without sharing statistically significant amino acid or nucleotide identity, presenting a challenging bioinformatics problem. We present TreeDomViewer, a visualization tool available as a web-based interface that combines phylogenetic tree description, multiple sequence alignment and InterProScan data of sequences and generates a phylogenetic tree projecting the corresponding protein domain information onto the multiple sequence alignment. Thereby it makes use of existing domain prediction tools such as InterProScan. TreeDomViewer adopts an evolutionary perspective on how domain structure of two or more sequences can be aligned and compared, to subsequently infer the function of an unknown homolog. This provides insight into the function assignment of, in terms of amino acid substitution, very divergent but yet closely related family members. Our tool produces an interactive scalar vector graphics image that provides orthological relationship and domain content of proteins of interest at one glance. In addition, PDF, JPEG or PNG formatted output is also provided. These features make TreeDomViewer a valuable addition to the annotation pipeline of unknown genes or gene products. TreeDomViewer is available at http://www.bioinformatics.nl/tools/treedom/.

  4. Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer

    PubMed Central

    Bromberg, Raquel; Grishin, Nick V.; Otwinowski, Zbyszek

    2016-01-01

    Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz. PMID:27336403
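
    As a rough illustration of "measuring the decay of exact substring matches as a function of match length", the sketch below counts exact matches of each length k between two sequences and fits the slope of log(count) against k. This is a simplified stand-in under stated assumptions, not SlopeTree's estimator: it applies none of the corrections the abstract lists, and the function names and the k range are hypothetical.

```python
import math
from collections import Counter

def match_count(a, b, k):
    """Number of position pairs (i, j) with a[i:i+k] == b[j:j+k]."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[j:j + k] for j in range(len(b) - k + 1))
    return sum(n * cb[kmer] for kmer, n in ca.items())

def match_decay_slope(a, b, k_min=2, k_max=6):
    """Least-squares slope of log(match count) versus match length k."""
    points = []
    for k in range(k_min, k_max + 1):
        n = match_count(a, b, k)
        if n > 0:  # log is undefined once no matches of this length remain
            points.append((k, math.log(n)))
    if len(points) < 2:
        raise ValueError("need at least two match lengths with shared substrings")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Toy sequences (hypothetical); a steeper negative slope indicates faster decay.
slope = match_decay_slope("ACGTACGT", "ACGTTTTT")
```

    Under this simplification a more negative slope is read as greater divergence; turning it into a calibrated evolutionary distance is the part the sketch leaves out.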

  5. Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer.

    PubMed

    Bromberg, Raquel; Grishin, Nick V; Otwinowski, Zbyszek

    2016-06-01

    Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz. PMID:27336403

  6. Phylogenetically resolving epidemiologic linkage

    DOE PAGES

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-02-22

    The use of phylogenetic trees in epidemiological investigations has become commonplace, but their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. Moreover, we confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results.

  7. Phylogenetically resolving epidemiologic linkage.

    PubMed

    Romero-Severson, Ethan O; Bulla, Ingo; Leitner, Thomas

    2016-03-01

    Although the use of phylogenetic trees in epidemiological investigations has become commonplace, their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals' HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. We confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results. PMID:26903617
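
    The cladistic distinction the abstract relies on (whether each host's sequences are monophyletic) can be checked mechanically on a rooted tree. The sketch below is a minimal illustration under stated assumptions: trees are nested tuples, leaf labels begin with a host letter, and the helper names are hypothetical. It does not implement the paper's coalescent model or its six-way classification.

```python
def leaves(tree):
    """Return the list of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out

def clades(tree):
    """Yield the leaf set of every clade (subtree), including single leaves."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)

def is_monophyletic(tree, host):
    """True if all sequences from `host` form exactly one clade of the rooted tree."""
    target = frozenset(label for label in leaves(tree) if label.startswith(host))
    return bool(target) and target in set(clades(tree))

# Hypothetical rooted trees; labels are host letter plus sequence number.
both_mono = (("A1", "A2"), ("B1", "B2"))
b_nested_in_a = ("A1", ("A2", ("B1", "B2")))
```

    In the second toy tree host B is monophyletic but nested inside host A's sequences, so A is not monophyletic. That is the kind of paraphyletic pattern the abstract says can often establish direction of transmission, whereas the first tree shows both hosts monophyletic.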

  8. Phylogenetically resolving epidemiologic linkage

    PubMed Central

    Romero-Severson, Ethan O.; Bulla, Ingo; Leitner, Thomas

    2016-01-01

    Although the use of phylogenetic trees in epidemiological investigations has become commonplace, their epidemiological interpretation has not been systematically evaluated. Here, we use an HIV-1 within-host coalescent model to probabilistically evaluate transmission histories of two epidemiologically linked hosts. Previous critique of phylogenetic reconstruction has claimed that direction of transmission is difficult to infer, and that the existence of unsampled intermediary links or common sources can never be excluded. The phylogenetic relationship between the HIV populations of epidemiologically linked hosts can be classified into six types of trees, based on cladistic relationships and whether the reconstruction is consistent with the true transmission history or not. We show that the direction of transmission and whether unsampled intermediary links or common sources existed make very different predictions about expected phylogenetic relationships: (i) Direction of transmission can often be established when paraphyly exists, (ii) intermediary links can be excluded when multiple lineages were transmitted, and (iii) when the sampled individuals’ HIV populations both are monophyletic a common source was likely the origin. Inconsistent results, suggesting the wrong transmission direction, were generally rare. In addition, the expected tree topology also depends on the number of transmitted lineages, the sample size, the time of the sample relative to transmission, and how fast the diversity increases after infection. Typically, 20 or more sequences per subject give robust results. We confirm our theoretical evaluations with analyses of real transmission histories and discuss how our findings should aid in interpreting phylogenetic results. PMID:26903617

  9. STBase: One Million Species Trees for Comparative Biology

    PubMed Central

    McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed

  10. STBase: one million species trees for comparative biology.

    PubMed

    McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed

  11. Biochemical and structural characterizations of two Dictyostelium cellobiohydrolases from the amoebozoa kingdom reveal a high level of conservation between distant phylogenetic trees of life

    DOE PAGES

    Hobdey, Sarah E.; Knott, Brandon C.; Momeni, Majid Haddad; Taylor, II, Larry E.; Borisova, Anna S.; Podkaminer, Kara K.; VanderWall, Todd A.; Himmel, Michael E.; Decker, Stephen R.; Beckham, Gregg T.; et al

    2016-04-01

    Glycoside hydrolase family 7 (GH7) cellobiohydrolases (CBHs) are enzymes often employed in plant cell wall degradation across eukaryotic kingdoms of life, as they provide significant hydrolytic potential in cellulose turnover. To date, many fungal GH7 CBHs have been examined, yet many questions regarding structure-activity relationships in these important natural and commercial enzymes remain. Here, we present the crystal structures and a biochemical analysis of two GH7 CBHs from social amoeba: Dictyostelium discoideum Cel7A (DdiCel7A) and Dictyostelium purpureum Cel7A (DpuCel7A). DdiCel7A and DpuCel7A natively consist of a catalytic domain and do not exhibit a carbohydrate-binding module (CBM). The structures of DdiCel7A and DpuCel7A, resolved to 2.1 Å and 2.7 Å, respectively, are homologous to those of other GH7 CBHs with an enclosed active-site tunnel. Two primary differences between the Dictyostelium CBHs and the archetypal model GH7 CBH, Trichoderma reesei Cel7A (TreCel7A), occur near the hydrolytic active site and the product-binding sites. To compare the activities of these enzymes with the activity of TreCel7A, the family 1 TreCel7A CBM and linker were added to the C terminus of each of the Dictyostelium enzymes, creating DdiCel7ACBM and DpuCel7ACBM, which were recombinantly expressed in T. reesei. DdiCel7ACBM and DpuCel7ACBM hydrolyzed Avicel, pretreated corn stover, and phosphoric acid-swollen cellulose as efficiently as TreCel7A when hydrolysis was compared at their temperature optima. The Ki of cellobiose was significantly higher for DdiCel7ACBM and DpuCel7ACBM than for TreCel7A: 205, 130, and 29 μM, respectively. Finally, taken together, the present study highlights the remarkable degree of conservation of the activity of these key natural and industrial enzymes across quite distant phylogenetic trees of life.

  12. MUST, a computer package of Management Utilities for Sequences and Trees.

    PubMed Central

    Philippe, H

    1993-01-01

    The MUST package is a phylogenetically oriented set of programs for data management and display, allowing one to handle both raw data (sequences) and results (trees, number of steps, bootstrap proportions). It is complementary to the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNIG86, CLUSTAL) with which it is fully compatible. The first part of MUST consists of the acquisition of new sequences, their storage, modification, and checking of sequence integrity in files of aligned sequences. In order to improve alignment, an editor function for aligned sequences offers numerous options, such as selection of subsets of sequences, display of consensus sequences, and search for similarities over small sequence fragments. For phylogenetic reconstruction, the choice of species and portions of sequences to be analyzed is easy and very rapid, permitting fast testing of numerous combinations of sequences and taxa. The resulting files can be formatted for most programs of tree construction. An interactive tree-display program recovers the output of all these programs. Finally, various modules allow an in-depth analysis of results, such as comparison of distance matrices, variation of bootstrap proportions with respect to various parameters or comparison of the number of steps per position. All presently available complete sequences of 28S rRNA are furnished aligned in the package. MUST therefore allows the management of all the operations required for phylogenetic reconstructions. PMID:8255784

  13. Efficient exploration of the space of reconciled gene trees.

    PubMed

    Szöllősi, Gergely J; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-11-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family, respectively. The open source

  14. Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments.

    PubMed

    Poirot, Olivier; O'Toole, Eamonn; Notredame, Cedric

    2003-07-01

    This paper presents Tcoffee@igs, a new server provided to the community by Hewlett Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedicated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the latest version of the T-Coffee package. Given a set of unaligned sequences, the server returns an evaluated multiple sequence alignment and the associated phylogenetic tree. This server also makes it possible to evaluate the local reliability of an existing alignment and to combine several alternative multiple alignments into a single new one. Tcoffee@igs can be used for aligning protein, RNA or DNA sequences. Datasets of up to 100 sequences (2000 residues long) can be processed. The server and its documentation are available from: http://igs-server.cnrs-mrs.fr/Tcoffee/.

  15. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications.

    PubMed

    Goremykin, Vadim V; Holland, Barbara; Hirsch-Ernst, Karen I; Hellwig, Frank H

    2005-09-01

    Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.

  16. An exploration of how to define and measure the evolution of behavior, learning, memory and mind across the full phylogenetic tree of life.

    PubMed

    Eisenstein, E M; Eisenstein, D L; Sarma, J S M

    2016-01-01

    There are probably few terms in evolutionary studies regarding neuroscience issues that are used more frequently than 'behavior', 'learning', 'memory', and 'mind'. Yet there are probably as many different meanings of these terms as there are users of them. Further, investigators in such studies, while recognizing the full phylogenetic spectrum of life and the evolution of these phenomena, rarely go beyond mammals and other vertebrates in their investigations; invertebrates are sometimes included. What is rarely taken into consideration, though, is that to fully understand the evolution and significance for survival of these phenomena across phylogeny, it is essential that they be measured and compared in the same units of measurement across the full phylogenetic spectrum from aneural bacteria and protozoa to humans. This paper explores how these terms are generally used as well as how they might be operationally defined and measured to facilitate uniform examination and comparisons across the full phylogenetic spectrum of life. This paper has 2 goals: (1) to provide models for measuring the evolution of 'behavior' and its changes across the full phylogenetic spectrum, and (2) to explain why 'mind phenomena' cannot be measured scientifically at the present time. PMID:27489578

  17. An exploration of how to define and measure the evolution of behavior, learning, memory and mind across the full phylogenetic tree of life

    PubMed Central

    Eisenstein, E. M.; Eisenstein, D. L.; Sarma, J. S. M.

    2016-01-01

    There are probably few terms in evolutionary studies regarding neuroscience issues that are used more frequently than 'behavior', 'learning', 'memory', and 'mind'. Yet there are probably as many different meanings of these terms as there are users of them. Further, investigators in such studies, while recognizing the full phylogenetic spectrum of life and the evolution of these phenomena, rarely go beyond mammals and other vertebrates in their investigations; invertebrates are sometimes included. What is rarely taken into consideration, though, is that to fully understand the evolution and significance for survival of these phenomena across phylogeny, it is essential that they be measured and compared in the same units of measurement across the full phylogenetic spectrum from aneural bacteria and protozoa to humans. This paper explores how these terms are generally used as well as how they might be operationally defined and measured to facilitate uniform examination and comparisons across the full phylogenetic spectrum of life. This paper has 2 goals: (1) to provide models for measuring the evolution of 'behavior' and its changes across the full phylogenetic spectrum, and (2) to explain why 'mind phenomena' cannot be measured scientifically at the present time. PMID:27489578

  18. An exploration of how to define and measure the evolution of behavior, learning, memory and mind across the full phylogenetic tree of life.

    PubMed

    Eisenstein, E M; Eisenstein, D L; Sarma, J S M

    2016-01-01

    There are probably few terms in evolutionary studies regarding neuroscience issues that are used more frequently than 'behavior', 'learning', 'memory', and 'mind'. Yet there are probably as many different meanings of these terms as there are users of them. Further, investigators in such studies, while recognizing the full phylogenetic spectrum of life and the evolution of these phenomena, rarely go beyond mammals and other vertebrates in their investigations; invertebrates are sometimes included. What is rarely taken into consideration, though, is that to fully understand the evolution and significance for survival of these phenomena across phylogeny, it is essential that they be measured and compared in the same units of measurement across the full phylogenetic spectrum from aneural bacteria and protozoa to humans. This paper explores how these terms are generally used as well as how they might be operationally defined and measured to facilitate uniform examination and comparisons across the full phylogenetic spectrum of life. This paper has 2 goals: (1) to provide models for measuring the evolution of 'behavior' and its changes across the full phylogenetic spectrum, and (2) to explain why 'mind phenomena' cannot be measured scientifically at the present time.

  19. Stochastic models for horizontal gene transfer: taking a random walk through tree space.

    PubMed

    Suchard, Marc A

    2005-05-01

    Horizontal gene transfer (HGT) plays a critical role in evolution across all domains of life with important biological and medical implications. I propose a simple class of stochastic models to examine HGT using multiple orthologous gene alignments. The models function in a hierarchical phylogenetic framework. The top level of the hierarchy is based on a random walk process in "tree space" that allows for the development of a joint probabilistic distribution over multiple gene trees and an unknown, but estimable species tree. I consider two general forms of random walks. The first form is derived from the subtree prune and regraft (SPR) operator that mirrors the observed effects that HGT has on inferred trees. The second form is based on walks over complete graphs and offers numerically tractable solutions for an increasing number of taxa. The bottom level of the hierarchy utilizes standard phylogenetic models to reconstruct gene trees given multiple gene alignments conditional on the random walk process. I develop a well-mixing Markov chain Monte Carlo algorithm to fit the models in a Bayesian framework. I demonstrate the flexibility of these stochastic models to test competing ideas about HGT by examining the complexity hypothesis. Using 144 orthologous gene alignments from six prokaryotes previously collected and analyzed, Bayesian model selection finds support for (1) the SPR model over the alternative form, (2) the 16S rRNA reconstruction as the most likely species tree, and (3) increased HGT of operational genes compared to informational genes.

  20. Structure of the small ribosomal subunit RNA of the pulmonate snail, Limicolaria kambeul, and phylogenetic analysis of the Metazoa.

    PubMed

    Winnepennickx, B; Backeljau, T; van de Peer, Y; De Wachter, R

    1992-09-01

    The complete nucleotide sequence of the small ribosomal subunit RNA of the gastropod, Limicolaria kambeul, was determined and used to infer a secondary structure model. In order to clarify the phylogenetic position of the Mollusca among the Metazoa, an evolutionary tree was constructed by neighbor-joining, starting from an alignment of small ribosomal subunit RNA sequences. The Mollusca appear to be a monophyletic group, related to Arthropoda and Chordata in an unresolved trichotomy. PMID:1505675
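
    The pair-selection step of neighbor-joining can be sketched as follows. This is a minimal illustration of the standard Q criterion, not the authors' software; the function name `nj_pick_pair` and the toy distance matrix are assumptions made for the example, and real distances would come from the aligned rRNA sequences.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimising the neighbor-joining Q criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical five-taxon distance matrix, for illustration only.
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pick_pair(toy)
print(pair, q_value)
```

    A full neighbor-joining run would repeat this step, replacing the chosen pair with a new internal node and recomputing distances until the tree is resolved.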

  1. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts

    NASA Technical Reports Server (NTRS)

    Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.

    1991-01-01

    A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.
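
    A quick way to screen an alignment for the compositional differences described above is to compare G + C content across sequences. The sketch below is an assumption-laden illustration (the function name `gc_content` and the handling of gaps and ambiguity codes are choices made for this example), not the procedure used in the paper.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T, U).

    Gap characters and ambiguity codes are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Hypothetical aligned rRNA fragments, for illustration only.
fragments = {
    "mesophile": "AUGCAUUA-AGCUAU",
    "thermophile": "GCGCCGGA-GGCCGC",
}
for name, seq in fragments.items():
    print(name, round(gc_content(seq), 3))
```

    Large differences between groups of sequences would be a cue to treat deep or clustered placements of the high-G + C lineages with caution.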

  2. ALFRED: A Practical Method for Alignment-Free Distance Computation.

    PubMed

    Thankachan, Sharma V; Chockalingam, Sriram P; Liu, Yongchao; Apostolico, Alberto; Aluru, Srinivas

    2016-06-01

    Alignment-free approaches are gaining persistent interest in many sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged to be one of the well-known alignment-free approaches. Two recent works further generalize this ACS approach by allowing a bounded number k of mismatches in the common substrings, relying on approximation (linear time) and exact computation, respectively. Albeit having a good worst-case time complexity [Formula: see text], the exact approach is complex and unlikely to be efficient in practice. Herein, we present ALFRED, an alignment-free distance computation method, which solves the generalized common substring search problem via exact computation. Compared to the theoretical approach, our algorithm is easier to implement and more practical to use, while still providing highly competitive theoretical performances with an expected run-time of [Formula: see text]. By applying our program to phylogenetic inference as a case study, we find that our program facilitates to exactly reconstruct the topology of the reference phylogenetic tree for a set of 27 primate mitochondrial genomes, at reasonably acceptable speed. ALFRED is implemented in C++ programming language and the source code is freely available online. PMID:27138275
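
    The quantity at the heart of the ACS approach, including the generalization to at most k mismatches, can be sketched with a brute-force computation. This is emphatically not ALFRED's algorithm, which the abstract describes as far more efficient; the function names are hypothetical, and the step that turns this average into a length-normalised distance is omitted.

```python
def longest_match(a, i, b, j, k):
    """Length of the longest common extension of a[i:] and b[j:] with at most k mismatches."""
    mismatches = 0
    length = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def average_common_substring(a, b, k=0):
    """Mean over positions i of a of the longest k-mismatch match of a[i:] anywhere in b.

    Brute force, roughly O(len(a) * len(b) * match length); fine for toy inputs only.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    total = 0
    for i in range(len(a)):
        total += max(longest_match(a, i, b, j, k) for j in range(len(b)))
    return total / len(a)


print(average_common_substring("ACGT", "ACGT"))
print(average_common_substring("AAAA", "CCCC", k=1))
```

    Longer average matches indicate more similar sequences, so a distance derived from this statistic decreases as the average grows.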

  3. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes.

    PubMed

    Xu, Zhao; Hao, Bailin

    2009-07-01

    The CVTree web server (http://tlife.fudan.edu.cn/cvtree) presented here is a new implementation of the whole genome-based, alignment-free composition vector (CV) method for phylogenetic analysis. It is more efficient and user-friendly than the previously published version in the 2004 web server issue of Nucleic Acids Research. The development of the whole genome-based alignment-free CV method has provided an independent verification of traditional phylogenetic analysis based on a single gene or a few genes. This new implementation attempts to meet the challenge of the ever increasing amount of genome data and includes in its database more than 850 prokaryotic genomes, which will be updated monthly from NCBI, and more than 80 fungal genomes collected manually from several sequencing centers. This new CVTree web server provides a faster and more stable research platform. Users can upload their own sequences to find their phylogenetic position among genomes selected from the server's inbuilt database. All sequence data used in a session may be downloaded as a compressed file. In addition to standard phylogenetic trees, users can also choose to output trees whose monophyletic branches are collapsed to various taxonomic levels. This feature is particularly useful for comparing phylogeny with taxonomy when dealing with thousands of genomes.

  4. Molecular Characterization and Phylogenetic Evaluation of the Hsp90 Gene from Selected Nematodes

    PubMed Central

    Skantar, Andrea M.; Carta, Lynn K.

    2004-01-01

    While multiple genes are optimal for corroborating nematode phylogenies, only a few are commonly used. Here we examine the phylogenetic potential of the nuclear Hsp90 chaperone gene. We used degenerate primers to obtain partial Hsp90 sequences from several plant-parasitic and free-living nematodes. Hsp90 was single-copy in Heterodera glycines and Meloidogyne javanica, similar to the situation for Caenorhabditis elegans. The full-length H. glycines Hsp90 protein sequence showed homology to sequences from C. elegans and Brugia pahangi and to other eukaryotes, and contains several functionally important regions common to cytoplasmic Hsp90 proteins. The Hsp90 amino acid phylogeny supported the Coelomata hypothesis for metazoan evolution. Phylogenetic trees, substitution scatter plots, and statistics for phylogenetic signal were made for Hsp90, 18S small subunit (SSU), and 28S large subunit (LSU) over a limited but broad sampling of nematode taxa. Only the LSU data set failed to recover any of the expected topology and showed extensive substitution saturation. In an intensive sampling of plant-parasitic nematode taxa, the Hsp90 tree topologies were generally congruent with rDNA results and alignments were unambiguous. Hsp90 sequences may help strengthen branch support or clarify tree topologies when other molecules show ambiguous alignments, greater branch-length heterogeneity, or codon bias in certain taxonomic groups. PMID:19262827

  5. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting.

    PubMed

    Desper, Richard; Gascuel, Olivier

    2004-03-01

    Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on the tree quality for any of the algorithms. FASTME is freely available on the web.
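
    The tree-length estimate that BME minimises can be illustrated with Pauplin's formula as it is commonly stated for fully resolved trees: each pairwise distance is weighted by 2^(1 - tau), where tau is the number of edges on the path between the two taxa. The sketch below is an illustration under that assumption, not FASTME code; the function names, the adjacency-dictionary tree encoding and the toy quartet are all hypothetical.

```python
from collections import deque
from itertools import combinations


def edge_counts(adj, start):
    """Breadth-first search: number of edges from start to every other node."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


def balanced_tree_length(adj, dist):
    """Sum over leaf pairs of 2**(1 - tau) * distance, tau = path length in edges."""
    leaves = sorted(n for n, nbs in adj.items() if len(nbs) == 1)
    total = 0.0
    for a, b in combinations(leaves, 2):
        tau = edge_counts(adj, a)[b]
        total += 2.0 ** (1 - tau) * dist[frozenset((a, b))]
    return total


# Toy quartet ((a,b),(c,d)) with pendant edges 1, 2, 3, 4 and internal edge 5.
adj = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
dist = {
    frozenset("ab"): 3, frozenset("cd"): 7,
    frozenset("ac"): 9, frozenset("ad"): 10,
    frozenset("bc"): 10, frozenset("bd"): 11,
}
print(balanced_tree_length(adj, dist))
```

    Because the toy distances are exactly additive on this topology, the estimate equals the true total branch length (1 + 2 + 3 + 4 + 5); a BME search would compare this value across candidate topologies and keep the smallest.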

  6. Phylogeny.fr: robust phylogenetic analysis for the non-specialist

    PubMed Central

    Dereeper, A.; Guignon, V.; Blanc, G.; Audic, S.; Buffet, S.; Chevenet, F.; Dufayard, J.-F.; Guindon, S.; Lefort, V.; Lescot, M.; Gascuel, O.

    2008-01-01

    Phylogenetic analyses are central to many research areas in biology and typically involve the identification of homologous sequences, their multiple alignment, the phylogenetic reconstruction and the graphical representation of the inferred tree. The Phylogeny.fr platform transparently chains programs to automatically perform these tasks. It is primarily designed for biologists with no experience in phylogeny, but can also meet the needs of specialists; the first ones will find up-to-date tools chained in a phylogeny pipeline to analyze their data in a simple and robust way, while the specialists will be able to easily build and run sophisticated analyses. Phylogeny.fr offers three main modes. The ‘One Click’ mode targets non-specialists and provides a ready-to-use pipeline chaining programs with recognized accuracy and speed: MUSCLE for multiple alignment, PhyML for tree building, and TreeDyn for tree rendering. All parameters are set up to suit most studies, and users only have to provide their input sequences to obtain a ready-to-print tree. The ‘Advanced’ mode uses the same pipeline but allows the parameters of each program to be customized by users. The ‘A la Carte’ mode offers more flexibility and sophistication, as users can build their own pipeline by selecting and setting up the required steps from a large choice of tools to suit their specific needs. Prior to phylogenetic analysis, users can also collect neighbors of a query sequence by running BLAST on general or specialized databases. A guide tree then helps to select neighbor sequences to be used as input for the phylogeny pipeline. Phylogeny.fr is available at: http://www.phylogeny.fr/ PMID:18424797

  7. Phylogeny.fr: robust phylogenetic analysis for the non-specialist.

    PubMed

    Dereeper, A; Guignon, V; Blanc, G; Audic, S; Buffet, S; Chevenet, F; Dufayard, J-F; Guindon, S; Lefort, V; Lescot, M; Claverie, J-M; Gascuel, O

    2008-07-01

    Phylogenetic analyses are central to many research areas in biology and typically involve the identification of homologous sequences, their multiple alignment, the phylogenetic reconstruction and the graphical representation of the inferred tree. The Phylogeny.fr platform transparently chains programs to automatically perform these tasks. It is primarily designed for biologists with no experience in phylogeny, but can also meet the needs of specialists; the first ones will find up-to-date tools chained in a phylogeny pipeline to analyze their data in a simple and robust way, while the specialists will be able to easily build and run sophisticated analyses. Phylogeny.fr offers three main modes. The 'One Click' mode targets non-specialists and provides a ready-to-use pipeline chaining programs with recognized accuracy and speed: MUSCLE for multiple alignment, PhyML for tree building, and TreeDyn for tree rendering. All parameters are set up to suit most studies, and users only have to provide their input sequences to obtain a ready-to-print tree. The 'Advanced' mode uses the same pipeline but allows the parameters of each program to be customized by users. The 'A la Carte' mode offers more flexibility and sophistication, as users can build their own pipeline by selecting and setting up the required steps from a large choice of tools to suit their specific needs. Prior to phylogenetic analysis, users can also collect neighbors of a query sequence by running BLAST on general or specialized databases. A guide tree then helps to select neighbor sequences to be used as input for the phylogeny pipeline. Phylogeny.fr is available at: http://www.phylogeny.fr/

  8. Data on phylogenetic analyses of gazelles (genus Gazella) based on mitochondrial and nuclear intron markers

    PubMed Central

    Lerp, Hannes; Klaus, Sebastian; Allgöwer, Stefanie; Wronski, Torsten; Pfenninger, Markus; Plath, Martin

    2016-01-01

    The data provided is related to the article “Phylogenetic analyses of gazelles reveal repeated transitions of key ecological traits and provide novel insights into the origin of the genus Gazella” [1]. The data is based on 48 tissue samples of all nine extant species of the genus Gazella, namely Gazella gazella, Gazella arabica, Gazella bennettii, Gazella cuvieri, Gazella dorcas, Gazella leptoceros, Gazella marica, Gazella spekei, and Gazella subgutturosa and four related taxa (Saiga tatarica, Antidorcas marsupialis, Antilope cervicapra and Eudorcas rufifrons). It comprises alignments of sequences of a cytochrome b data set and of six nuclear intron markers. For the latter new primers were designed based on cattle and sheep genomes. Based on these alignments phylogenetic trees were inferred using Bayesian Inference and Maximum Likelihood methods. Furthermore, ancestral character states (inferred with BayesTraits 1.0) and ancestral ranges based on a Dispersal-Extinction-Cladogenesis model were estimated and results' files were stored within this article. PMID:27054158

  9. Data on phylogenetic analyses of gazelles (genus Gazella) based on mitochondrial and nuclear intron markers.

    PubMed

    Lerp, Hannes; Klaus, Sebastian; Allgöwer, Stefanie; Wronski, Torsten; Pfenninger, Markus; Plath, Martin

    2016-06-01

    The data provided is related to the article "Phylogenetic analyses of gazelles reveal repeated transitions of key ecological traits and provide novel insights into the origin of the genus Gazella" [1]. The data is based on 48 tissue samples of all nine extant species of the genus Gazella, namely Gazella gazella, Gazella arabica, Gazella bennettii, Gazella cuvieri, Gazella dorcas, Gazella leptoceros, Gazella marica, Gazella spekei, and Gazella subgutturosa and four related taxa (Saiga tatarica, Antidorcas marsupialis, Antilope cervicapra and Eudorcas rufifrons). It comprises alignments of sequences of a cytochrome b data set and of six nuclear intron markers. For the latter new primers were designed based on cattle and sheep genomes. Based on these alignments phylogenetic trees were inferred using Bayesian Inference and Maximum Likelihood methods. Furthermore, ancestral character states (inferred with BayesTraits 1.0) and ancestral ranges based on a Dispersal-Extinction-Cladogenesis model were estimated and results' files were stored within this article. PMID:27054158

  10. Global Alignment System for Large Genomic Sequencing

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  11. Multigene Phylogenetics Reveals Temporal Diversification of Major African Malaria Vectors

    PubMed Central

    Kamali, Maryam; Marek, Paul E.; Peery, Ashley; Antonio-Nkondjio, Christophe; Ndo, Cyrille; Tu, Zhijian; Simard, Frederic; Sharakhov, Igor V.

    2014-01-01

    The major vectors of malaria in sub-Saharan Africa belong to subgenus Cellia. Yet, phylogenetic relationships and temporal diversification among African mosquito species have not been unambiguously determined. Knowledge about vector evolutionary history is crucial for correct interpretation of genetic changes identified through comparative genomics analyses. In this study, we estimated a molecular phylogeny using 49 gene sequences for the African malaria vectors An. gambiae, An. funestus, An. nili, the Asian malaria mosquito An. stephensi, and the outgroup species Culex quinquefasciatus and Aedes aegypti. To infer the phylogeny, we identified orthologous sequences uniformly distributed approximately every 5 Mb in the five chromosomal arms. The sequences were aligned and the phylogenetic trees were inferred using maximum likelihood and neighbor-joining methods. Bayesian molecular dating using a relaxed log normal model was used to infer divergence times. Trees from individual genes agreed with each other, placing An. nili as a basal clade that diversified from the studied malaria mosquito species 47.6 million years ago (mya). Other African malaria vectors originated more recently, and independently acquired traits related to vectorial capacity. The lineage leading to An. gambiae diverged 30.4 mya, while the African vector An. funestus and the Asian vector An. stephensi were the most closely related sister taxa that split 20.8 mya. These results were supported by consistently high bootstrap values in concatenated phylogenetic trees generated individually for each chromosomal arm. Genome-wide multigene phylogenetic analysis is a useful approach for discerning historic relationships among malaria vectors, providing a framework for the correct interpretation of genomic changes across species, and comprehending the evolutionary origins of this ubiquitous and deadly insect-borne disease. PMID:24705448

  12. The phylogenetic likelihood library.

    PubMed

    Flouri, T; Izquierdo-Carrasco, F; Darriba, D; Aberer, A J; Nguyen, L-T; Minh, B Q; Von Haeseler, A; Stamatakis, A

    2015-03-01

    We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and a thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. By example of two likelihood-based phylogenetic codes we show that the PLL improves the sequential performance of current software by a factor of 2-10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL).

  13. Molecular identification and phylogenetic study of Demodex caprae.

    PubMed

    Zhao, Ya-E; Cheng, Juan; Hu, Li; Ma, Jun-Xian

    2014-10-01

    The DNA barcode has been widely used in species identification and phylogenetic analysis since 2003, but there have been no reports in Demodex. In this study, to obtain an appropriate DNA barcode for Demodex, molecular identification of Demodex caprae based on mitochondrial cox1 was conducted. Firstly, individual adults and eggs of D. caprae were obtained for genomic DNA (gDNA) extraction; Secondly, mitochondrial cox1 fragment was amplified, cloned, and sequenced; Thirdly, cox1 fragments of D. caprae were aligned with those of other Demodex retrieved from GenBank; Finally, the intra- and inter-specific divergences were computed and the phylogenetic trees were reconstructed to analyze phylogenetic relationship in Demodex. Results obtained from seven 429-bp fragments of D. caprae showed that sequence identities were above 99.1% among three adults and four eggs. The intraspecific divergences in D. caprae, Demodex folliculorum, Demodex brevis, and Demodex canis were 0.0-0.9, 0.5-0.9, 0.0-0.2, and 0.0-0.5%, respectively, while the interspecific divergences between D. caprae and D. folliculorum, D. canis, and D. brevis were 20.3-20.9, 21.8-23.0, and 25.0-25.3, respectively. The interspecific divergences were 10 times higher than intraspecific ones, indicating considerable barcoding gap. Furthermore, the phylogenetic trees showed that four Demodex species gathered separately, representing independent species; and Demodex folliculorum gathered with canine Demodex, D. caprae, and D. brevis in sequence. In conclusion, the selected 429-bp mitochondrial cox1 gene is an appropriate DNA barcode for molecular classification, identification, and phylogenetic analysis of Demodex. D. caprae is an independent species and D. folliculorum is closer to D. canis than to D. caprae or D. brevis.
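
    The divergence comparison behind a barcoding gap can be sketched with an uncorrected p-distance on aligned fragments. The abstract does not state which distance model was used, so the p-distance here is an assumption, as are the function name `p_distance` and the toy sequences.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Only columns where both sequences have A, C, G or T are compared;
    gaps and ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments, for illustration only.
print(p_distance("ACGTACGT", "ACGTACGA"))
print(p_distance("ACGT-CGT", "ACGTACGT"))
```

    Computing this for every pair within a species and every pair between species gives the two ranges whose separation is read as the barcoding gap.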

  14. Evaluating the phylogenetic signal limit from mitogenomes, slow evolving nuclear genes, and the concatenation approach. New insights into the Lacertini radiation using fast evolving nuclear genes and species trees.

    PubMed

    Mendes, Joana; Harris, D James; Carranza, Salvador; Salvi, Daniele

    2016-07-01

    Estimating the phylogeny of lacertid lizards, and particularly the tribe Lacertini has been challenging, possibly due to the fast radiation of this group resulting in a hard polytomy. However this is still an open question, as concatenated data primarily from mitochondrial markers have been used so far whereas in a recent phylogeny based on a compilation of these data within a squamate supermatrix the basal polytomy seems to be resolved. In this study, we estimate phylogenetic relationships between all Lacertini genera using for the first time DNA sequences from five fast evolving nuclear genes (acm4, mc1r, pdc, βfib and reln) and two mitochondrial genes (nd4 and 12S). We generated a total of 529 sequences from 88 species and used Maximum Likelihood and Bayesian Inference methods based on concatenated multilocus dataset as well as a coalescent-based species tree approach with the aim of (i) shedding light on the basal relationships of Lacertini (ii) assessing the monophyly of genera which were previously questioned, and (iii) discussing differences between estimates from this and previous studies based on different markers, and phylogenetic methods. Results uncovered (i) a new phylogenetic clade formed by the monotypic genera Archaeolacerta, Zootoca, Teira and Scelarcis; and (ii) support for the monophyly of the Algyroides clade, with two sister species pairs represented by western (A. marchi and A. fitzingeri) and eastern (A. nigropunctatus and A. moreoticus) lineages. In both cases the members of these groups show peculiar morphology and very different geographical distributions, suggesting that they are relictual groups that were once diverse and widespread. They probably originated about 11-13 million years ago during early events of speciation in the tribe, and the split between their members is estimated to be only slightly older. This scenario may explain why mitochondrial markers (possibly saturated at higher divergence levels) or slower nuclear markers

  15. From a comb to a tree: phylogenetic relationships of the comb-footed spiders (Araneae, Theridiidae) inferred from nuclear and mitochondrial genes.

    PubMed

    Arnedo, Miquel A; Coddington, Jonathan; Agnarsson, Ingi; Gillespie, Rosemary G

    2004-04-01

    The family Theridiidae is one of the most diverse assemblages of spiders, from both a morphological and ecological point of view. The family includes some of the very few cases of sociality reported in spiders, in addition to bizarre foraging behaviors such as kleptoparasitism and araneophagy, and highly diverse web architecture. Theridiids are one of the seven largest families in the Araneae, with about 2200 species described. However, this species diversity is currently grouped in half the number of genera described for other spider families of similar species richness. Recent cladistic analyses of morphological data have provided an undeniable advance in identifying the closest relatives of the theridiids as well as establishing the family's monophyly. Nevertheless, the comb-footed spiders remain an assemblage of poorly defined genera, among which hypothesized relationships have yet to be examined thoroughly. Providing a robust cladistic structure for the Theridiidae is an essential step towards the clarification of the taxonomy of the group and the interpretation of the evolution of the diverse traits found in the family. Here we present results of a molecular phylogenetic analysis of a broad taxonomic sample of the family (40 taxa in 33 of the 79 currently recognized genera) and representatives of nine additional araneoid families, using approximately 2.5kb corresponding to fragments of three nuclear genes (Histone 3, 18SrDNA, and 28SrDNA) and two mitochondrial genes (16SrDNA and CoI). Several methods for incorporating indel information into the phylogenetic analysis are explored, and partition support for the different clades and sensitivity of the results to different assumptions of the analysis are examined as well. Our results marginally support theridiid monophyly, although the phylogenetic structure of the outgroup is unstable and largely contradicts current phylogenetic hypotheses based on morphological data. 
Several groups of theridiids receive strong

  16. Divergent ancestral lineages of newfound hantaviruses harbored by phylogenetically related crocidurine shrew species in Korea

    PubMed Central

    Arai, Satoru; Gu, Se Hun; Baek, Luck Ju; Tabara, Kenji; Bennett, Shannon; Oh, Hong-Shik; Takada, Nobuhiro; Kang, Hae Ji; Tanaka-Taya, Keiko; Morikawa, Shigeru; Okabe, Nobuhiko; Yanagihara, Richard; Song, Jin-Won

    2012-01-01

    Spurred by the recent isolation of a novel hantavirus, named Imjin virus (MJNV), from the Ussuri white-toothed shrew (Crocidura lasiura), targeted trapping was conducted for the phylogenetically related Asian lesser white-toothed shrew (Crocidura shantungensis). Pair-wise alignment and comparison of the S, M and L segments of a newfound hantavirus, designated Jeju virus (JJUV), indicated remarkably low nucleotide and amino acid sequence similarity with MJNV. Phylogenetic analyses, using maximum likelihood and Bayesian methods, showed divergent ancestral lineages for JJUV and MJNV, despite the close phylogenetic relationship of their reservoir soricid hosts. Also, no evidence of host switching was apparent in tanglegrams, generated by TreeMap 2.0β. PMID:22230701

  17. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    SciTech Connect

    Pollard, Daniel A.; Moses, Alan M.; Iyer, Venky N.; Eisen, Michael B.

    2006-08-14

    Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and

  19. Quartets and unrooted phylogenetic networks.

    PubMed

    Gambette, Philippe; Berry, Vincent; Paul, Christophe

    2012-08-01

    Phylogenetic networks were introduced to describe evolution in the presence of exchanges of genetic material between coexisting species or individuals. Split networks in particular were introduced as a special kind of abstract network to visualize conflicts between phylogenetic trees which may correspond to such exchanges. More recently, methods were designed to reconstruct explicit phylogenetic networks (whose vertices can be interpreted as biological events) from triplet data. In this article, we link abstract and explicit networks through their combinatorial properties, by introducing the unrooted analog of level-k networks. In particular, we give an equivalence theorem between circular split systems and unrooted level-1 networks. We also show how to adapt to quartets some existing results on triplets, in order to reconstruct unrooted level-k phylogenetic networks. These results give an interesting perspective on the combinatorics of phylogenetic networks and also raise algorithmic and combinatorial questions. PMID:22809417

  20. Phylogenics & Tree-Thinking

    ERIC Educational Resources Information Center

    Baum, David A.; Offner, Susan

    2008-01-01

    Phylogenetic trees, which are depictions of the inferred evolutionary relationships among a set of species, now permeate almost all branches of biology and are appearing in increasing numbers in biology textbooks. While few state standards explicitly require knowledge of phylogenetics, most require some knowledge of evolutionary biology, and many…

  1. Insights into the phylogenetic positions of photosynthetic bacteria obtained from 5S rRNA and 16S rRNA sequence data

    NASA Technical Reports Server (NTRS)

    Fox, G. E.

    1985-01-01

    Comparisons of complete 16S ribosomal ribonucleic acid (rRNA) sequences established that the secondary structure of these molecules is highly conserved. Earlier work with 5S rRNA secondary structure revealed that when structural conservation exists the alignment of sequences is straightforward. The constancy of structure implies minimal functional change. Under these conditions a uniform evolutionary rate can be expected so that conditions are favorable for phylogenetic tree construction.

  2. On Determining if Tree-based Networks Contain Fixed Trees.

    PubMed

    Anaya, Maria; Anipchenko-Ulaj, Olga; Ashfaq, Aisha; Chiu, Joyce; Kaiser, Mahedi; Ohsawa, Max Shoji; Owen, Megan; Pavlechko, Ella; St John, Katherine; Suleria, Shivam; Thompson, Keith; Yap, Corrine

    2016-05-01

    We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is [Formula: see text]-hard to decide, by reduction from 3-Dimensional Matching (3DM) and further that the problem is fixed-parameter tractable. PMID:27125655

  3. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies.

    PubMed

    Furnham, Nicholas; Sillitoe, Ian; Holliday, Gemma L; Cuff, Alison L; Rahman, Syed A; Laskowski, Roman A; Orengo, Christine A; Thornton, Janet M

    2012-01-01

    FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the catalytic site atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.

  4. Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure.

    PubMed

    DeGiorgio, Michael; Rosenberg, Noah A

    2016-08-01

    In the last few years, several statistically consistent consensus methods for species tree inference have been devised that are robust to the gene tree discordance caused by incomplete lineage sorting in unstructured ancestral populations. One source of gene tree discordance that has only recently been identified as a potential obstacle for phylogenetic inference is ancestral population structure. In this article, we describe a general model of ancestral population structure, and by relying on a single carefully constructed example scenario, we show that the consensus methods Democratic Vote, STEAC, STAR, R* Consensus, Rooted Triple Consensus, Minimize Deep Coalescences, and Majority-Rule Consensus are statistically inconsistent under the model. We find that among the consensus methods evaluated, the only method that is statistically consistent in the presence of ancestral population structure is GLASS/Maximum Tree. We use simulations to evaluate the behavior of the various consensus methods in a model with ancestral population structure, showing that as the number of gene trees increases, estimates on the basis of GLASS/Maximum Tree approach the true species tree topology irrespective of the level of population structure, whereas estimates based on the remaining methods only approach the true species tree topology if the level of structure is low. However, through simulations using species trees both with and without ancestral population structure, we show that GLASS/Maximum Tree performs unusually poorly on gene trees inferred from alignments with little information. This practical limitation of GLASS/Maximum Tree together with the inconsistency of other methods prompts the need for both further testing of additional existing methods and development of novel methods under conditions that incorporate ancestral population structure.

  5. Phylogenetic position of foraminifera inferred from LSU rRNA gene sequences.

    PubMed

    Pawlowski, J; Bolivar, I; Guiard-Maffia, J; Gouy, M

    1994-11-01

    A 5'-terminal region of 1600-1800 base pairs was amplified, cloned, and sequenced in the large subunit rDNA (LSU rDNA) of four species of foraminifera. These sequences were compared with the homologous regions of 16 eukaryotic taxa in order to establish the phylogenetic position of foraminifera. Analysis of 610 unambiguously aligned bases shows that foraminifera branch closely to plasmodial and cellular slime molds in the middle of the eukaryotic tree--that is, much earlier than suggested by the fossil record. These data, the first DNA sequences reported for foraminifera, will help analyze this class of protists and the early evolution of eukaryotes.

  6. Update of phylogenetic and genetic diversity of Sporothrix schenckii sensu lato.

    PubMed

    Rangel-Gamboa, Lucía; Martínez-Hernandez, Fernando; Maravilla, Pablo; Arenas-Guzmán, Roberto; Flisser, Ana

    2016-03-01

    Sporothrix schenckii sensu lato causes subcutaneous mycosis. In this article we analysed its phylogeny and genetic diversity using calmodulin DNA sequences deposited in the GenBank database. Population genetics indices were calculated, and phylogenetic and haplotype network trees were built. Five clades with high values of posterior probability, 47 haplotypes and high diversity in the complex were found. Analysis of the partial calmodulin sequence alignment revealed conserved and polymorphic regions that could be used as a reference for taxonomic identification. The use of population genetics analysis allowed understanding the phylogenetic proximity of S. schenckii s. str. and S. brasiliensis; scarce genetic flow among them with low migration index and high ancestry coefficient was found. Similarly, S. globosa, S. mexicana and S. pallida sequences showed highly differentiated species with no genetic exchange. The phylogenetic tree suggests that S. mexicana shared a common ancestor with S. pallida, while S. globosa and S. brasiliensis are more related to S. schenckii s. str. and showed less haplotype diversity and restrictions in geographic distribution. In the haplotype network tree, S. schenckii s. str. displayed worldwide distribution without dispersion centres, while S. brasiliensis and S. globosa exhibited Brazil and Euro-Asia as dispersion centres, respectively. Our data suggest that the S. schenckii complex has undergone a divergent evolution process, probably due to the pressure of the environment and of the host. In contrast, S. brasiliensis could have undergone purifying selection or an expansion process.

  7. Phylogenetics, classification, and biogeography of the treefrogs (Amphibia: Anura: Arboranae).

    PubMed

    Duellman, William E; Marion, Angela B; Hedges, S Blair

    2016-01-01

    A phylogenetic analysis of sequences from 503 species of hylid frogs and four outgroup taxa resulted in 16,128 aligned sites of 19 genes. The molecular data were subjected to a maximum likelihood analysis that resulted in a new phylogenetic tree of treefrogs. A conservative new classification based on the tree has (1) three families composing an unranked taxon, Arboranae, (2) nine subfamilies (five resurrected, one new), and (3) six resurrected generic names and five new generic names. Using the results of a maximum likelihood timetree, times of divergence were determined. For the most part these times of divergence correlated well with historical geologic events. The arboranan frogs originated in South America in the Late Mesozoic or Early Cenozoic. The family Pelodryadidae diverged from its South American relative, Phyllomedusidae, in the Eocene and invaded Australia via Antarctica. There were two dispersals from South America to North America in the Paleogene. One lineage was the ancestral stock of Acris and its relatives, whereas the other lineage, subfamily Hylinae, differentiated into a myriad of genera in Middle America. PMID:27394762

  8. Entanglement, Invariants, and Phylogenetics

    NASA Astrophysics Data System (ADS)

    Sumner, J. G.

    2007-10-01

    This thesis develops and expands upon known techniques of mathematical physics relevant to the analysis of the popular Markov model of phylogenetic trees required in biology to reconstruct the evolutionary relationships of taxonomic units from biomolecular sequence data. The techniques of mathematical physics are plentiful and have been developed for some time. The Markov model of phylogenetics and its analysis is a relatively new technique where most progress to date has been achieved by using discrete mathematics. This thesis takes a group theoretical approach to the problem by beginning with a remarkable mathematical parallel to the process of scattering in particle physics. This is shown to equate to branching events in the evolutionary history of molecular units. The major technical result of this thesis is the derivation of existence proofs and computational techniques for calculating polynomial group invariant functions on a multi-linear space where the group action is that relevant to a Markovian time evolution. The practical results of this thesis are an extended analysis of the use of invariant functions in distance based methods and the presentation of a new reconstruction technique for quartet trees which is consistent with the most general Markov model of sequence evolution.

  9. Dual phylogenetic origins of Nigerian lions (Panthera leo).

    PubMed

    Tende, Talatu; Bensch, Staffan; Ottosson, Ulf; Hansson, Bengt

    2014-07-01

    Lion fecal DNA extracts from four individuals each from Yankari Game Reserve and Kainji-Lake National Park (central northeast and west Nigeria, respectively) were Sanger-sequenced for the mitochondrial cytochrome b gene. The sequences were aligned against 61 lion reference sequences from other parts of Africa and India. The sequence data were analyzed further for the construction of phylogenetic trees using the maximum-likelihood approach to depict phylogenetic patterns of distribution among sequences. Our results show that Nigerian lions grouped together with lions from West and Central Africa. At the smaller geographical scale, lions from Kainji-Lake National Park in western Nigeria grouped with lions from Benin (located west of Nigeria), whereas lions from Yankari Game Reserve in central northeastern Nigeria grouped with the lion populations in Cameroon (located east of Nigeria). The finding that the two remaining lion populations in Nigeria have different phylogenetic origins is an important aspect to consider in future decisions regarding management and conservation of rapidly shrinking lion populations in West Africa.

  10. Dual phylogenetic origins of Nigerian lions (Panthera leo)

    PubMed Central

    Tende, Talatu; Bensch, Staffan; Ottosson, Ulf; Hansson, Bengt

    2014-01-01

    Lion fecal DNA extracts from four individuals each from Yankari Game Reserve and Kainji-Lake National Park (central northeast and west Nigeria, respectively) were Sanger-sequenced for the mitochondrial cytochrome b gene. The sequences were aligned against 61 lion reference sequences from other parts of Africa and India. The sequence data were analyzed further for the construction of phylogenetic trees using the maximum-likelihood approach to depict phylogenetic patterns of distribution among sequences. Our results show that Nigerian lions grouped together with lions from West and Central Africa. At the smaller geographical scale, lions from Kainji-Lake National Park in western Nigeria grouped with lions from Benin (located west of Nigeria), whereas lions from Yankari Game Reserve in central northeastern Nigeria grouped with the lion populations in Cameroon (located east of Nigeria). The finding that the two remaining lion populations in Nigeria have different phylogenetic origins is an important aspect to consider in future decisions regarding management and conservation of rapidly shrinking lion populations in West Africa. PMID:25077018

  11. Phylogenetic analysis of Maverick/Polinton giant transposons across organisms.

    PubMed

    Haapa-Paananen, Saija; Wahlberg, Niklas; Savilahti, Harri

    2014-09-01

    Polintons are a recently discovered group of large transposable elements (<40Kb in size) encoding up to 10 different proteins. The increasing number of genome sequencing projects has led to the discovery of these elements in genomes of protists, fungi, and animals, but not in plants. The RepBase database of eukaryotic repetitive elements currently contains consensus sequences and information of 70 Polinton elements from 28 organisms. Previous phylogenetic analyses have shown the relationship of Polintons to linear plasmids, bacteriophages, and retroviruses. However, a comprehensive phylogenetic analysis of all known Polintons has been lacking. We retrieved the Polinton consensus sequences from the most recent version of RepBase, and compiled amino acid sequences for the two most common Polinton-specific genes, the DNA polymerase-B and retroviral-like integrase. Open reading frame predictions and homology comparisons revealed partial or full sequences for 54 polymerases and 55 Polinton integrases. Multiple sequence alignments portrayed conservation in several functional motifs of these proteins. Phylogenetic analyses based on Bayesian inference using single- and combined-gene datasets revealed seven distinct lineages of Polintons that broadly follow the tree of life. Two of the seven lineages are found within the same species, indicating that ancient divergences have been retained to this day.

  12. [Analysis of the phylogenetic relationships of Gynostemma (Cucurbitaceae)].

    PubMed

    Qin, Shuang-shuang; Li, Hai-tao; Wang, Zhou-yong; Cui, Zhan-hu; Yu, Li-ying

    2015-05-01

    The sequences of ITS, matK, rbcL and psbA-trnH from 38 samples of 9 Gynostemma species or varieties were compared and analyzed by molecular phylogenetic methods. Hemsleya macrosperma was designated as the outgroup. MP and NJ phylogenetic trees of Gynostemma were built based on ITS sequences. The PAUP phylogenetic analysis showed the following results: (1) The eight individuals of G. pentaphyllum var. pentaphyllum were not supported as monophyletic in the strict consensus trees and NJ trees. (2) It is doubtful whether G. longipes and G. laxum should be classified as independent species. (3) The classification of the subgeneric units of Gynostemma is supported.

  13. Efficient Exploration of the Space of Reconciled Gene Trees

    PubMed Central

    Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-01-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions, respectively, in the mean numbers of duplications, transfers, and losses per gene family. The open source

  14. Alignment-free protein interaction network comparison

    PubMed Central

    Ali, Waqar; Rito, Tiago; Reinert, Gesine; Sun, Fengzhu; Deane, Charlotte M.

    2014-01-01

    Motivation: Biological network comparison software largely relies on the concept of alignment where close matches between the nodes of two or more networks are sought. These node matches are based on sequence similarity and/or interaction patterns. However, because of the incomplete and error-prone datasets currently available, such methods have had limited success. Moreover, the results of network alignment are in general not amenable for distance-based evolutionary analysis of sets of networks. In this article, we describe Netdis, a topology-based distance measure between networks, which offers the possibility of network phylogeny reconstruction. Results: We first demonstrate that Netdis is able to correctly separate different random graph model types independent of network size and density. The biological applicability of the method is then shown by its ability to build the correct phylogenetic tree of species based solely on the topology of current protein interaction networks. Our results provide new evidence that the topology of protein interaction networks contains information about evolutionary processes, despite the lack of conservation of individual interactions. As Netdis is applicable to all networks because of its speed and simplicity, we apply it to a large collection of biological and non-biological networks where it clusters diverse networks by type. Availability and implementation: The source code of the program is freely available at http://www.stats.ox.ac.uk/research/proteins/resources. Contact: w.ali@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161230

  15. Maximum Parsimony on Phylogenetic networks

    PubMed Central

    2012-01-01

    Background: Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results: In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores. Conclusion: The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are
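
    As an illustration of the tree case mentioned in the abstract above, the following is a minimal sketch of the Fitch small-parsimony computation for one character on a rooted binary tree with unit substitution costs. It does not handle networks or reticulate vertices, and it is not the authors' implementation; the nested-tuple tree encoding and the names `fitch_score`, `tree` and `states` are assumptions made for this example.

```python
def fitch_score(tree, leaf_states):
    """Fitch bottom-up pass for a single character.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_states: dict mapping each leaf name to its observed state.
    Returns (candidate_state_set, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        # A leaf: its state set is the observed state, at zero cost.
        return {leaf_states[tree]}, 0

    left_set, left_cost = fitch_score(tree[0], leaf_states)
    right_set, right_cost = fitch_score(tree[1], leaf_states)

    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-taxon example: ((A,B),(C,D)) with one site.
tree = (("A", "B"), ("C", "D"))
states = {"A": "G", "B": "G", "C": "T", "D": "G"}
root_set, score = fitch_score(tree, states)
print(root_set, score)
```

    For unequal substitution costs, as discussed in the abstract, a Sankoff-style dynamic program over a cost matrix would be needed instead of the set operations used here.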

  16. Phylogenomics-Based Reconstruction of Protozoan Species Tree

    PubMed Central

    Ocaña, Kary A.C.S.; Dávila, Alberto M.R.

    2011-01-01

    We have developed a semi-automatic methodology to reconstruct the phylogenetic species tree in Protozoa, integrating different phylogenetic algorithms and programs, and demonstrating the utility of a supermatrix approach to construct phylogenomics-based trees using 31 universal orthologs (UO). The species tree obtained was formed by three major clades that were related to three groups of data: i) species containing at least 80% of UO (25/31) in the concatenated multiple alignment or supermatrix, called C1; ii) species containing between 50% and 79% (15–24/31) of UO, called C2; and iii) species containing less than 50% (1–14/31) of UO, called C3. C1 was composed of only protozoan species, C2 was composed of species related to Protozoa, and C3 was composed of some species of C1 (Protozoa) and C2 (related to Protozoa). Our phylogenomics-based methodology using a supermatrix approach proved to be reliable with protozoan genome data and using at least 25 UO, suggesting that (a) the more UO used the better, and (b) using the entire UO sequence or just a conserved block of it for the supermatrix produced similar phylogenomic trees. PMID:21863127
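
    The supermatrix idea described above can be sketched as follows: per-gene alignments are concatenated per species, and species missing a gene receive a block of gap characters. This is only an illustrative sketch, not the pipeline used in the study; the function names, the dict-based alignment format and the toy sequences are assumptions.

```python
def build_supermatrix(gene_alignments, gap="-"):
    """Concatenate per-gene alignments into one row per taxon.

    gene_alignments: list of dicts {taxon: aligned_sequence}; all sequences
    within one dict are assumed to have the same length.
    Taxa missing from a gene are padded with gap characters.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def ortholog_coverage(gene_alignments):
    """Fraction of genes in which each taxon is represented."""
    counts = {}
    for aln in gene_alignments:
        for taxon in aln:
            counts[taxon] = counts.get(taxon, 0) + 1
    total = len(gene_alignments)
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical toy data: two genes, three species.
genes = [
    {"sp1": "MKV", "sp2": "MRV"},
    {"sp1": "AC", "sp3": "AG"},
]
supermatrix = build_supermatrix(genes)
coverage = ortholog_coverage(genes)
print(supermatrix)
print(coverage)
```

    The coverage fraction corresponds to the kind of threshold the abstract uses to group species (for example, at least 80% of the universal orthologs present).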

  17. [Phylogenetic analysis of Pleurotus species].

    PubMed

    Shnyreva, A A; Shnyreva, A V

    2015-02-01

    We performed phylogenetic analysis for ten Pleurotus species, based on internal transcribed spacer (ITS) sequences of rDNA. A phylogenetic tree was constructed on the basis of 31 oyster fungi strains of different origin and 10 reference sequences from GenBank. Our analysis demonstrates that the tested Pleurotus species are of monophyletic origin. We evaluated the evolutionary distances between these species. Classic genetic analysis of sexual compatibility based on monocaryon (mon)-mon crosses showed no reproductive barriers within the P. cornucopiae-P. euosmus species complex. Thus, despite the divergence (subclustering) between commercial strains and natural isolates of P. ostreatus revealed by phylogenetic analysis, there is no reproductive isolation between these groups. A common allele of the matB locus was identified for the commercial strains Sommer and L/4, supporting the common origin of these strains. PMID:25966583

  18. A Metric on the Space of Partly Reduced Phylogenetic Networks

    PubMed Central

    2016-01-01

    Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of evolutionary events acting at the population level, such as recombination between genes, hybridization between lineages, and horizontal gene transfer. The researchers have designed several measures for computing the dissimilarity between two phylogenetic networks, and each measure has been proven to be a metric on a special kind of phylogenetic networks. However, none of the existing measures is a metric on the space of partly reduced phylogenetic networks. In this paper, we provide a metric, de-distance, on the space of partly reduced phylogenetic networks, which is polynomial-time computable. PMID:27419137

  19. Genome trees constructed using five different approaches suggest new major bacterial clades

    PubMed Central

    Wolf, Yuri I; Rogozin, Igor B; Grishin, Nick V; Tatusov, Roman L; Koonin, Eugene V

    2001-01-01

    Background: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. Results: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also appeared to join the
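
    The first approach listed above, presence-absence of genomes in clusters of orthologous genes, can be illustrated with a simple pairwise distance between gene-content profiles. The sketch below uses the Jaccard distance, which is an assumption chosen for illustration; the abstract does not state which distance or tree-building method the authors used, and the genome and cluster names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of orthologous-cluster IDs."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise distances for a dict {genome_name: set_of_cluster_ids}."""
    names = sorted(profiles)
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x in names
        for y in names
    }


# Hypothetical presence-absence profiles for three genomes.
profiles = {
    "genome1": {"COG1", "COG2", "COG3"},
    "genome2": {"COG2", "COG3", "COG4"},
    "genome3": {"COG5"},
}
dist = distance_matrix(profiles)
print(dist[("genome1", "genome2")])
```

    A matrix of this kind could then be passed to any distance-based tree method; as the abstract notes, trees built from gene content are strongly affected by gene loss and horizontal gene transfer.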

  20. Molecular epidemiology and phylogenetic analysis of Dengue virus type-1 and 2 isolated in Malaysia

    PubMed Central

    Chew, Muhd Hasyim; Rahman, Md. Mostafizur; Hussin, Salasawati

    2015-01-01

    Objective: To detect different serotypes of dengue virus and provide information on the origin, distribution and genotype of the virus. Methods: Dengue virus serotypes identified as DEN-1 and DEN-2 were amplified and sequenced for the E gene. The consensus sequences were aligned with reference E gene sequences available globally in GenBank. Phylogenetic analysis was performed using the Neighbor-joining method and the Kimura 2-parameter model to construct a phylogenetic tree. Results: A total of 53 dengue virus isolates were positive, of which 38 (71.7%) were DENV-1 and 15 (28.3%) were DENV-2. The phylogenetic trees of DENV-1 and DENV-2 showed that the isolates clustered in genotype I and the cosmopolitan genotype, respectively, which are considered the predominant genotypes in Southeast Asian countries. Genotype I DENV-1 and cosmopolitan genotype DENV-2 have been co-circulating in the Klang Valley area, Malaysia, without a shift of genotype. Conclusion: The study reveals that DENV-1 and DENV-2 have been circulating in Malaysia. The isolates are clustered in genotype I and the cosmopolitan genotype, respectively. The study results would help in planning for prevention and control of dengue virus in Malaysia. PMID:26150855
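
    The Kimura 2-parameter model named in the Methods estimates the distance between two aligned nucleotide sequences as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The sketch below computes this for one pair of sequences; it is not the software used in the study, the function name and toy sequences are assumptions, and sites with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Hypothetical example: one transition (A<->G) in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(d, 4))
```

    Pairwise distances of this kind form the matrix that a neighbor-joining procedure takes as input.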

  1. Phylogenetics of the laboratory rat Rattus norvegicus.

    PubMed

    Canzian, F

    1997-03-01

    A genealogic tree was constructed for inbred strains of the laboratory rat, including 63 strains and 214 of their substrains. Information on genetic and biochemical marker typings of these lines was collected from the literature and from the World Wide Web. Data on 995 polymorphisms were processed into a phylogenetic distance matrix, and a tree was obtained by the Fitch-Margoliash distance matrix method. The inbred strains of the laboratory rat showed an average polymorphism for pairwise comparison of 53%. Strain BN showed the highest genetic divergence from all the other ones. Comparison with the mouse phylogenetic tree indicated that laboratory rats possess a higher diversity than inbred strains of mice not derived from wild species. These results provide a phylogenetic basis in the choice of rat strains for genetic linkage experiments.

  2. Phylogenetic analysis of Brassiceae based on the nucleotide sequences of the S-locus related gene, SLR1.

    PubMed

    Inaba, Ryuichi; Nishio, Takeshi

    2002-12-01

    Nucleotide sequences of orthologs of the S-locus related gene, SLR1, in 20 species of Brassicaceae were determined and compared with the previously reported SLR1 sequences of six species. Identities of deduced amino-acid sequences with Brassica oleracea SLR1 ranged from 66.0% to 97.6%, and those with B. oleracea SRK and SLR2 were less than 62% and 55%, respectively. In multiple alignment of deduced amino-acid sequences, the 180-190th amino-acid residues from the initial methionine were highly variable, this variable region corresponding to hypervariable region I of SLG and SRK. A phylogenetic tree based on the deduced amino-acid sequences showed a close relationship of SLR1 orthologs of species in the Brassicinae and Raphaninae. Brassica nigra SLR1 was found to belong to the same clade as Sinapis arvensis and Diplotaxis siifolia, while the sequences of the other Brassica species belonged to another clade together with B. oleracea and Brassica rapa. The phylogenetic tree was similar to previously reported trees constructed using the data of electrophoretic band patterns of chloroplast DNA, though minor differences were found. Based on synonymous substitution rates in SLR1, the diversification time of SLR1 orthologs between species in the Brassicinae was estimated. The evolution and function of SLR1 and the phylogenetic relationship of Brassiceae plants are discussed.

  3. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes if the gene is typically chromosomal or transferable through plasmids as well as listing the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  4. LSHPlace: fast phylogenetic placement using locality-sensitive hashing.

    PubMed

    Brown, Daniel G; Truszkowski, Jakub

    2013-01-01

    We consider the problem of phylogenetic placement, in which large numbers of sequences (often next-generation sequencing reads) are placed onto an existing phylogenetic tree. We adapt our recent work on phylogenetic tree inference, which uses ancestral sequence reconstruction and locality-sensitive hashing, to this domain. With these ideas, new sequences can be placed onto trees with high fidelity in strikingly fast runtimes. Our results are two orders of magnitude faster than existing programs for this domain, and show a modest accuracy tradeoff. Our results offer the possibility of analyzing many more reads in a next-generation sequencing project than is currently possible.

  5. Alignment validation

    SciTech Connect

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments ALICE, ATLAS, CMS and LHCb are currently under construction at CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector and the muon system requires an accurate alignment of all detector elements. Alignment information is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate, which means the data are insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of means of validation is the subject of this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  6. Reconstructing phylogenetic networks using maximum parsimony.

    PubMed

    Nakhleh, Luay; Jin, Guohua; Zhao, Fengmei; Mellor-Crummey, John

    2005-01-01

    Phylogenies - the evolutionary histories of groups of organisms - are one of the most widely used tools throughout the life sciences, as well as objects of research within systematics, evolutionary biology, epidemiology, etc. Almost every tool devised to date to reconstruct phylogenies produces trees; yet it is widely understood and accepted that trees oversimplify the evolutionary histories of many groups of organisms, most prominently bacteria (because of horizontal gene transfer) and plants (because of hybrid speciation). Various methods and criteria have been introduced for phylogenetic tree reconstruction. Parsimony is one of the most widely used and studied criteria, and various accurate and efficient heuristics for reconstructing trees based on parsimony have been devised. Jotun Hein suggested a straightforward extension of the parsimony criterion to phylogenetic networks. In this paper we formalize this concept, and provide the first experimental study of the quality of parsimony as a criterion for constructing and evaluating phylogenetic networks. Our results show that, when extended to phylogenetic networks, the parsimony criterion produces promising results. In a great majority of the cases in our experiments, the parsimony criterion accurately predicts the numbers and placements of non-tree events.

  7. Phylogenetic Analysis of Selected Menthol-Producing Species Belonging to the Lamiaceae Family.

    PubMed

    Mirzaei, Motahareh; Mirzaei, Hamed; Sahebkar, Amirhossein; Bagherian, Ali; Masoud Khoi, Mohammad Jaber; Reza Mirzaei, Hamid; Salehi, Rasoul; Reza Jaafari, Mahmoud; Kazemi Oskuee, Reza

    2015-01-01

    Menthol is an organic compound with diverse medicinal and commercial applications, and is made either synthetically or through extraction from mint oils. The aim of the present study was to investigate menthol levels in selected menthol-producing species belonging to the Lamiaceae family, and to determine the phylogenetic relationships of the menthol dehydrogenase gene sequence among these species. Three genera of Lamiaceae, namely Mentha, Salvia, and Micromeria, were selected for phytochemical and phylogenetic analyses. After identification of each species based on the menthol dehydrogenase gene in NCBI, BLAST software was used for the sequence alignment. MEGA4 software was used to draw a phylogenetic tree for the various species. Phytochemical analysis revealed that the highest and lowest amounts of both essential oil and menthol belonged to Mentha spicata and Micromeria hyssopifolia, respectively. The species Mentha spicata and Mentha piperita, which were assigned to one cluster in the dendrogram, contained the highest amounts of essential oil and menthol, while the Micromeria species, which was in a distinct cluster at a greater evolutionary distance, contained the lowest amounts of essential oil and menthol. Phylogenetic and phytochemical analyses showed that the essential oil and menthol contents of menthol-producing species are associated with the menthol dehydrogenase gene sequence. PMID:26252633

  8. The pentatricopeptide repeat (PPR) gene family, a tremendous resource for plant phylogenetic studies.

    PubMed

    Yuan, Yao-Wu; Liu, Chang; Marx, Hannah E; Olmstead, Richard G

    2009-01-01

    * Despite the paramount importance of nuclear gene data in plant phylogenetics, the search for candidate loci is believed to be challenging and time-consuming. Here we report that the pentatricopeptide repeat (PPR) gene family, containing hundreds of members in plant genomes, holds tremendous potential as nuclear gene markers. * We compiled a list of 127 PPR loci that are all intronless and have a single orthologue in both rice (Oryza sativa) and Arabidopsis thaliana. The uncorrected p-distances were calculated for these loci between two Arabidopsis species and among three Poaceae genera. We also selected 13 loci to evaluate their phylogenetic utility in resolving relationships among six Poaceae genera and nine diploid Oryza species. * PPR genes have a rapid rate of evolution and can be best used at intergeneric and interspecific levels. Although with substantial amounts of missing data, almost all individual data sets from the 13 loci generate well-resolved gene trees. * With the unique combination of three characteristics (having a large number of loci with established orthology assessment, being intronless, and being rapidly evolving), the PPR genes have many advantages as phylogenetic markers (e.g. straightforward alignment, minimal effort in generating sequence data, and versatile utilities). We perceive that these loci will play an important role in plant phylogenetics.

  10. Phylogenetic analysis to uncover organellar origins of nuclear-encoded genes.

    PubMed

    Foth, Bernardo J

    2007-01-01

    Most proteins that are located in mitochondria or plastids are encoded by the nuclear genome, because the organellar genomes have undergone severe reduction during evolution. In many cases, although not all, the nuclear genes encoding organelle-targeted proteins actually originated from the respective organellar genome and thus carry the phylogenetic fingerprint that still bespeaks their evolutionary origin. Phylogenetic analysis is a powerful in silico method that can yield important insights into the evolutionary history or molecular kinship of any gene or protein and that can thus also be used more specifically in the context of organellar targeting as one means to recognize protein candidates (e.g., from genome data) that may be targeted to mitochondria or plastids. This chapter provides protocols for creating multiple sequence alignments and carrying out phylogenetic analysis with the robust and comprehensive software packages Clustal and PHYLIP, which are both available free of charge for multiple computer platforms. Besides presenting step-by-step instructions on how to run these computer programs, this chapter also covers topics such as data collection and presentation of phylogenetic trees. PMID:17951706

  11. The most parsimonious tree for random data.

    PubMed

    Fischer, Mareike; Galla, Michelle; Herbst, Lina; Steel, Mike

    2014-11-01

    Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree 'shapes'. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k=2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters k grows. However, again there is a twist: MP trees on six taxa for k=2 random binary characters are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts. Moreover, this shape bias appears, from simulations, to be more pronounced for larger values of k.
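
    The parsimony score discussed above can be computed for a single character with Fitch's small-parsimony algorithm. The sketch below is a minimal illustration, not the authors' code: trees are written as nested tuples, a character is a dictionary from taxon name to state, and the names `fitch_score`, `parsimony_score`, `caterpillar`, `symmetric` and `ch1` are all made up for this example. The two trees are rooted drawings of the caterpillar and symmetric shapes on six taxa; the score does not depend on where the root is placed.

```python
def fitch_score(tree, states):
    """Return (candidate_state_set, number_of_changes) for one character.

    `tree` is a taxon name (leaf) or a 2-tuple of subtrees;
    `states` maps each taxon name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, characters):
    """Sum of Fitch scores over a sequence of characters."""
    return sum(fitch_score(tree, ch)[1] for ch in characters)

# Two shapes on six taxa (hypothetical example data)
caterpillar = ((((("a", "b"), "c"), "d"), "e"), "f")
symmetric = (("a", "b"), (("c", "d"), ("e", "f")))

# One binary character
ch1 = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0, "f": 0}

print(parsimony_score(caterpillar, [ch1]), parsimony_score(symmetric, [ch1]))
```

    Drawing many random binary characters and recording which tree attains the minimum score is one way to explore the shape bias described in the abstract by simulation.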

  12. A phylogenetic analysis of Aquifex pyrophilus

    NASA Technical Reports Server (NTRS)

    Burggraf, S.; Olsen, G. J.; Stetter, K. O.; Woese, C. R.

    1992-01-01

    The 16S rRNA of the bacterium Aquifex pyrophilus, a microaerophilic, oxygen-reducing hyperthermophile, has been sequenced directly from the PCR-amplified gene. Phylogenetic analyses show the Aq. pyrophilus lineage to be probably the deepest (earliest) in the (eu)bacterial tree. The addition of this deep branching to the bacterial tree further supports the argument that the Bacteria are of thermophilic ancestry.

  13. [Foundations of the new phylogenetics].

    PubMed

    Pavlinov, I Ia

    2004-01-01

    phylistics (Rasnitsyn's term; close to Simpsonian evolutionary taxonomy), belonging rather to the classical realm, and Hennigian cladistics, which pays attention exclusively to the origin of monophyletic taxa. In the early 20th century, microevolutionary doctrine became predominant in evolutionary studies. Its core is population thinking accompanied by phenetic thinking, which equates kinship with overall similarity. Both were connected to positivist philosophy and hence were characterized by reductionism at both the ontological and epistemological levels. This led to the fall of classical phylogenetics but created the prerequisites for the new phylogenetics, which also turned out to be full of reductionism. The new rise of phylogenetic (rather than tree) thinking during the last third of the 20th century was caused by the loss of explanatory power of population thinking and by the development of a new worldview and new epistemological premises. That new worldview is based on the synergetic (Prigoginian) model of the development of non-equilibrium systems: evolution of the biota, a part of which is phylogeny, is considered as such a development. At the epistemological level, the principal premise was the fall of positivism, which was replaced by post-positivist argumentation schemes. The input of cladistics into the new phylogenetics is twofold. On the one hand, it reduced phylogeny to cladistic history lacking any adaptivist interpretation and presuming a minimal evolution model. From this followed the reduction of the kinship relation to the sister-group relation, lacking any reference to a real time scale or to the ancestor-descendant relation. On the other hand, cladistics elaborated a methodology of phylogenetic reconstruction based on the synapomorphy principle, of which the outgroup concept became a part. Both inputs served as premises for the incorporation of both numerical techniques and molecular data into phylogenetic reconstruction.
Numerical phyletics provided the new phylogenetics with easily manipulated algorithms

  14. Correlated mutations in protein sequences: Phylogenetic and structural effects

    SciTech Connect

    Lapedes, A.S.; Giraud, B.G.; Stormo, G.D.

    1998-12-01

    Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links, but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure. In this paper the authors identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason involves the bias introduced in calculation of covariation measures due to the fact that biological sequences are generally related by a non-trivial phylogenetic tree. The authors present a null-model approach to solve this problem. The second reason involves linked chains of covariation which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. They present a maximum entropy solution to this classic problem of causation versus correlation. The methodologies are validated in simulation.
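
    As a point of reference for the "naive" covariation analysis mentioned above, the sketch below scores a pair of alignment columns by their mutual information. Mutual information is used here as an assumed example of a covariation measure; the abstract does not say which measure the authors use, and this sketch implements neither their null-model correction for phylogeny nor their maximum entropy method. The function names are hypothetical.

```python
import math
from collections import Counter

def column(alignment, i):
    """Residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length strings. Frequencies are
    plain counts, so every sequence is weighted equally regardless
    of how closely related the sequences are.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

    Because the counts treat every sequence as an independent observation, a clade of near-identical sequences inflates the score for any pair of columns that happens to differ between clades. This is the phylogenetic bias that the first of the two reasons in the abstract refers to.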

  15. The tree of eukaryotes.

    PubMed

    Keeling, Patrick J; Burger, Gertraud; Durnford, Dion G; Lang, B Franz; Lee, Robert W; Pearlman, Ronald E; Roger, Andrew J; Gray, Michael W

    2005-12-01

    Recent advances in resolving the tree of eukaryotes are converging on a model composed of a few large hypothetical 'supergroups', each comprising a diversity of primarily microbial eukaryotes (protists, or protozoa and algae). The process of resolving the tree involves the synthesis of many kinds of data, including single-gene trees, multigene analyses, and other kinds of molecular and structural characters. Here, we review the recent progress in assembling the tree of eukaryotes, describing the major evidence for each supergroup, and where gaps in our knowledge remain. We also consider other factors emerging from phylogenetic analyses and comparative genomics, in particular lateral gene transfer, and whether such factors confound our understanding of the eukaryotic tree.

  16. PALI-a database of Phylogeny and ALIgnment of homologous protein structures.

    PubMed

    Balaji, S; Sujatha, S; Kumar, S S; Srinivasan, N

    2001-01-01

    PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous super-position (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The

  17. RibAlign: a software tool and database for eubacterial phylogeny based on concatenated ribosomal protein subunits

    PubMed Central

    Teeling, Hanno; Gloeckner, Frank Oliver

    2006-01-01

    Background Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de-facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well-resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap. Results RibAlign serves two purposes: First, it provides a fast and scalable database that has been specifically adapted to eubacterial ribosomal protein sequences and second, it provides sophisticated import and export capabilities. This includes semi-automatic extraction of ribosomal protein sequences from whole-genome GenBank and FASTA files as well as exporting aligned, concatenated and filtered sequence files that can directly be used in conjunction with the PHYLIP and MrBayes phylogenetic reconstruction programs. Conclusion Up to now, phylogeny based on concatenated ribosomal protein sequences has been hampered by the limited set of sequenced genomes and high computational requirements. However, hundreds of full and draft genome sequencing projects are under way, and advances in cluster-computing and algorithms make phylogenetic reconstructions feasible even with large alignments of concatenated marker genes.
RibAlign
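
    The concatenation step described in this record can be pictured with a short sketch. It joins per-protein alignments into one long alignment and pads taxa that lack a given protein with gap characters. This is an illustration of the general idea only and does not reproduce RibAlign itself; the function name, the dictionary layout and the gap-padding rule are assumptions made for this example.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-gene alignments into one alignment.

    Each alignment is a dict mapping taxon name -> aligned sequence.
    A taxon missing from one alignment is padded with `gap` characters
    so that all concatenated sequences end up the same length.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

    The resulting dictionary would still need to be written out in whatever input format the chosen reconstruction program expects.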

  18. Estimating Bayesian Phylogenetic Information Content

    PubMed Central

    Lewis, Paul O.; Chen, Ming-Hui; Kuo, Lynn; Lewis, Louise A.; Fučíková, Karolina; Neupane, Suman; Wang, Yu-Bo; Shi, Daoyuan

    2016-01-01

    Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data. [Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.] PMID:27155008
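
    The comparison of prior and posterior entropy described above can be sketched for tree topologies as follows. The example assumes a uniform prior over all unrooted binary topologies (there are (2n-5)!! of them for n taxa) and estimates the posterior from raw frequencies in a sample of topologies. That is a naive estimator chosen for brevity, not the conditional clade distribution approach the abstract recommends, and all names are hypothetical.

```python
import math
from collections import Counter

def num_unrooted_topologies(n_taxa):
    """Number of unrooted binary topologies, (2n-5)!!, for n >= 3 taxa."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

def entropy(probabilities):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def topology_information(sampled_topologies, n_taxa):
    """Return (prior_entropy, posterior_entropy, information) in nats.

    `sampled_topologies` is a sequence of hashable topology labels,
    for example canonical Newick strings from a posterior sample.
    """
    prior_entropy = math.log(num_unrooted_topologies(n_taxa))
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    posterior_entropy = entropy(c / total for c in counts.values())
    return prior_entropy, posterior_entropy, prior_entropy - posterior_entropy
```

    When every sampled tree has the same topology the posterior entropy is zero and the information equals the full prior entropy; when the sample is spread evenly over all topologies the information is zero, matching the case in which the data add nothing beyond the prior.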

  19. Phylemon 2.0: a suite of web-tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing.

    PubMed

    Sánchez, Rubén; Serra, François; Tárraga, Joaquín; Medina, Ignacio; Carbonell, José; Pulido, Luis; de María, Alejandro; Capella-Gutíerrez, Salvador; Huerta-Cepas, Jaime; Gabaldón, Toni; Dopazo, Joaquín; Dopazo, Hernán

    2011-07-01

    Phylemon 2.0 is a new release of the suite of web tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing. It has been designed as a response to the increasing demand for molecular sequence analyses from expert and non-expert users. Phylemon 2.0 has several unique features that differentiate it from other similar web resources: (i) it offers an integrated environment that enables evolutionary analyses, format conversion, file storage and editing of results; (ii) it suggests further analyses, thereby guiding the users through the web server; and (iii) it allows users to design and save phylogenetic pipelines to be used over multiple genes (phylogenomics). Altogether, Phylemon 2.0 integrates a suite of 30 tools covering sequence alignment reconstruction and trimming; tree reconstruction, visualization and manipulation; and evolutionary hypotheses testing.

  20. QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

    PubMed Central

    Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

    2015-01-01

    Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082

  2. Phylogenetic informativeness reconciles ray-finned fish molecular divergence times

    PubMed Central

    2014-01-01

    Background Discordance among individual molecular age estimates, or between molecular age estimates and the fossil record, is observed in many clades across the Tree of Life. This discordance is attributed to a variety of variables including calibration age uncertainty, calibration placement, nucleotide substitution rate heterogeneity, or the specified molecular clock model. However, the impact of changes in phylogenetic informativeness of individual genes over time on phylogenetic inferences is rarely analyzed. Using nuclear and mitochondrial sequence data for ray-finned fishes (Actinopterygii) as an example, we extend the utility of phylogenetic informativeness profiles to predict the time intervals when nucleotide substitution saturation results in discordance among molecular ages estimated. Results We demonstrate that even with identical calibration regimes and molecular clock methods, mitochondrial based molecular age estimates are systematically older than those estimated from nuclear sequences. This discordance is most severe for highly nested nodes corresponding to more recent (i.e., Jurassic-Recent) divergences. By removing data deemed saturated, we reconcile the competing age estimates and highlight that the older mtDNA based ages were driven by nucleotide saturation. Conclusions Homoplasious site patterns in a DNA sequence alignment can systematically bias molecular divergence time estimates. Our study demonstrates that PI profiles can provide a non-arbitrary criterion for data exclusion to mitigate the influence of homoplasy on time calibrated branch length estimates. Analyses of actinopterygian molecular clocks demonstrate that scrutiny of the time scale on which sequence data is informative is a fundamental, but generally overlooked, step in molecular divergence time estimation. PMID:25103329

  3. Alignment fixture

    DOEpatents

    Bell, Grover C.; Gibson, O. Theodore

    1980-01-01

    A part alignment fixture is provided which may be used for precise variable lateral and tilt alignment relative to the fixture base of various shaped parts. The fixture may be used as a part holder for machining or inspection of parts or alignment of parts during assembly and the like. The fixture includes a precisely machined diameter disc-shaped hub adapted to receive the part to be aligned. The hub is nested in a guide plate which is adapted to carry two oppositely disposed pairs of positioning wedges so that the wedges may be reciprocatively positioned by means of respective micrometer screws. The sloping faces of the wedges contact the hub at respective quadrants of the hub periphery. The lateral position of the hub relative to the guide plate is adjusted by positioning the wedges with the associated micrometer screws. The tilt of the part is adjusted relative to a base plate, to which the guide plate is pivotally connected by means of a holding plate. Two pairs of oppositely disposed wedges are mounted for reciprocative lateral positioning by means of separate micrometer screws between flanges of the guide plate and the base plate. Once the wedges are positioned to achieve the proper tilt of the part or hub on which the part is mounted relative to the base plate, the fixture may be bolted to a machining, inspection, or assembly device.

  4. Curriculum Alignment.

    ERIC Educational Resources Information Center

    Crowell, Ronald; Tissot, Paula

    Curriculum alignment (CA) refers to the congruence of all the elements of a school's curriculum: curriculum goals; instructional program--what is taught and the materials used; and tests used to judge outcomes. CA can be a very powerful factor in improving schools. Although further research is needed on CA, there is…

  5. Investigating the performance of AIC in selecting phylogenetic models.

    PubMed

    Jhwueng, Dwueng-Chwuan; Huzurbazar, Snehalata; O'Meara, Brian C; Liu, Liang

    2014-08-01

    The popular likelihood-based model selection criterion, Akaike's Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC is an approximation to Kullback-Leibler (KL) divergence with the derivation relying on the assumption that the likelihood function has finite second derivatives. However, for phylogenetic estimation, given that tree space is discrete with respect to tree topology, the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model, and the expected KL divergence in the context of phylogenetic tree estimation. We find that given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models so that even for large sample sizes, the bias of AIC can result in selecting a wrong model. As the choice of phylogenetic models is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics. PMID:24867284
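
    For reference, the criterion discussed in this abstract is conventionally defined as AIC = 2k - 2 ln(L), where k is the number of free parameters and L is the maximized likelihood; the candidate with the lowest AIC is preferred. The short Python sketch below applies that definition. The model names, log-likelihoods and parameter counts are hypothetical placeholders, not values from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, free parameters)
candidates = {
    "model_A": (-1250.0, 10),
    "model_B": (-1236.5, 14),
    "model_C": (-1234.0, 18),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC is preferred
print(scores, best)
```

    In this toy case the richest model has the highest likelihood but is not selected, because the 2k penalty outweighs its small gain in fit.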

  6. Morphological and molecular convergences in mammalian phylogenetics

    PubMed Central

    Zou, Zhengting; Zhang, Jianzhi

    2016-01-01

    Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor its underlying cause is known. Here comparing thousands of characters of each type that have been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience much more convergences than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference. PMID:27585543

  7. Morphological and molecular convergences in mammalian phylogenetics.

    PubMed

    Zou, Zhengting; Zhang, Jianzhi

    2016-01-01

    Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor its underlying cause is known. Here comparing thousands of characters of each type that have been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience much more convergences than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference. PMID:27585543

  8. Phylogenetic Analysis of Poliovirus Sequences.

    PubMed

    Jorba, Jaume

    2016-01-01

    Comparative genomic sequencing is a major surveillance tool in the Polio Laboratory Network. Due to the rapid evolution of polioviruses (~1% per year), pathways of virus transmission can be reconstructed from the pathways of genomic evolution. Here, we describe three main phylogenetic methods: estimation of genetic distances, reconstruction of a maximum-likelihood (ML) tree, and estimation of substitution rates using Bayesian Markov chain Monte Carlo (MCMC). The data set used consists of complete capsid sequences from a survey of poliovirus sequences available in GenBank. PMID:26983737
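
    The first of the three methods, estimation of genetic distances, can be sketched in Python as follows. The sketch computes the uncorrected p-distance between two aligned sequences and then applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The abstract does not say which distance model the chapter uses, so the choice of Jukes-Cantor is an assumption, and the two short sequences are made-up placeholders.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance in substitutions per site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Placeholder aligned sequences differing at one of ten sites.
s1 = "ACGTACGTAC"
s2 = "ACGTACGAAC"
p = p_distance(s1, s2)
print(p, jc69_distance(p))
```

    The corrected distance is slightly larger than the p-distance because it allows for multiple substitutions at the same site.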

  9. Phylogenetic analysis of three genes of Penguinpox virus corresponding to Vaccinia virus G8R (VLTF-1), A3L (P4b) and H3L reveals that it is most closely related to Turkeypox virus, Ostrichpox virus and Pigeonpox virus.

    PubMed

    Carulei, Olivia; Douglass, Nicola; Williamson, Anna-Lise

    2009-01-01

    Phylogenetic analysis of three genes of Penguinpox virus, a novel Avipoxvirus isolated from African penguins, reveals its relationship to other poxviruses. The genes corresponding to Vaccinia virus G8R (VLTF-1), A3L (P4b) and H3L were sequenced and phylogenetic trees (Neighbour-Joining and UPGMA) constructed from MUSCLE nucleotide and amino acid alignments of the equivalent sequences from several different poxviruses. Based on this analysis, PEPV was confirmed to belong to the genus Avipoxvirus, specifically, clade A, subclade A2 and to be most closely related to Turkeypox virus (TKPV), Ostrichpox virus (OSPV) and Pigeonpox virus (PGPV).
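
    Of the two tree-building methods named in this abstract, UPGMA is the simpler to show in a few lines. The Python sketch below repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The taxon labels and distances are toy values chosen for illustration, not data from the study, and the function returns only the topology (no branch lengths).

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster labels by UPGMA.

    dist maps frozenset({a, b}) of two labels to their distance.
    Returns a nested tuple describing the rooted topology.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {(i, j): dist[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster k
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        # Drop every entry that involves the two merged clusters
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Toy distances between four hypothetical taxa.
taxa = ["A", "B", "C", "D"]
toy = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("AD"): 10,
    frozenset("BC"): 6, frozenset("BD"): 10, frozenset("CD"): 10,
}
print(upgma(taxa, toy))
```

    UPGMA assumes a constant rate of change along all lineages, which is one reason studies often report a Neighbour-Joining tree alongside it.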

  10. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  11. ALIGNING JIG

    DOEpatents

    Culver, J.S.; Tunnell, W.C.

    1958-08-01

    A jig or device is described for setting or aligning an opening in one member relative to another member or structure, with a predetermined offset, or it may be used for measuring the amount of offset with which the parts have previously been set. This jig comprises two blocks rabbeted to each other, with means for securing the upper block to the lower block. The upper block has fingers for contacting one of the members to be aligned, the lower block is designed to ride in grooves within the reference member, and calibration marks are provided to determine the amount of offset. This jig is specially designed to align the collimating slits of a mass spectrometer.

  12. Image alignment

    SciTech Connect

    Dowell, Larry Jonathan

    2014-04-22

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. An information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.

  13. Understanding phylogenetic incongruence: lessons from phyllostomid bats

    PubMed Central

    Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B

    2012-01-01

    All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive

  14. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  15. The phylogenetic utility and functional constraint of microRNA flanking sequences

    PubMed Central

    Kenny, Nathan J.; Sin, Yung Wa; Hayward, Alexander; Paps, Jordi; Chu, Ka Hou; Hui, Jerome H. L.

    2015-01-01

    MicroRNAs (miRNAs) have recently risen to prominence as novel factors responsible for post-transcriptional regulation of gene expression. miRNA genes have been posited as highly conserved in the clades in which they exist. Consequently, miRNAs have been used as rare genome change characters to estimate phylogeny by tracking their gain and loss. However, their short length (21–23 bp) has limited their perceived utility in sequence-based phylogenetic inference. Here, using reference taxa with established phylogenetic relationships, we demonstrate that miRNA sequences are of high utility in quantitative, rather than in qualitative, phylogenetic analysis. The clear orthology among miRNA genes from different species makes it straightforward to identify and align these sequences from even fragmentary datasets. We also identify significant sequence conservation in the regions directly flanking miRNA genes, and show that this too is of utility in phylogenetic analysis, as well as highlighting conserved regions that will be of interest to other fields. Employing miRNA sequences from 12 sequenced drosophilid genomes, together with a Tribolium castaneum outgroup, we demonstrate that this approach is robust using Bayesian and maximum-likelihood methods. The utility of these characters is further demonstrated in the rhabditid nematodes and primates. As next-generation sequencing makes it more cost-effective to sequence genomes and small RNA libraries, this methodology provides an alternative data source for phylogenetic analysis. The approach allows rapid resolution of relationships between both closely related and rapidly evolving species, and provides an additional tool for investigation of relationships within the tree of life. PMID:25694624

  16. The inference of gene trees with species trees.

    PubMed

    Szöllősi, Gergely J; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

  17. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided. PMID:17656792

  19. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters

    PubMed Central

    Sela, Itamar; Ashkenazy, Haim; Katoh, Kazutaka; Pupko, Tal

    2015-01-01

    Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il. PMID:25883146

  20. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.

    PubMed

    Sela, Itamar; Ashkenazy, Haim; Katoh, Kazutaka; Pupko, Tal

    2015-07-01

    Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il.

  1. Prioritizing populations for conservation using phylogenetic networks.

    PubMed

    Volkmann, Logan; Martyn, Iain; Moulton, Vincent; Spillner, Andreas; Mooers, Arne O

    2014-01-01

    In the face of inevitable future losses to biodiversity, ranking species by conservation priority seems more than prudent. Setting conservation priorities within species (i.e., at the population level) may be critical as species ranges become fragmented and connectivity declines. However, existing approaches to prioritization (e.g., scoring organisms by their expected genetic contribution) are based on phylogenetic trees, which may be poor representations of differentiation below the species level. In this paper we extend evolutionary isolation indices used in conservation planning from phylogenetic trees to phylogenetic networks. Such networks better represent population differentiation, and our extension allows populations to be ranked in order of their expected contribution to the set. We illustrate the approach using data from two imperiled species: the spotted owl Strix occidentalis in North America and the mountain pygmy-possum Burramys parvus in Australia. Using previously published mitochondrial and microsatellite data, we construct phylogenetic networks and score each population by its relative genetic distinctiveness. In both cases, our phylogenetic networks capture the geographic structure of each species: geographically peripheral populations harbor less-redundant genetic information, increasing their conservation rankings. We note that our approach can be used with all conservation-relevant distances (e.g., those based on whole-genome, ecological, or adaptive variation) and suggest it be added to the assortment of tools available to wildlife managers for allocating effort among threatened populations. PMID:24586451

  2. Prioritizing Populations for Conservation Using Phylogenetic Networks

    PubMed Central

    Volkmann, Logan; Martyn, Iain; Moulton, Vincent; Spillner, Andreas; Mooers, Arne O.

    2014-01-01

    In the face of inevitable future losses to biodiversity, ranking species by conservation priority seems more than prudent. Setting conservation priorities within species (i.e., at the population level) may be critical as species ranges become fragmented and connectivity declines. However, existing approaches to prioritization (e.g., scoring organisms by their expected genetic contribution) are based on phylogenetic trees, which may be poor representations of differentiation below the species level. In this paper we extend evolutionary isolation indices used in conservation planning from phylogenetic trees to phylogenetic networks. Such networks better represent population differentiation, and our extension allows populations to be ranked in order of their expected contribution to the set. We illustrate the approach using data from two imperiled species: the spotted owl Strix occidentalis in North America and the mountain pygmy-possum Burramys parvus in Australia. Using previously published mitochondrial and microsatellite data, we construct phylogenetic networks and score each population by its relative genetic distinctiveness. In both cases, our phylogenetic networks capture the geographic structure of each species: geographically peripheral populations harbor less-redundant genetic information, increasing their conservation rankings. We note that our approach can be used with all conservation-relevant distances (e.g., those based on whole-genome, ecological, or adaptive variation) and suggest it be added to the assortment of tools available to wildlife managers for allocating effort among threatened populations. PMID:24586451

  3. Nibea coibor growth hormone gene: its phylogenetic significance, microsatellite variation and expression analysis.

    PubMed

    Zhang, Dianchang; Shao, Yanqing; Jiang, Shigui; Li, Jianzhu; Xu, Xinping

    2009-09-15

    The growth hormone (GH) gene has been characterized for a number of fishes and used to establish phylogenetic relationships and as a candidate gene for studies of genetic variation in connection with growth traits. In this study, we report the genomic structure of Nibea coibor GH (designated as ncGH) including its 5'-flanking region, being cloned by homology-cloning and chromosome walking methods. The ncGH gene spans approximately 3.0 kb and consists of six exons and five introns, as found for all cloned teleost GH genes with the exception of carps and catfish. The 5'-flanking region contains consensus sequences for a TATA box, a CRE, a pit-1alpha, a TRE, two HNF-3, a ERE and a GRE. Five microsatellites are identified in the ncGH gene and three of them are polymorphic marker. The open reading frame (ORF) of ncGH is 615 bp in length encoding a polypeptide of 204 amino acids with an estimated molecular mass of 23.04 kDa and theoretical isoelectric point of 7.79. The precursor of ncGH consists of a 17 amino-acid signal peptide and a 187 amino-acid mature peptide. The four Cys residues are located at conserved positions (Cys(69), Cys(177), Cys(194) and Cys(202)), and One possible site for N-glycosylation (Asn-X-Ser/Thr motif) is present at Asn(201). The coding region sequence of ncGH is used to align with the sequences of 18 other species from Percoidei and one species from Anabantoidei using Clustal X. A matrix of 612 bp was used to construct the phylogenetic trees using neighbor-joining and maximum parsimony methods. The phylogenetic trees by two methods are identical in most of the clades with high bootstrap support. Every family all forms independent monophyly on the phylogenetic trees, in the family, the different species also forms the monophyly according to the different genera. The results are also identical to those from morphological data, and demonstrated that the GH gene is very suitable for phylogenetic relationship analysis of Percoidei. 

  4. Coalescent Histories for Lodgepole Species Trees.

    PubMed

    Disanto, Filippo; Rosenberg, Noah A

    2015-10-01

    Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees (λ_n)_{n ≥ 0}, in which tree λ_n has m = 2n+1 taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with m!! in the number of taxa m. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with m taxa, increasing a previous bound of [Formula: see text] to [Formula: see text]. We discuss the implications of our enumerative results for phylogenetic computations. PMID:25973633
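
    The claim that growth on the order of m!! is faster than exponential can be checked numerically. The sketch below only tabulates the double factorial m!! against the exponential 2^m for m = 2n+1; it does not compute the actual number of coalescent histories, and the function names and the base 2 are illustrative assumptions rather than anything taken from the paper.

```python
def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def lodgepole_taxa(n: int) -> int:
    """Number of taxa m = 2n + 1 in the n-th tree of the family (from the abstract)."""
    return 2 * n + 1


if __name__ == "__main__":
    # Compare m!! with an exponential reference 2^m (the base is an arbitrary choice).
    for n in range(0, 11, 2):
        m = lodgepole_taxa(n)
        print(f"n={n:2d}  m={m:2d}  m!!={double_factorial(m)}  2^m={2 ** m}")
```

    For small m the exponential is still larger (7!! = 105 against 2^7 = 128), but the double factorial overtakes it by m = 9 and the gap keeps widening, which is the qualitative behaviour the abstract describes.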

  5. Gene Tree Diameter for Deep Coalescence.

    PubMed

    Górecki, Paweł; Eulenstein, Oliver

    2015-01-01

    The deep coalescence cost accounts for discord caused by deep coalescence between a gene tree and a species tree. It is a major concern that the diameter of a gene tree (the tree's maximum deep coalescence cost across all species trees) depends on its topology, which can largely obfuscate phylogenetic studies. While this bias can be compensated by normalizing the deep coalescence cost using diameters, obtaining them efficiently has been posed as an open problem by Than and Rosenberg. Here, we resolve this problem by describing a linear time algorithm to compute the diameter of a gene tree. In addition, we provide a complete classification of the species trees yielding this diameter to guide phylogenetic analyses.

  6. Application of 16S rRNA, cytochrome b and control region sequences for understanding the phylogenetic relationships in Oryx species.

    PubMed

    Khan, H A; Arif, I A; Al Homaidan, A A; Al Farhan, A H

    2008-01-01

    The present study reports the application of mitochondrial markers for the molecular phylogeny of Oryx species, including the Arabian oryx (AO), scimitar-horned oryx (SHO) and plains oryx (PO), using the Addax as an outgroup. Sequences of three molecular markers, 16S rRNA, cytochrome b and a control region, for the above four taxa were aligned and the topologies of respective phylogenetic trees were compared. All these markers clearly differentiated the genus Addax from Oryx. However, for species-level grouping, while 16S rRNA and cytochrome b produced similar phylogeny (SHO grouped with PO), the control region grouped SHO with AO. Further studies are warranted to generate more sequencing data, apply multiple bioinformatics tools and to include relevant nuclear markers for phylogenetic analysis of Oryx species. PMID:19224456

  7. Reasoning over Taxonomic Change: Exploring Alignments for the Perelleschus Use Case

    PubMed Central

    Franz, Nico M.; Chen, Mingmin; Yu, Shizhuo; Kianmajd, Parisa; Bowers, Shawn; Ludäscher, Bertram

    2015-01-01

    Classifications and phylogenetic inferences of organismal groups change in light of new insights. Over time these changes can result in an imperfect tracking of taxonomic perspectives through the re-/use of Code-compliant or informal names. To mitigate these limitations, we introduce a novel approach for aligning taxonomies through the interaction of human experts and logic reasoners. We explore the performance of this approach with the Perelleschus use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of names individuated according to their respective source publications), and 75 expert-asserted Region Connection Calculus articulations (e.g., congruence, proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous constraints and interpretations. The reasoning workflow optimizes the logical consistency and expressiveness of the input and infers the set of maximally informative relations among the entailed taxonomic concepts. The latter are then used to produce merge visualizations that represent all congruent and non-congruent taxonomic elements among the aligned input trees. In this small use case with 6-53 input concepts per alignment, the information gained through the reasoning process is on average one order of magnitude greater than in the input. The approach offers scalable solutions for tracking provenance among succeeding taxonomic perspectives that may have differential biases in naming conventions, phylogenetic resolution, ingroup and outgroup sampling, or ostensive (member-referencing) versus intensional (property-referencing) concepts and articulations. PMID:25700173

  8. Metrics on multilabeled trees: interrelationships and diameter bounds.

    PubMed

    Huber, Katharina T; Spillner, Andreas; Suchecki, Radosław; Moulton, Vincent

    2011-01-01

    Multilabeled trees or MUL-trees, for short, are trees whose leaves are labeled by elements of some nonempty finite set X such that more than one leaf may be labeled by the same element of X. This class of trees includes phylogenetic trees and tree shapes. MUL-trees arise naturally in, for example, biogeography and gene evolution studies and also in the area of phylogenetic network reconstruction. In this paper, we introduce novel metrics which may be used to compare MUL-trees, most of which generalize well-known metrics on phylogenetic trees and tree shapes. These metrics can be used, for example, to better understand the space of MUL-trees or to help visualize collections of MUL-trees. In addition, we describe some relationships between the MUL-tree metrics that we present and also give some novel diameter bounds for these metrics. We conclude by briefly discussing some open problems as well as pointing out how MUL-tree metrics may be used to define metrics on the space of phylogenetic networks.

  9. Phylogenetic approaches to natural product structure prediction.

    PubMed

    Ziemert, Nadine; Jensen, Paul R

    2012-01-01

    Phylogenetics is the study of the evolutionary relatedness among groups of organisms. Molecular phylogenetics uses sequence data to infer these relationships for both organisms and the genes they maintain. With the large amount of publicly available sequence data, phylogenetic inference has become increasingly important in all fields of biology. In the case of natural product research, phylogenetic relationships are proving to be highly informative in terms of delineating the architecture and function of the genes involved in secondary metabolite biosynthesis. Polyketide synthases and nonribosomal peptide synthetases provide model examples in which individual domain phylogenies display different predictive capacities, resolving features ranging from substrate specificity to structural motifs associated with the final metabolic product. This chapter provides examples in which phylogeny has proven effective in terms of predicting functional or structural aspects of secondary metabolism. The basics of how to build a reliable phylogenetic tree are explained along with information about programs and tools that can be used for this purpose. Furthermore, it introduces the Natural Product Domain Seeker, a recently developed Web tool that employs phylogenetic logic to classify ketosynthase and condensation domains based on established enzyme architecture and biochemical function.

  10. PROMALS web server for accurate multiple protein sequence alignments.

    PubMed

    Pei, Jimin; Kim, Bong-Hyun; Tang, Ming; Grishin, Nick V

    2007-07-01

    Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/ PMID:17452345

  11. PFAAT version 2.0: A tool for editing, annotating, and analyzing multiple sequence alignments

    PubMed Central

    Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

    2007-01-01

    Background: By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Results: Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. Conclusion: PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function. PMID:17931421
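
    As an illustration of the kind of column-restricted analysis mentioned above, the following sketch computes percent identity between two aligned sequences, optionally over a chosen subset of alignment columns. It is not PFAAT code: the function name, the `columns` parameter and the decision to skip columns containing a gap are all assumptions made for the example.

```python
from typing import Iterable, Optional


def percent_identity(seq_a: str, seq_b: str,
                     columns: Optional[Iterable[int]] = None) -> float:
    """Percent identity of two aligned sequences over the given columns.

    Columns where either sequence has a gap ('-') are ignored
    (one common convention; other definitions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    cols = range(len(seq_a)) if columns is None else columns
    compared = 0
    identical = 0
    for i in cols:
        a, b = seq_a[i], seq_b[i]
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned pair; the residues are invented for illustration.
    print(percent_identity("MKT-LV", "MKSALV"))             # all ungapped columns
    print(percent_identity("MKT-LV", "MKSALV", [0, 1, 2]))  # selected columns only
```

    Restricting the calculation to selected columns mirrors the workflow in which residue annotations are searched first and the matching columns are then analyzed.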

  12. The Inference of Gene Trees with Species Trees

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970

  13. Global alignment: Finding rearrangements during alignment

    SciTech Connect

    Brudno, Michael; Malde, Sanket; Poliakov, Alexander; Do, Chuong B.; Couronne, Olivier; Dubchak, Inna; Batzoglou, Serafim

    2003-01-06

    Motivation: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps.
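
    The contrast between global and local alignment can be made concrete with the textbook dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The sketch below is a minimal scoring-only version with a linear gap penalty; it is not the method of the paper, and the scoring values and the function name are assumptions chosen for illustration.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -1, local: bool = False) -> int:
    """Best alignment score of a and b.

    local=False: global alignment, every letter of both strings is aligned or gapped.
    local=True:  local alignment, the best-scoring pair of substrings.
    """
    cols = len(b) + 1
    # First row: gaps accumulate in the global case, scores reset to 0 in the local case.
    prev = [0] * cols if local else [j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may start anywhere
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    # A shared core flanked by unrelated sequence (toy data).
    x = "TTTGATTACATTT"
    y = "CCCGATTACACCC"
    print("global:", align_score(x, y))
    print("local: ", align_score(x, y, local=True))
```

    In this toy case the global score is dragged down by the unrelated flanks, while the local score reflects only the shared core, which matches the trade-off described above: local alignment finds isolated regions of similarity but ignores the overall context.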

  15. High-resolution SAR11 ecotype dynamics at the Bermuda Atlantic Time-series Study site by phylogenetic placement of pyrosequences.

    PubMed

    Vergin, Kevin L; Beszteri, Bánk; Monier, Adam; Thrash, J Cameron; Temperton, Ben; Treusch, Alexander H; Kilpert, Fabian; Worden, Alexandra Z; Giovannoni, Stephen J

    2013-07-01

    Advances in next-generation sequencing technologies are providing longer nucleotide sequence reads that contain more information about phylogenetic relationships. We sought to use this information to understand the evolution and ecology of bacterioplankton at our long-term study site in the Western Sargasso Sea. A bioinformatics pipeline called PhyloAssigner was developed to align pyrosequencing reads to a reference multiple sequence alignment of 16S ribosomal RNA (rRNA) genes and assign them phylogenetic positions in a reference tree using a maximum likelihood algorithm. Here, we used this pipeline to investigate the ecologically important SAR11 clade of Alphaproteobacteria. A combined set of 2.7 million pyrosequencing reads from the 16S rRNA V1-V2 regions, representing 9 years at the Bermuda Atlantic Time-series Study (BATS) site, was quality checked and parsed into a comprehensive bacterial tree, yielding 929 036 Alphaproteobacteria reads. Phylogenetic structure within the SAR11 clade was linked to seasonally recurring spatiotemporal patterns. This analysis resolved four new SAR11 ecotypes in addition to five others that had been described previously at BATS. The data support a conclusion reached previously that the SAR11 clade diversified by subdivision of niche space in the ocean water column, but the new data reveal a more complex pattern in which deep branches of the clade diversified repeatedly across depth strata and seasonal regimes. The new data also revealed the presence of an unrecognized clade of Alphaproteobacteria, here named SMA-1 (Sargasso Mesopelagic Alphaproteobacteria, group 1), in the upper mesopelagic zone. The high-resolution phylogenetic analyses performed herein highlight significant, previously unknown patterns of evolutionary diversification within perhaps the most widely distributed heterotrophic marine bacterial clade, and strong links to ecosystem regimes. PMID:23466704

  16. Vicariant patterns of fragmentation among gekkonid lizards of the genus Teratoscincus produced by the Indian collision: A molecular phylogenetic perspective and an area cladogram for Central Asia.

    PubMed

    Macey, J R; Wang, Y; Ananjeva, N B; Larson, A; Papenfuss, T J

    1999-08-01

    A well-supported phylogenetic hypothesis is presented for gekkonid lizards of the genus Teratoscincus. Phylogenetic relationships of four of the five species are investigated using 1733 aligned bases of mitochondrial DNA sequence from the genes encoding ND1 (subunit one of NADH dehydrogenase), tRNA(Ile), tRNA(Gln), tRNA(Met), ND2, tRNA(Trp), tRNA(Ala), tRNA(Asn), tRNA(Cys), tRNA(Tyr), and COI (subunit I of cytochrome c oxidase). A single most parsimonious tree depicts T. przewalskii and T. roborowskii as a monophyletic group, with T. scincus as their sister taxon and T. microlepis as the sister taxon to the clade containing the first three species. The aligned sequences contain 341 phylogenetically informative characters. Each node is supported by a bootstrap value of 100% and the shortest suboptimal tree requires 29 additional steps. Allozymic variation is presented for proteins encoded by 19 loci but these data are largely uninformative phylogenetically. Teratoscincus species occur on tectonic plates of Gondwanan origin that were compressed by the impinging Indian Subcontinent, resulting in massive montane uplifting along plate boundaries. Taxa occurring in China (Tarim Block) form a monophyletic group showing vicariant separation from taxa in former Soviet Central Asia and northern Afghanistan (Farah Block); alternative biogeographic hypotheses are statistically rejected. This vicariant event involved the rise of the Tien Shan-Pamir and is well dated to 10 million years before present. Using this date for separation of taxa occurring on opposite sides of the Tien Shan-Pamir, an evolutionary rate of 0.57% divergence per lineage per million years is calculated. This rate is similar to estimates derived from fish, bufonid frogs, and agamid lizards for the same region of the mitochondrial genome (approximately 0.65% divergence per lineage per million years). Evolutionary divergence of the mitochondrial genome has a surprisingly stable rate across vertebrates.
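
    The rate quoted above is expressed per lineage, so it relates to the observed pairwise divergence between two taxa through a factor of two (both lineages accumulate change after the split). The sketch below works through that arithmetic under the usual simplifying assumption of a constant rate; the pairwise figure it prints is derived here and is not stated in the abstract, and the function names are hypothetical.

```python
def rate_per_lineage(pairwise_divergence_pct: float, split_time_myr: float) -> float:
    """Per-lineage rate (% per Myr) from pairwise divergence and time since the split."""
    return pairwise_divergence_pct / (2.0 * split_time_myr)


def pairwise_divergence(rate_pct_per_myr: float, split_time_myr: float) -> float:
    """Pairwise divergence (%) implied by a per-lineage rate and a split time."""
    return 2.0 * rate_pct_per_myr * split_time_myr


if __name__ == "__main__":
    rate = 0.57        # % divergence per lineage per Myr (from the abstract)
    split_time = 10.0  # Myr before present (from the abstract)
    implied = pairwise_divergence(rate, split_time)
    print(f"implied pairwise divergence: about {implied:.1f}%")
    # Round trip: recovering the per-lineage rate from the implied divergence.
    print(f"recovered rate: {rate_per_lineage(implied, split_time):.2f}% per lineage per Myr")
```

    Under these assumptions, a per-lineage rate of 0.57% per million years over 10 million years corresponds to roughly 11.4% sequence divergence between taxa on opposite sides of the barrier.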

  17. A Deliberate Practice Approach to Teaching Phylogenetic Analysis

    PubMed Central

    Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

    2013-01-01

    One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294

  18. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background: Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results: Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion: The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
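
    A genomic signature as defined above (the set of frequencies of short oligonucleotides) can be computed without any alignment. The sketch below is one minimal way to do it; the word length k = 2, the use of overlapping words on a single strand, the Euclidean distance and the function names are all assumptions for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter
from itertools import product
from typing import List


def signature(seq: str, k: int = 2) -> List[float]:
    """Frequencies of all 4**k DNA words of length k, in lexicographic order."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Words containing ambiguous letters (e.g. N) are excluded from the total.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def signature_distance(seq_a: str, seq_b: str, k: int = 2) -> float:
    """Euclidean distance between the signatures of two sequences."""
    sig_a, sig_b = signature(seq_a, k), signature(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Toy sequences of different lengths; no alignment is needed.
    s1 = "ACGTACGTACGTAACCGGTT"
    s2 = "ACGTACGAACGTAACC"
    s3 = "GGGGCCCCGGGGCCCCGGCC"
    print(signature_distance(s1, s2))
    print(signature_distance(s1, s3))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, which is what makes the approach usable on sequences that cannot be aligned reliably.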

  19. PhyloBLAST: facilitating phylogenetic analysis of BLAST results.

    PubMed

    Brinkman, F S; Wan, I; Hancock, R E; Rose, A M; Jones, S J

    2001-04-01

    PhyloBLAST is an internet-accessed application based on CGI/Perl programming that compares a user's protein sequence to a SwissProt/TREMBL database using BLAST2 and then allows phylogenetic analyses to be performed on selected sequences from the BLAST output. Flexible features such as the ability to input your own multiple sequence alignment and use PHYLIP program options provide additional web-based phylogenetic analysis functionality beyond the analysis of a BLAST result.

  1. Phylogenetic turnover along local environmental gradients in tropical forest communities.

    PubMed

    Baldeck, C A; Kembel, S W; Harms, K E; Yavitt, J B; John, R; Turner, B L; Madawala, S; Gunatilleke, N; Gunatilleke, S; Bunyavejchewin, S; Kiratiprayoon, S; Yaacob, A; Supardi, M N N; Valencia, R; Navarrete, H; Davies, S J; Chuyong, G B; Kenfack, D; Thomas, D W; Dalling, J W

    2016-10-01

    While the importance of local-scale habitat niches in shaping tree species turnover along environmental gradients in tropical forests is well appreciated, relatively little is known about the influence of phylogenetic signal in species' habitat niches in shaping local community structure. We used detailed maps of the soil resource and topographic variation within eight 24-50 ha tropical forest plots combined with species phylogenies created from the APG III phylogeny to examine how phylogenetic beta diversity (indicating the degree of phylogenetic similarity of two communities) was related to environmental gradients within tropical tree communities. Using distance-based redundancy analysis we found that phylogenetic beta diversity, expressed as either nearest neighbor distance or mean pairwise distance, was significantly related to both soil and topographic variation in all study sites. In general, more phylogenetic beta diversity within a forest plot was explained by environmental variables when this was expressed as nearest neighbor distance versus mean pairwise distance (3.0-10.3 % and 0.4-8.8 % of variation explained among plots, respectively), and more variation was explained by soil resource variables than topographic variables using either phylogenetic beta diversity metric. We also found that patterns of phylogenetic beta diversity expressed as nearest neighbor distance were consistent with previously observed patterns of niche similarity among congeneric species pairs in these plots. These results indicate the importance of phylogenetic signal in local habitat niches in shaping the phylogenetic structure of tropical tree communities, especially at the level of close phylogenetic neighbors, where similarity in habitat niches is most strongly preserved. PMID:27337965
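
    The two ways of expressing phylogenetic beta diversity named above differ in which species pairs they emphasize. The sketch below shows simplified, presence/absence versions of both for two communities, given pairwise phylogenetic distances between species. The species labels, the distance values and the function names are invented for illustration, and the study itself may have used abundance-weighted or otherwise different formulations.

```python
from typing import Dict, List, Tuple


def build_distances(pairs: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Expand a dict of unordered pair distances into a symmetric lookup table."""
    dist: Dict[str, Dict[str, float]] = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist


def mean_pairwise_distance(comm_a: List[str], comm_b: List[str],
                           dist: Dict[str, Dict[str, float]]) -> float:
    """Mean distance over all between-community species pairs."""
    values = [dist[a][b] for a in comm_a for b in comm_b]
    return sum(values) / len(values)


def mean_nearest_neighbor_distance(comm_a: List[str], comm_b: List[str],
                                   dist: Dict[str, Dict[str, float]]) -> float:
    """Mean distance from each species to its closest relative in the other community."""
    a_to_b = [min(dist[a][b] for b in comm_b) for a in comm_a]
    b_to_a = [min(dist[b][a] for a in comm_a) for b in comm_b]
    return (sum(a_to_b) + sum(b_to_a)) / (len(a_to_b) + len(b_to_a))


if __name__ == "__main__":
    # Hypothetical tree: (sp1, sp2) are close relatives, as are (sp3, sp4).
    dist = build_distances({
        ("sp1", "sp2"): 2.0, ("sp3", "sp4"): 4.0,
        ("sp1", "sp3"): 10.0, ("sp1", "sp4"): 10.0,
        ("sp2", "sp3"): 10.0, ("sp2", "sp4"): 10.0,
    })
    plot_a, plot_b = ["sp1", "sp3"], ["sp2", "sp4"]
    print("mean pairwise distance:        ", mean_pairwise_distance(plot_a, plot_b, dist))
    print("mean nearest neighbor distance:", mean_nearest_neighbor_distance(plot_a, plot_b, dist))
```

    In this toy case the nearest neighbor metric is driven entirely by the close relatives shared between plots, while the mean pairwise metric is dominated by the deep splits, which is why the former is more sensitive to turnover among close phylogenetic neighbors.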

  2. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  4. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments

    PubMed Central

    Schwarz, Roland F.; Tamuri, Asif U.; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M.; Schultz, Jörg; Goldman, Nick

    2016-01-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  5. Phylogenetic relationships within Cornus (Cornaceae) based on 26S rDNA sequences.

    PubMed

    Fan, C

    2001-06-01

    Phylogenetic relationships within the dogwood genus Cornus have been highly controversial due to the great morphological heterogeneity. Earlier phylogenetic analyses of Cornus using chloroplast DNA (cpDNA) data (including rbcL and matK sequences, as well as restriction sites) and morphological characters suggested incongruent relationships within the genus. The present study generated sequence data from the nuclear gene 26S rDNA for Cornus to test the phylogenetic hypotheses based on cpDNA and morphological data. The 26S rDNA sequence data obtained represent 16 species, 13 from Cornus and three from outgroups, having an aligned length of 3380 bp. Both parsimony and maximum likelihood analyses of these sequences were conducted. Trees resulting from these analyses suggest relationships among subgroups of Cornus consistent with those inferred from cpDNA data. That is, the dwarf dogwood (subg. Arctocrania) and the big-bracted dogwood (subg. Cynoxylon and subg. Syncarpea) clades are sisters, which are, in turn, sister to the cornelian cherries (subg. Cornus and subg. Afrocrania). This red-fruited clade is sister to the blue- or white-fruited dogwoods (subg. Mesomora, subg. Kraniopsis, and subg. Yinquania). Within the blue- or white-fruited clade, C. oblonga (subg. Yinquania) is sister to the remainder, and subg. Mesomora is sister to subg. Kraniopsis. These relationships were also suggested by the combined 26S rDNA and cpDNA data, but with higher bootstrap and Bremer support in the combined analysis. The 26S rDNA sequence data of Cornus consist of 12 expansion segments spanning 1034 bp. These expansion segments evolve approximately four times as fast as the conserved core regions. The study provides an example of phylogenetic utility of 26S rDNA sequences below the genus level. PMID:11410478

  6. Relaxed Phylogenetics and Dating with Confidence

    PubMed Central

    Ho, Simon Y. W; Phillips, Matthew J

    2006-01-01

    In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we present analyses of 102 bacterial, 106 yeast, 61 plant, 99 metazoan, and 500 primate alignments. From these we conclude that our method is phylogenetically more accurate and precise than the traditional unrooted model while adding the ability to infer a timescale to evolution. PMID:16683862

  7. 11. GAS STATION AND OLD ROAD ALIGNMENT, FACING S. VISITOR ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    11. GAS STATION AND OLD ROAD ALIGNMENT, FACING S. VISITOR CENTER BEHIND TREES. SAME CAMERA POSITION AS AZ-45-10. - South Entrance Road, Between South park boundary & Village Loop Road, Grand Canyon, Coconino County, AZ

  8. Genomic Repeat Abundances Contain Phylogenetic Signal

    PubMed Central

    Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464

  9. Genomic repeat abundances contain phylogenetic signal.

    PubMed

    Dodsworth, Steven; Chase, Mark W; Kelly, Laura J; Leitch, Ilia J; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution.

  10. In Silico Phylogenetic Analysis and Molecular Modelling Study of 2-Haloalkanoic Acid Dehalogenase Enzymes from Bacterial and Fungal Origin

    PubMed Central

    Satpathy, Raghunath; Konkimalla, V. B.; Ratha, Jagnyeswar

    2016-01-01

    2-Haloalkanoic acid dehalogenase enzymes, which are widely distributed in fungi and bacteria, have a broad range of applications, from bioremediation to chemical synthesis of useful compounds. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from the NCBI database. Sequence analyses such as multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition, and phylogenetic tree construction were performed on these primary sequences. From the MSA analysis, it was observed that the sequences share conserved lysine (K) and aspartate (D) residues. Also, the phylogenetic tree indicated a subcluster comprising both fungal and bacterial species. Because no experimental 3D structure for fungal 2-haloalkanoic acid dehalogenase is available in the PDB, a molecular modelling study was performed for both fungal and bacterial enzymes present in the subcluster. Further structural analysis revealed a common evolutionary topology shared between fungal and bacterial enzymes. Studies on the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface-exposed tryptophan was conserved in all of the selected models. PMID:26880911

  11. Phylogenetic comparison of metabolic capacities of organisms at genome level.

    PubMed

    Ma, Hong-Wu; Zeng, An-Ping

    2004-04-01

    Comparative genomic studies have shown horizontal gene transfer (HGT) to be widespread among organisms. However, its effect on the phylogenetic relationship of organisms, especially at a system level of different cellular functions, is still not well understood. In this work, we have constructed phylogenetic trees based on the enzyme, reaction, and gene contents of metabolic networks reconstructed from annotated genome information of 82 sequenced organisms. Results from different phylogenetic distance definitions and based on three different functional subsystems (i.e., metabolism, cellular processes, information storage and processing) were compared. Results based on the three different functional subsystems give different pictures of the phylogenetic relationship of organisms, reflecting the different extents of HGT in the different functional systems. In general, horizontal transfer is prevalent in genes for metabolism, but less so in genes for information processing. Nevertheless, the major results of metabolic network-based phylogenetic trees are in good agreement with the tree based on 16S rRNA and genome trees, confirming the three-domain classification and the close relationship between eukaryotes and archaea at the level of metabolic networks. These results strongly support the hypothesis that although HGT is widely distributed, it is nevertheless constrained by certain pre-existing metabolic organization principle(s) during evolution. Further research is needed to identify the organization principles and the constraints of metabolic networks on HGT, which have large impacts on understanding the evolution of life and on purposefully manipulating cellular metabolism.

  12. Identification of Tunisian Leishmania spp. by PCR amplification of cysteine proteinase B (cpb) genes and phylogenetic analysis.

    PubMed

    Chaouch, Melek; Fathallah-Mili, Akila; Driss, Mehdi; Lahmadi, Ramzi; Ayari, Chiraz; Guizani, Ikram; Ben Said, Moncef; Benabderrazak, Souha

    2013-03-01

    Discrimination of the Old World Leishmania parasites is important for diagnosis and epidemiological studies of leishmaniasis. We have developed PCR assays that allow discrimination between the Tunisian species Leishmania major, Leishmania tropica and Leishmania infantum. The identification was performed by a simple PCR targeting cysteine protease B (cpb) gene copies. These PCR assays can serve as routine molecular biology tools for discrimination of Leishmania spp. from different geographical origins and different clinical forms. Our assays can be an informative resource for studies of the cpb gene in drug, diagnostics and vaccine research. The PCR products of the cpb gene and the N-acetylglucosamine-1-phosphate transferase (nagt) Leishmania gene were sequenced and aligned. Phylogenetic trees of Leishmania based on cpb and nagt sequences are close in topology and present the classic distribution of Leishmania in the Old World. The phylogenetic analysis has enabled the characterization and identification of different strains, using both multicopy (cpb) and single copy (nagt) genes. Indeed, the cpb phylogenetic analysis allowed us to identify the Tunisian Leishmania killicki species, and a group that gathers the least evolved isolates of the Leishmania donovani complex, which originated from East Africa. This clustering confirms the African origin for the visceralizing species of the L. donovani complex. PMID:23228525

  13. Evolutionary relationships of the Critically Endangered frog Ericabatrachus baleensis Largen, 1991 with notes on incorporating previously unsampled taxa into large-scale phylogenetic analyses

    PubMed Central

    2014-01-01

    Background: The phylogenetic relationships of many taxa remain poorly known because of a lack of appropriate data and/or analyses. Despite substantial recent advances, amphibian phylogeny remains poorly resolved in many instances. The phylogenetic relationships of the Ethiopian endemic monotypic genus Ericabatrachus have been addressed thus far only with phenotypic data and remain contentious. Results: We obtained fresh samples of the now rare and Critically Endangered Ericabatrachus baleensis and generated DNA sequences for two mitochondrial and four nuclear genes. Analyses of these new data using de novo and constrained-tree phylogenetic reconstructions strongly support a close relationship between Ericabatrachus and Petropedetes, and allow us to reject previously proposed alternative hypotheses of a close relationship with cacosternines or Phrynobatrachus. Conclusions: We discuss the implications of our results for the taxonomy, biogeography and conservation of E. baleensis, and suggest a two-tiered approach to the inclusion and analyses of new data in order to assess the phylogenetic relationships of previously unsampled taxa. Such approaches will be important in the future given the increasing availability of relevant mega-alignments and potential framework phylogenies. PMID:24612655

  14. Rapid ribosomal RNA sequencing and the phylogenetic analysis of protists.

    PubMed

    Johnson, A M; Baverstock, P R

    1989-04-01

    A newly described technique for rapidly obtaining the partial nucleotide sequence of ribosomal RNA is being applied to investigate phylogenetic relationships among living organisms. Alan Johnson and Peter Baverstock describe the importance of this method to parasitology in providing new information on the phylogenetic relationships of parasitic organisms previously placed in groups of convenience. The phylum Apicomplexa, in particular, has been the object of much study using this technique, but the technology is likely to extend soon to the restructuring of the phylogenetic trees of many groups of parasites.

  15. Identification, phylogenetic evolutionary analysis of GDQY orf virus isolated from Qingyuan City, Guangdong Province, southern China.

    PubMed

    Duan, Chaohui; Liao, Meiying; Wang, Han; Luo, Xiaohong; Shao, Jing; Xu, Ying; Li, Wei; Hao, Wenbo; Luo, Shuhong

    2015-01-25

    Infection with the orf virus (ORFV) leads to contagious ecthyma, also called contagious pustular dermatitis, which usually affects sheep, goats and other small ruminants. It has a great distribution throughout the world and has also been reported to infect humans. Though many strains have been isolated from differing parts of mainland China, rarely has any strain been reported from the southern provinces of China. We studied a case of orf virus infection that occurred at Qingyuan City, Guangdong Province in southern China. An orf virus strain, GDQY, was successfully isolated and identified through cell culture techniques and transmission electron microscopy. Complete genes of ORFV011, ORFV059, ORFV106 and ORFV107 were amplified for the sequence analysis based on their nucleotide or amino acid level. In order to discuss the genetic variation, precise sequences were used to compare to other reference strains isolated from different districts or countries. Phylogenetic trees based on those strains were built up and evolutionary distances were calculated based on the alignment of their complete sequences. The typical structure of the orf virus was observed in cell-culture suspensions inoculated with GDQY, and the full-length of four genes was amplified and sequenced. Phylogenetic analysis indicated that GDQY is homologous to FJ-DS and CQ/WZ on ORFV011 nucleotides. ORFV059 may be more variable than ORFV011 based on the comparison between GDQY and other isolates. Genetic studies of ORFV106 and 107 are reported for the first time in the presented study.

  16. Phylogenetic Relationships and Species Delimitation in Pinus Section Trifoliae Inferred from Plastid DNA

    PubMed Central

    Hernández-León, Sergio; Gernandt, David S.; Pérez de la Rosa, Jorge A.; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities. PMID:23936218

  17. Phylogenetic analysis of chloroplast matK gene from Zingiberaceae for plant DNA barcoding.

    PubMed

    Selvaraj, Dhivya; Sarma, Rajeev Kumar; Sathishkumar, Ramalingam

    2008-01-01

    The maturase K gene (matK) of the chloroplast is a highly conserved gene in plant systematics that is involved in Group II intron splicing. The gene is 1500 bp in length and is located within the intron of trnK. In the present study, the matK gene from Zingiberaceae was analysed for variants, parsimony sites, patterns, transition/transversion rates and phylogeny. The family Zingiberaceae comprises 47 genera with medicinal value. The matK gene sequences were obtained from GenBank and used for the analysis. The sequence alignments were performed with Clustal X, transition/transversion rates were predicted with MEGA and phylogenetic analyses were carried out with the PHYLIP package. The results indicate that the Zingiberaceae genera Aframomum, Alpinia, Globba, Curcuma and Zingiber show polyphyly. The overall variation between the species is 24% and the transition/transversion rate is 1.54. A phylogenetic tree was constructed to identify the ideal regions that could be used for defining the inter- and intra-generic relationships. From this study it could be concluded that the matK gene is a good candidate for DNA barcoding of the plant family Zingiberaceae. PMID:19052662
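
    As a rough illustration of the transition/transversion statistic mentioned in this record, the sketch below counts both kinds of substitution between two aligned DNA sequences. It is a raw pairwise count written for illustration only; the function name and the toy sequences are hypothetical, and it is not a reproduction of how MEGA derives its estimate, which may rely on a substitution model.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned DNA sequences.

    Hypothetical helper for illustration: sites that are identical, gapped,
    or ambiguous (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy aligned pair (made-up data, not taken from the study).
ts, tv = ts_tv_counts("AACGT-", "GATGAA")
ratio = ts / tv if tv else float("inf")
```

    With the toy pair above, the A/G and C/T differences count as transitions and the T/A difference as a transversion, while the gapped column is ignored.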

  18. Phylogenetic effective sample size.

    PubMed

    Bartoszek, Krzysztof

    2016-10-21

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes: the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an already present concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or effective number of species. Lastly I discuss how the concept of the phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades and deciding on the importance of phylogenetic correlations. PMID:27343033

  19. Phylogenetic lineages in Entomophthoromycota

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Entomophthoromycota Humber is one of five major phylogenetic lineages among the former phylum Zygomycota. These early terrestrial fungi share evolutionarily ancestral characters such as coenocytic mycelium and gametangiogamy as a sexual process resulting in zygospore formation. Previous molecular st...

  20. Impacts of Terraces on Phylogenetic Inference.

    PubMed

    Sanderson, Michael J; McMahon, Michelle M; Stamatakis, Alexandros; Zwickl, Derrick J; Steel, Mike

    2015-09-01

    Terraces are sets of trees with precisely the same likelihood or parsimony score, which can be induced by missing sequences in partitioned multi-locus phylogenetic data matrices. The potentially large set of trees on a terrace can be characterized by enumeration algorithms or consensus methods that exploit the pattern of partial taxon coverage in the data, independent of the sequence data themselves. Terraces can add ambiguity and complexity to phylogenetic inference, particularly in settings where inference is already challenging: data sets with many taxa and relatively few loci. In this article we present five new findings about terraces and their impacts on phylogenetic inference. First, we clarify assumptions about partitioning scheme model parameters that are necessary for the existence of terraces. Second, we explore the dependence of terrace size on partitioning scheme and indicate how to find the partitioning scheme associated with the largest terrace containing a given tree. Third, we highlight the impact of terrace size on bootstrap estimates of confidence limits in clades, and characterize the surprising result that the bootstrap proportion for a clade, as it is usually calculated, can be entirely determined by the frequency of bipartitions on a terrace, with some bipartitions receiving high support even when incorrect. Fourth, we dissect some effects of prior distributions of edge lengths on the computed posterior probabilities of clades on terraces, to understand an example in which long edges "attract" each other in Bayesian inference. Fifth, we describe how assuming relationships between edge-lengths of different loci, as an attempt to avoid terraces, can also be problematic when taxon coverage is partial, specifically when heterotachy is present. Finally, we discuss strategies for remediation of some of these problems. One promising approach finds a minimal set of taxa which, when deleted from the data matrix, reduces the size of a terrace to a

  1. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    In order to design life-saving drugs, such as cancer drugs, the design of protein or DNA structures has to be accurate. These structures depend on Multiple Sequence Alignment (MSA). MSA is used to find the accurate structure of protein and DNA sequences from existing approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we propose two different algorithms in this paper: one uses an iterative approach with a progressive alignment method (PAMIM) and the other uses a genetic algorithm with a progressive alignment method (PAMGA). Both of our methods start with a "kmer" distance table to generate a single guide tree. In the iterative approach, we introduce two new techniques: the first generates guide trees with randomly selected sequences and the second shuffles the sequences inside that tree. The output of the tree is a multiple sequence alignment, which is evaluated by the Sum of Pairs Method (SPM) using the real-valued scores from PAM250. In our second (GA) approach, these two techniques are used to generate an initial population, and two different approaches to genetic operators are implemented for crossover and mutation. To test the performance of our two algorithms, we compared them with the existing well-known methods T-Coffee, MUSCLE, MAFFT and ProbCons, using BAliBASE benchmarks. The experimental results show that the first algorithm works well in some situations where other existing methods have difficulty obtaining better solutions. The proposed second method works well compared to the existing methods in all situations and shows better performance than the first one.
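
    The sum-of-pairs evaluation named in this record can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record says scores come from PAM250, whereas the sketch uses a toy scoring function (match +1, mismatch -1, residue-gap -2, gap-gap 0) that is purely an assumption, and the names pair_score and sum_of_pairs are hypothetical.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy pairwise score (an assumption; the record uses PAM250 values)."""
    if a == "-" and b == "-":
        return 0          # two gaps aligned together carry no score
    if a == "-" or b == "-":
        return gap        # residue aligned against a gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, score=pair_score):
    """Sum the pairwise scores over every column of an alignment.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and equal in length")
    total = 0
    for column in zip(*alignment):
        # Every unordered pair of sequences contributes once per column.
        for a, b in combinations(column, 2):
            total += score(a, b)
    return total


# Made-up three-sequence alignment used only to exercise the function.
toy_alignment = ["ACGT", "ACGT", "AC-T"]
toy_score = sum_of_pairs(toy_alignment)
```

    Passing a different `score` callable, for example one backed by a substitution matrix lookup, would bring the sketch closer to the PAM250-based evaluation the record describes.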

  2. Tree Lifecycle.

    ERIC Educational Resources Information Center

    Nature Study, 1998

    1998-01-01

    Presents a Project Learning Tree (PLT) activity that has students investigate and compare the lifecycle of a tree to other living things and the tree's role in the ecosystem. Includes background material as well as step-by-step instructions, variation and enrichment ideas, assessment opportunities, and student worksheets. (SJR)

  3. Accounting for solvent accessibility and secondary structure in protein phylogenetics is clearly beneficial.

    PubMed

    Le, Si Quang; Gascuel, Olivier

    2010-05-01

    Amino acid substitution models are essential to most methods to infer phylogenies from protein data. These models represent the ways in which proteins evolve and substitutions accumulate along the course of time. It is widely accepted that the substitution processes vary depending on the structural configuration of the protein residues. However, this information is very rarely used in phylogenetic studies, though the 3-dimensional structure of dozens of thousands of proteins has been elucidated. Here, we reinvestigate the question in order to fill this gap. We use an improved estimation methodology and a very large database comprising 1471 nonredundant globular protein alignments with structural annotations to estimate new amino acid substitution models accounting for the secondary structure and solvent accessibility of the residues. These models incorporate a confidence coefficient that is estimated from the data and reflects the reliability and usefulness of structural annotations in the analyzed sequences. Our results with 300 independent test alignments show an impressive likelihood gain compared with standard models such as JTT or WAG. Moreover, the use of these models induces significant topological changes in the inferred trees, which should be of primary interest to phylogeneticists. Our data, models, and software are available for download from http://atgc.lirmm.fr/phyml-structure/.

  4. Molecular and phylogenetic analysis of pyridoxal phosphate-dependent acyltransferase of Exiguobacterium acetylicum.

    PubMed

    Rajendran, Narayanan; Smith, Colby; Mazhawidza, Williard

    2009-01-01

    The pyridoxal-5'-phosphate (PLP)-dependent family of enzymes is a very diverse group of proteins that metabolize small molecules like amino acids and sugars, and synthesize cofactors for other metabolic pathways through transamination, decarboxylation, racemization, and substitution reactions. In this study we employed degenerate primer-based PCR amplification, using genomic DNA isolated from the soil bacterium Exiguobacterium acetylicum strain SN as template. We revealed the presence of a PLP-dependent family of enzymes, such as PLP-dependent acyltransferase, and similarity to 8-amino-7-oxononanoate synthase. Sequencing analysis and multiple alignment of the thymidine-adenine-cloned PCR amplicon revealed PLP-dependent family enzymes with specific confering codes and consensus amino acid residues specific to this group of functional proteins. Amino acid residues common to the majority of PLP-dependent enzymes were also revealed by the Lasergene MegAlign software. A phylogenetic tree was constructed. Its analysis revealed a close relationship of E. acetylicum to other bacteria isolated from extreme environments, suggesting similarities in anabolic adaptability and evolutionary development. PMID:20158163

  5. Distance-Based Phylogenetic Methods Around a Polytomy.

    PubMed

    Davidson, Ruth; Sullivant, Seth

    2014-01-01

    Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
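
    One way to make the notions of tree metric and polytomy in this record concrete is the classical four-point condition: for four taxa, a dissimilarity map fits a tree only if the two largest of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal, and the quartet is an unresolved star when all three coincide. The sketch below is an illustration under that standard condition, not the polyhedral analysis of the paper; the function name, tolerance and toy matrices are assumptions.

```python
def classify_quartet(d, i=0, j=1, k=2, l=3, tol=1e-9):
    """Classify four taxa of a symmetric distance matrix `d`.

    Returns ("resolved", split), ("star", None) or ("not a tree metric", None),
    where `split` is the pairing with the smallest sum, e.g. ((0, 1), (2, 3)).
    """
    sums = [
        (d[i][j] + d[k][l], ((i, j), (k, l))),
        (d[i][k] + d[j][l], ((i, k), (j, l))),
        (d[i][l] + d[j][k], ((i, l), (j, k))),
    ]
    sums.sort(key=lambda item: item[0])
    (smallest, split), (middle, _), (largest, _) = sums
    if largest - middle > tol:
        return "not a tree metric", None   # four-point condition fails
    if middle - smallest <= tol:
        return "star", None                # all three sums equal: polytomy
    return "resolved", split               # internal edge separates `split`


# Toy matrices (made-up): pendant edges of length 1, internal edge of length 1.
resolved = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
# Same pendant edges with the internal edge contracted to zero.
star = [
    [0, 2, 2, 2],
    [2, 0, 2, 2],
    [2, 2, 0, 2],
    [2, 2, 2, 0],
]
```

    Shrinking the internal edge of the first matrix to zero yields the second, which is the sense in which tree metrics with polytomies sit on the boundary between resolved topologies.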

  6. Phylogenetic Stochastic Mapping Without Matrix Exponentiation

    PubMed Central

    Irvahn, Jan; Minin, Vladimir N.

    2014-01-01

    Phylogenetic stochastic mapping is a method for reconstructing the history of trait changes on a phylogenetic tree relating the species/organisms carrying the trait. State-of-the-art methods assume that the trait evolves according to a continuous-time Markov chain (CTMC) and work well for small state spaces. The computations slow down considerably for larger state spaces (e.g., the space of codons), because current methodology relies on exponentiating CTMC infinitesimal rate matrices, an operation whose computational complexity grows as the size of the CTMC state space cubed. In this work, we introduce a new approach, based on a CTMC technique called uniformization, which does not use matrix exponentiation for phylogenetic stochastic mapping. Our method is based on a new Markov chain Monte Carlo (MCMC) algorithm that targets the distribution of trait histories conditional on the trait data observed at the tips of the tree. The computational complexity of our MCMC method grows as the size of the CTMC state space squared. Moreover, in contrast to competing matrix exponentiation methods, if the rate matrix is sparse, we can leverage this sparsity and increase the computational efficiency of our algorithm further. Using simulated data, we illustrate advantages of our MCMC algorithm and investigate how large the state space needs to be for our method to outperform matrix exponentiation approaches. We show that even on the moderately large state space of codons our MCMC method can be significantly faster than currently used matrix exponentiation methods. PMID:24918812
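
    For readers unfamiliar with uniformization, the sketch below shows only the underlying identity, not the MCMC sampler of this record: with a rate mu at least as large as every exit rate, R = I + Q/mu is a stochastic matrix and the CTMC transition probabilities are a Poisson(mu t)-weighted sum of powers of R. The function names, the truncation tolerance and the two-state example are assumptions made for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) for rate matrix `q` by uniformization.

    P(t) = sum_k Poisson(k; mu*t) * R^k with R = I + Q/mu; the series is
    truncated once the remaining Poisson mass falls below `tol`.
    """
    n = len(q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-q[i][i] for i in range(n))      # dominating (uniformization) rate
    if mu == 0 or t == 0:
        return identity                       # no jumps can occur
    r = [[identity[i][j] + q[i][j] / mu for j in range(n)] for i in range(n)]
    weight = math.exp(-mu * t)                # Poisson probability of 0 jumps
    power = identity                          # R^0
    p = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    mass, k = weight, 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        power = mat_mul(power, r)             # R^k
        weight *= mu * t / k                  # Poisson probability of k jumps
        mass += weight
        for i in range(n):
            for j in range(n):
                p[i][j] += weight * power[i][j]
    return p


# Two-state example with hypothetical rates 1.0 (0 -> 1) and 2.0 (1 -> 0).
q_example = [[-1.0, 1.0], [2.0, -2.0]]
p_example = transition_probs(q_example, 0.5)
```

    For the two-state chain the result can be checked against the closed form P00(t) = 2/3 + (1/3) exp(-3t). Note that very large values of mu t would make the initial weight underflow, which is why the loop carries a `max_terms` guard rather than relying on convergence alone.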

  7. A case study for effects of operational taxonomic units from intracellular endoparasites and ciliates on the eukaryotic phylogeny: phylogenetic position of the haptophyta in analyses of multiple slowly evolving genes.

    PubMed

    Nozaki, Hisayoshi; Yang, Yi; Maruyama, Shinichiro; Suzaki, Toshinobu

    2012-01-01

    Recent multigene phylogenetic analyses have contributed much to our understanding of eukaryotic phylogeny. However, the phylogenetic positions of various lineages within the eukaryotes have remained unresolved or in conflict between different phylogenetic studies. These phylogenetic ambiguities might have resulted from mixtures or integration from various factors including limited taxon sampling, missing data in the alignment, saturations of rapidly evolving genes, mixed analyses of short- and long-branched operational taxonomic units (OTUs), intracellular endoparasite and ciliate OTUs with unusual substitution etc. In order to evaluate the effects from intracellular endoparasite and ciliate OTUs co-analyzed on the eukaryotic phylogeny and simplify the results, we here used two different sets of data matrices of multiple slowly evolving genes with small amounts of missing data and examined the phylogenetic position of the secondary photosynthetic chromalveolates Haptophyta, one of the most abundant groups of oceanic phytoplankton and significant primary producers. In both sets, a robust sister relationship between Haptophyta and SAR (stramenopiles, alveolates, rhizarians, or SA [stramenopiles and alveolates]) was resolved when intracellular endoparasite/ciliate OTUs were excluded, but not in their presence. Based on comparisons of character optimizations on a fixed tree (with a clade composed of haptophytes and SAR or SA), disruption of the monophyly between haptophytes and SAR (or SA) in the presence of intracellular endoparasite/ciliate OTUs can be considered to be a result of multiple evolutionary reversals of character positions that supported the synapomorphy of the haptophyte and SAR (or SA) clade in the absence of intracellular endoparasite/ciliate OTUs.

  8. 6. Aerial view of turnpike alignment running from lower left ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    6. Aerial view of turnpike alignment running from lower left diagonally up to right along row of trees. Migel Estate and Farm buildings (HABS No. NY-6356) located at lower right of photograph. W.K. Smith house (HABS No. NY-6356-A) located within clump of trees at lower center, with poultry houses (HABS No. NY-6356-F and G) visible left of the clump of trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY

  9. The molecular symplesiomorphies shared by the stem groups of metazoan evolution: can sites as few as 1% have a significant impact on recognizing the phylogenetic position of myzostomida?

    PubMed

    Wang, Yanhui; Xie, Qiang

    2014-08-01

    Although it is clear that taxon sampling, alignments, gene sampling, tree reconstruction methods and the total length of the sequences used are critical to the reconstruction of evolutionary history, weakly supported or misleading nodes exist in phylogenetic studies with no obvious flaw in those aspects. The phylogenetic studies focusing on the basal part of bilaterian evolution are such a case. During the past decade, Myzostomida has appeared in the basal part of Bilateria in several phylogenetic studies of Metazoa. However, most researchers have entertained only two competing hypotheses about the position of Myzostomida: an affinity with Annelida and an affinity with Platyhelminthes. In this study, dozens of symplesiomorphies were discovered by means of ancestral state reconstruction in the complete 18S and 28S rDNAs shared by the stem groups of Metazoa. By contrastive analysis on the datasets with or without such symplesiomorphic sites, we discovered that Myzostomida and other basal groups are basal lineages of Bilateria due to the corresponding symplesiomorphies shared with earlier lineages. As such, symplesiomorphies accounting for approximately 1-2% of the whole dataset have an essential impact on phylogenetic inference, and this study reminds molecular systematists of the importance of carrying out ancestral state reconstruction at each site in sequence-based phylogenetic studies. In addition, reasons should be explored for the low support of the hypothesis that Myzostomida belongs to Annelida in the results of phylogenomic studies. Future phylogenetic studies concerning Myzostomida should include all of the basal lineages of Bilateria to avoid directly neglecting the stand-alone basal position of Myzostomida as a potential hypothesis. PMID:25128981

  10. XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    PubMed Central

    Frank, Daniel N

    2008-01-01

    Background: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results: XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; [1-3]) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion: XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at . PMID:18840282

  11. Cloning, in Vitro expression, and novel phylogenetic classification of a channel catfish estrogen receptor

    USGS Publications Warehouse

    Xia, Z.; Patino, R.; Gale, W.L.; Maule, A.G.; Densmore, L.D.

    1999-01-01

    We obtained two channel catfish estrogen receptor (ccER) cDNA from liver of female fish using RT–PCR. The two fragments were identical in sequence except that the smaller one had an out-of-frame deletion in the E domain, suggesting the existence of ccER splice variants. The larger fragment was used to screen a cDNA library from liver of a prepubescent female. A cDNA was obtained that encoded a 581-amino-acid ER with a deduced molecular weight of 63.8 kDa. Extracts of COS-7 cells transfected with ccER cDNA bound estrogen with high affinity (Kd = 4.7 nM) and specificity. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of ccER on the basis of 18 full-length ER sequences. The tree suggested the existence of two major ER branches. One branch contained two clearly divergent clades which included all piscine ER (except Japanese eel ER) and all tetrapod ERα, respectively. The second major branch contained the eel ER and the mammalian ERβ. The high degree of divergence between the eel ER and mammalian ERβ suggested that they also represent distinct piscine and tetrapod ER. These data suggest that ERα and ERβ are present throughout vertebrates and that these two major ER types evolved by duplication of an ancestral ER gene. Sequence alignments with other members of the nuclear hormone receptor superfamily indicated the presence of 8 amino acids in the E domain that align exclusively among ER. Four of these amino acids have not received prior research attention and their function is unknown. The novel finding of putative ER splice variants in a nonmammalian vertebrate and the novel phylogenetic classification of ER offer new perspectives in understanding the diversification and function of ER.

  12. Identifiability of large phylogenetic mixture models.

    PubMed

    Rhodes, John A; Sullivant, Seth

    2012-01-01

    Phylogenetic mixture models are statistical models of character evolution allowing for heterogeneity. Each of the classes in some unknown partition of the characters may evolve by different processes, or even along different trees. Such models are of increasing interest for data analysis, as they can capture the variety of evolutionary processes that may be occurring across long sequences of DNA or proteins. The fundamental question of whether parameters of such a model are identifiable is difficult to address, due to the complexity of the parameterization. Identifiability is, however, essential to their use for statistical inference. We analyze mixture models on large trees, with many mixture components, showing that both numerical and tree parameters are indeed identifiable in these models when all trees are the same. This provides a theoretical justification for some current empirical studies, and indicates that extensions to even more mixture components should be theoretically well behaved. We also extend our results to certain mixtures on different trees, using the same algebraic techniques.

  13. Complete mitochondrial genome of Cervus elaphus songaricus (Cetartiodactyla: Cervinae) and a phylogenetic analysis with related species.

    PubMed

    Li, Yiqing; Ba, Hengxing; Yang, Fuhe

    2016-01-01

    The complete mitochondrial genome of Tianshan wapiti, Cervus elaphus songaricus, is 16,419 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region. The phylogenetic trees were reconstructed with the concatenated nucleotide sequences of the 13 protein-coding genes using maximum parsimony (MP) and Bayesian inference (BI) methods. The MP and BI phylogenetic trees showed an identical tree topology. The monophyly of red deer, wapiti and sika deer was well supported, and wapiti was found to share a closer relationship with sika deer. Tianshan wapiti shared a closer relationship with xanthopygus than yarkandensis. Rusa unicolor and Rucervus eldi were given a basal phylogenetic position. Our phylogenetic analysis provided a robust phylogenetic resolution spanning the entire evolutionary relationship of the subfamily Cervinae. PMID:24725059

  14. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.

  15. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    PubMed

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P approximately 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report. PMID:19851702
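
    For readers unfamiliar with the G statistic named above, its standard form is G = 2 * sum(O_i * ln(O_i / E_i)) over categories with observed counts O_i and model-expected counts E_i. The sketch below only illustrates that formula; the function name and the example counts are made up and do not come from the paper, and no degrees of freedom or P values are computed.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood ratio (G) statistic for observed vs. expected counts.

    Hypothetical helper: categories with an observed count of zero
    contribute nothing to the sum, and expected counts must be positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2.0 * sum(
        o * math.log(o / e)
        for o, e in zip(observed, expected)
        if o > 0
    )
```

    Marginalizing, in the sense used in the abstract, amounts to pooling many sparse site-pattern categories into fewer, better-populated ones before computing such a statistic.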

  16. SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees

    PubMed Central

    Mallo, Diego; De Oliveira Martins, Leonardo; Posada, David

    2016-01-01

    We present a fast and flexible software package—SimPhy—for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases. PMID:26526427

  17. On the analysis of phylogenetically paired designs

    PubMed Central

    Funk, Jennifer L; Rakovski, Cyril S; Macpherson, J Michael

    2015-01-01

    As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed model approach that includes random effects for pair and species. These random effects introduce a “two-layer” compound symmetry variance structure that captures both the correlations between observations on related species within a pair as well as the correlations between the repeated measurements within species. We conducted a simulation study to assess the effect of model misspecification on Type I and II error rates. We also provide an illustrative example with data containing taxonomically similar species and several outcome variables of interest. We found that a mixed model with species and pair as random effects performed better in these phylogenetically explicit simulations than two commonly used reference models (no or single random effect) by optimizing Type I error rates and power. The proposed mixed model produces acceptable Type I and II error rates despite the absence of a phylogenetic tree. This design can be generalized to a variety of datasets to analyze repeated measurements in clusters of related subjects/species. PMID:25750719

  18. Phylogenetic analysis of the spirochetes.

    PubMed Central

    Paster, B J; Dewhirst, F E; Weisburg, W G; Tordoff, L A; Fraser, G J; Hespell, R B; Stanton, T B; Zablen, L; Mandelco, L; Woese, C R

    1991-01-01

    The 16S rRNA sequences were determined for species of Spirochaeta, Treponema, Borrelia, Leptospira, Leptonema, and Serpula, using a modified Sanger method of direct RNA sequencing. Analysis of aligned 16S rRNA sequences indicated that the spirochetes form a coherent taxon composed of six major clusters or groups. The first group, termed the treponemes, was divided into two subgroups. The first treponeme subgroup consisted of Treponema pallidum, Treponema phagedenis, Treponema denticola, a thermophilic spirochete strain, and two species of Spirochaeta, Spirochaeta zuelzerae and Spirochaeta stenostrepta, with an average interspecies similarity of 89.9%. The second treponeme subgroup contained Treponema bryantii, Treponema pectinovorum, Treponema saccharophilum, Treponema succinifaciens, and rumen strain CA, with an average interspecies similarity of 86.2%. The average interspecies similarity between the two treponeme subgroups was 84.2%. The division of the treponemes into two subgroups was verified by single-base signature analysis. The second spirochete group contained Spirochaeta aurantia, Spirochaeta halophila, Spirochaeta bajacaliforniensis, Spirochaeta litoralis, and Spirochaeta isovalerica, with an average similarity of 87.4%. The Spirochaeta group was related to the treponeme group, with an average similarity of 81.9%. The third spirochete group contained borrelias, including Borrelia burgdorferi, Borrelia anserina, Borrelia hermsii, and a rabbit tick strain. The borrelias formed a tight phylogenetic cluster, with average similarity of 97%. The borrelia group shared a common branch with the Spirochaeta group and was closer to this group than to the treponemes. A single spirochete strain isolated from the shrew constituted the fourth group. The fifth group was composed of strains of Serpula (Treponema) hyodysenteriae and Serpula (Treponema) innocens. The two species of this group were closely related, with a similarity of greater than 99%. Leptonema illini

  19. Molecular phylogenetics of mastodon and Tyrannosaurus rex.

    PubMed

    Organ, Chris L; Schweitzer, Mary H; Zheng, Wenxia; Freimark, Lisa M; Cantley, Lewis C; Asara, John M

    2008-04-25

    We report a molecular phylogeny for a nonavian dinosaur, extending our knowledge of trait evolution within nonavian dinosaurs into the macromolecular level of biological organization. Fragments of collagen alpha1(I) and alpha2(I) proteins extracted from fossil bones of Tyrannosaurus rex and Mammut americanum (mastodon) were analyzed with a variety of phylogenetic methods. Despite missing sequence data, the mastodon groups with elephant and the T. rex groups with birds, consistent with predictions based on genetic and morphological data for mastodon and on morphological data for T. rex. Our findings suggest that molecular data from long-extinct organisms may have the potential for resolving relationships at critical areas in the vertebrate evolutionary tree that have, so far, been phylogenetically intractable.

  20. Phylogenetic analysis of the Australian rosella parrots (Platycercus) reveals discordance among molecules and plumage.

    PubMed

    Shipham, Ashlee; Schmidt, Daniel J; Joseph, Leo; Hughes, Jane M

    2015-10-01

    Relationships and species limits among the colourful Australian parrots known as rosellas (Platycercus) are contentious because of poorly understood patterns of parapatry, sympatry and hybridization as well as complex patterns of geographical replacement of phenotypic forms. Two subgenera are, however, conventionally recognised: Platycercus comprises the blue-cheeked crimson rosella complex (Crimson Rosella P. elegans and Green Rosella P. caledonicus), and Violania contains the remaining four currently recognised species (Pale-headed Rosella P. adscitus, Eastern Rosella P. eximius, Northern Rosella P. venustus, and Western Rosella P. icterotis). We used phylogenetic analysis of ten loci (one mitochondrial, eight autosomal and one Z-linked) and several individuals per nominal species primarily to examine relationships within the subgenera, especially the relationships and species limits within Violania. Of these, P. adscitus and P. eximius have long been considered sister species or conspecific due to a morphology-based hybrid zone and an early phylogenetic analysis of mitochondrial DNA restriction fragment length polymorphisms. The multilocus phylogenetic analysis presented here supports an alternative hypothesis aligning P. adscitus and P. venustus as sister species. Using divergence rates published in other avian studies, we estimated the divergence between P. venustus and P. adscitus at 0.0148-0.6124 MYA and that between the P. adscitus/P. venustus ancestor and P. eximius earlier at 0.1617-1.0816 MYA, both within the Pleistocene. Discordant topologies among gene and species trees are discussed and proposed to be the result of historical gene flow and/or incomplete lineage sorting (ILS). In particular, we suggest that discordance between mitochondrial and nuclear data may be the result of asymmetrical mitochondrial introgression from P. adscitus into P. eximius. The biogeographical implications of our findings are discussed relative to similarly distributed groups

  1. The phylogenetic position of Amoebophrya sp. infecting Gymnodinium sanguineum.

    PubMed

    Gunderson, J H; Goss, S H; Coats, D W

    1999-01-01

    The small-subunit rRNA sequence of a species of Amoebophrya infecting Gymnodinium sanguineum in Chesapeake Bay was obtained and compared to the small subunit rRNA sequences of other protists. Phylogenetic trees constructed with the new sequence place Amoebophrya between the remaining dinoflagellates and other protists.

  2. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed Central

    Stadler, Tanja; Degnan, James H.; Rosenberg, Noah A.

    2016-01-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785

  3. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed

    Stadler, Tanja; Degnan, James H; Rosenberg, Noah A

    2016-07-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference in empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785

  4. Phylogenetic diversity (PD) and biodiversity conservation: some bioinformatics challenges

    PubMed Central

    Faith, Daniel P.; Baker, Andrew M.

    2007-01-01

    Biodiversity conservation addresses information challenges through estimations encapsulated in measures of diversity. A quantitative measure of phylogenetic diversity, “PD”, has been defined as the minimum total length of all the phylogenetic branches required to span a given set of taxa on the phylogenetic tree (Faith 1992a). While a recent paper incorrectly characterizes PD as not including information about deeper phylogenetic branches, PD applications over the past decade document the proper incorporation of shared deep branches when assessing the total PD of a set of taxa. Current PD applications to macroinvertebrate taxa in streams of New South Wales, Australia illustrate the practical importance of this definition. Phylogenetic lineages, often corresponding to new, “cryptic”, taxa, are restricted to a small number of stream localities. A recent case of human impact causing loss of taxa in one locality implies a higher PD value for another locality, because it now uniquely represents a deeper branch. This molecular-based phylogenetic pattern supports the use of DNA barcoding programs for biodiversity conservation planning. Here, PD assessments side-step the contentious use of barcoding-based “species” designations. Bio-informatics challenges include combining different phylogenetic evidence, optimization problems for conservation planning, and effective integration of phylogenetic information with environmental and socio-economic data. PMID:19455206
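
    The PD definition in this abstract lends itself to a small worked example. The sketch below assumes a rooted convention in which the path from each taxon up to the root is counted, so shared deep branches are included exactly once; the tree encoding, the function name, and the example tree ((A:1,B:1):2,C:3) are all hypothetical and not taken from the paper.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the lengths of all branches on the paths from `taxa` to the root.

    `parent` maps each non-root node to a (parent_node, branch_length) pair.
    Each branch is counted once, even when several taxa share it.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk upward until the root or an already counted branch is reached.
        while node in parent and node not in visited:
            visited.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical example tree ((A:1,B:1):2,C:3) with an unnamed root.
EXAMPLE_PARENT = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 2.0),
    "C": ("root", 3.0),
}
```

    In this toy tree the set {A, B} has PD 4 while {A} alone still has PD 3, because A by itself continues to represent the deeper branch of length 2. This mirrors the abstract's point that losing taxa at one locality raises the value of another locality that now uniquely represents a deep branch.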

  5. FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

    SciTech Connect

    Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.

    2009-07-31

    Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a distinct characters (alphabet size a), a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
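
    The 50-gigabyte figure quoted above can be reproduced with simple arithmetic, and the Jukes-Cantor distance it refers to has a standard closed form, d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The sketch below is not FastTree code; the function names are hypothetical, and the 4 bytes per stored distance is an assumption chosen because it happens to match the quoted figure.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguous character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix_bytes(n_sequences, bytes_per_entry=4):
    """Storage for one distance per unordered pair (bytes_per_entry is assumed)."""
    return n_sequences * (n_sequences - 1) // 2 * bytes_per_entry


# 158,022 sequences give about 1.25e10 pairs, i.e. roughly 50 GB at 4 bytes each.
print(distance_matrix_bytes(158_022) / 1e9)
```

    Under these assumptions the pairwise matrix alone needs about 49.9 GB, against the 2.4 GB reported for FastTree's profile-based approach.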

  6. Space, time, form: viewing the Tree of Life.

    PubMed

    Page, Roderic D M

    2012-02-01

    There are numerous ways to display a phylogenetic tree, which is reflected in the diversity of software tools available to phylogeneticists. Displaying very large trees continues to be a challenge, made ever harder as increasing computing power enables researchers to construct ever-larger trees. At the same time, computing technology is enabling novel visualisations, ranging from geophylogenies embedded on digital globes to touch-screen interfaces that enable greater interaction with evolutionary trees. In this review, I survey recent developments in phylogenetic visualisation, highlighting successful (and less successful) approaches and sketching some future directions.

  7. Phylogenetic networks do not need to be complex: using fewer reticulations to represent conflicting clusters

    PubMed Central

    van Iersel, Leo; Kelk, Steven; Rupp, Regula; Huson, Daniel

    2010-01-01

    Phylogenetic trees are widely used to display estimates of how groups of species evolved. Each phylogenetic tree can be seen as a collection of clusters, subgroups of the species that evolved from a common ancestor. When phylogenetic trees are obtained for several datasets (e.g. for different genes), their clusters often contradict one another. Consequently, the set of all clusters of such a dataset cannot be combined into a single phylogenetic tree. Phylogenetic networks are a generalization of phylogenetic trees that can be used to display more complex evolutionary histories, including reticulate events, such as hybridizations, recombinations and horizontal gene transfers. Here, we present the new Cass algorithm that can combine any set of clusters into a phylogenetic network. We show that the networks constructed by Cass are usually simpler than networks constructed by other available methods. Moreover, we show that Cass is guaranteed to produce a network with at most two reticulations per biconnected component, whenever such a network exists. We have implemented Cass and integrated it into the freely available Dendroscope software. Contact: l.j.j.v.iersel@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20529896
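
    The notion of contradicting clusters can be made concrete. A set of clusters fits on a single rooted tree exactly when every pair of clusters is either disjoint or nested; any pair that overlaps partially forces a reticulation. The sketch below checks only this pairwise condition. It is not the Cass algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def compatible(a, b):
    """Two clusters are compatible if they are disjoint or one contains the other."""
    a, b = frozenset(a), frozenset(b)
    return a.isdisjoint(b) or a <= b or b <= a


def fits_on_one_tree(clusters):
    """True if all clusters can be displayed together on one rooted tree."""
    return all(compatible(a, b) for a, b in combinations(clusters, 2))


if __name__ == "__main__":
    gene_1 = [{"a", "b"}, {"a", "b", "c"}, {"d", "e"}]
    gene_2 = [{"b", "c"}]
    print(fits_on_one_tree(gene_1))           # nested or disjoint only
    print(fits_on_one_tree(gene_1 + gene_2))  # {"a","b"} and {"b","c"} overlap
```

    When the check fails, a tree no longer suffices and a network is needed to display all clusters at once.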

  8. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed

    Cao, Heping

    2015-09-01

    Trees contribute to enormous plant oil reserves because many trees contain 50%-80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the "proline knot" motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs.

  10. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed Central

    2015-01-01

    Trees contribute to enormous plant oil reserves because many trees contain 50%–80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the “proline knot” motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs. PMID:26258573

  11. A taxonomic and phylogenetic re-appraisal of the genus Curvularia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species of Curvularia are important plant and human pathogens worldwide. In this study, the genus Curvularia is re-assessed based on molecular phylogenetic analysis and morphological observations of available isolates and specimens. A multi-gene phylogenetic tree inferred from ITS, TEF and GPDH gene...

  12. Nearest Alignment Space Termination

    2006-07-13

    Nearest Alignment Space Termination (NAST) is the Greengenes algorithm that compares submitted sequences against the Greengenes database to find similar sequences and aligns the submitted sequences based on those similarities.

  13. Budgeted phylogenetic diversity on circular split systems.

    PubMed

    Minh, Bui Quang; Pardi, Fabio; Klaere, Steffen; von Haeseler, Arndt

    2009-01-01

    In the last 15 years, Phylogenetic Diversity (PD) has gained interest in the community of conservation biologists as a surrogate measure for assessing biodiversity. We have recently proposed two approaches to select taxa for maximizing PD, namely PD with budget constraints and PD on split systems. In this paper, we will unify these two strategies and present a dynamic programming algorithm to solve the unified framework of selecting taxa with maximal PD under budget constraints on circular split systems. An improved algorithm will also be given if the underlying split system is a tree.
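
    To make the optimization problem concrete, the sketch below computes PD on a small rooted tree and then selects taxa under a budget by exhaustive search. It is a toy illustration under stated assumptions, not the dynamic programming algorithm of the paper: PD is taken here as the total branch length connecting the chosen taxa to the root (one common rooted definition), the tree, branch lengths and costs are invented, and the search is brute force, so it is only usable for a handful of taxa.

```python
from itertools import combinations


def phylogenetic_diversity(parent, selected):
    """Sum of branch lengths on the paths from the selected taxa to the root.

    parent maps each non-root node to (parent_node, branch_length).
    Shared branches are counted once.
    """
    counted = set()
    total = 0.0
    for taxon in selected:
        node = taxon
        while node in parent and node not in counted:
            counted.add(node)
            node, length = parent[node][0], parent[node][1]
            total += length
    return total


def best_subset(parent, cost, budget):
    """Brute-force search for the affordable taxon set with maximal PD."""
    taxa = sorted(cost)
    best = (0.0, ())
    for k in range(1, len(taxa) + 1):
        for subset in combinations(taxa, k):
            if sum(cost[t] for t in subset) <= budget:
                pd = phylogenetic_diversity(parent, subset)
                if pd > best[0]:
                    best = (pd, subset)
    return best


if __name__ == "__main__":
    # Hypothetical tree ((A:1,B:2)X:3,C:4)R with invented conservation costs.
    parent = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("R", 3.0), "C": ("R", 4.0)}
    cost = {"A": 1, "B": 1, "C": 3}
    print(best_subset(parent, cost, budget=4))
```

    Exhaustive search examines every subset of taxa, which is why an efficient algorithm such as the one described in the abstract matters for realistic data sets.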

  14. Genome alignment with graph data structures: a comparison

    PubMed Central

    2014-01-01

    Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884

  15. Large-scale analysis of phylogenetic search behavior.

    PubMed

    Park, Hyun Jung; Sul, Seung-Jin; Williams, Tiffani L

    2010-01-01

    Phylogenetic analysis is used in all branches of biology with applications ranging from studies on the origin of human populations to investigations of the transmission patterns of HIV. Most phylogenetic analyses rely on effective heuristics for obtaining accurate trees. However, relatively little work has been done to analyze quantitatively the behavior of phylogenetic heuristics in tree space. A better understanding of local search behavior can facilitate the design of better heuristics, which ultimately lead to more accurate depictions of the true evolutionary relationships. In this paper, we present new insights into local search behavior for maximum parsimony on three biological datasets consisting of 44, 60, and 174 taxa. By analyzing all trees from the search, we find that, as the search algorithm climbs the hill to local optima, the trees in the neighborhood surrounding the current solution improve as well. Furthermore, the search is quite robust to a small number of randomly selected neighbors. Thus, our work shows how to gain insights into the behavior of local search algorithms by exploring a large, diverse collection of trees.

  16. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    PubMed

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

    The MixtureTree Annotator, written in Java, allows the user to automatically color any phylogenetic tree in Newick format generated by any phylogeny reconstruction program and to output the result as a Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over other programs that perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular programs that lack good built-in visualization tools (for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious) may introduce human error because colors must be added to each node manually, or have other limitations, such as coloring only by a number (for example, branch length) or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process. PMID:25826378

  18. Neighborhoods of trees in circular orderings.

    PubMed

    Bastkowski, Sarah; Moulton, Vincent; Spillner, Andreas; Wu, Taoyang

    2015-01-01

    In phylogenetics, a common strategy used to construct an evolutionary tree for a set of species X is to search in the space of all such trees for one that optimizes some given score function (such as the minimum evolution, parsimony or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the set of all those trees that are compatible with some circular ordering of the set X. To inform the design of efficient algorithms to perform such searches, it is therefore of interest to find bounds for the number of trees compatible with a fixed ordering in the neighborhood of a tree that is determined by certain tree operations commonly used to search for trees: the nearest neighbor interchange (NNI), the subtree prune and regraft (SPR) and the tree bisection and reconnection (TBR) operations. We show that the size of such a neighborhood of a binary tree associated with the NNI operation is independent of the tree's topology, but that this is not the case for the SPR and TBR operations. We also give tight upper and lower bounds for the size of the neighborhood of a binary tree for the SPR and TBR operations and characterize those trees for which these bounds are attained.

  19. Investigating how students communicate tree-thinking

    NASA Astrophysics Data System (ADS)

    Boyce, Carrie Jo

    Learning is often an active endeavor that requires students to work at building conceptual understandings of complex topics. Personal experiences, ideas, and communication all play large roles in developing knowledge and understanding of complex topics. Sometimes these experiences can promote formation of scientifically inaccurate or incomplete ideas. Representations are tools used to help individuals understand complex topics. In biology, one way that educators help people understand evolutionary histories of organisms is by using representations called phylogenetic trees. In order to understand phylogenetic trees, individuals need to understand the conventions associated with phylogenies. My dissertation, supported by the Tree-Thinking Representational Competence and Word Association frameworks, is a mixed-methods study investigating the changes in students' tree-reading, representational competence and mental association of phylogenetic terminology after participation in varied instruction. Participants included 128 introductory biology majors from a mid-sized southern research university. Participants were enrolled in either Introductory Biology I, where they were not taught phylogenetics, or Introductory Biology II, where they were explicitly taught phylogenetics. I collected data using a pre- and post-assessment consisting of a word association task and tree-thinking diagnostic (n=128). Additionally, I recruited a subset of students from both courses (n=37) to complete a computer simulation designed to teach students about phylogenetic trees. I then conducted semi-structured interviews consisting of a word association exercise with card sort task, a retrospective pre-assessment discussion, a post-assessment discussion, and interview questions. I found that students who received explicit lecture instruction had a significantly higher increase in scores on a tree-thinking diagnostic than students who did not receive lecture instruction. Students who received both...

  20. Molecular phylogenetic study on the origin and evolution of Mustelidae.

    PubMed

    Yonezawa, Takahiro; Nikaido, Masato; Kohno, Naoki; Fukumoto, Yukio; Okada, Norihiro; Hasegawa, Masami

    2007-07-01

    The family Mustelidae, which consists of Mustelinae, Lutrinae, Melinae, and Taxidiinae, is the largest family among Carnivora and is a highly diverse group. Recent molecular phylogenetic studies have clarified the phylogenetic relations among Mustelidae, but there remain several unresolved problems, particularly concerning the deep branchings. Whereas many studies support the monophyly of Mustelidae+Procyonidae among Musteloidea, the relations between Mustelidae+Procyonidae, Ailuridae, and Mephitidae are still unclear. To address these problems, we inferred a tree on the basis of the sequences of mitochondrial genomes and of multiple nuclear genes using the maximum likelihood method. Our results strongly support the hypothesis that the Taxidiinae branched off first, followed by the branching of the Melinae. After that, Mustelinae diversified, and Lutrinae evolved within Mustelinae. With respect to the deep branchings in Musteloidea, the Ailuridae/Mephitidae monophyly tree and the Mephitidae-basal tree are indistinguishable in log-likelihood score, and this problem remains unresolved.

  1. Synthesis of phylogeny and taxonomy into a comprehensive tree of life.

    PubMed

    Hinchliff, Cody E; Smith, Stephen A; Allman, James F; Burleigh, J Gordon; Chaudhary, Ruchi; Coghill, Lyndon M; Crandall, Keith A; Deng, Jiabin; Drew, Bryan T; Gazis, Romina; Gude, Karl; Hibbett, David S; Katz, Laura A; Laughinghouse, H Dail; McTavish, Emily Jane; Midford, Peter E; Owen, Christopher L; Ree, Richard H; Rees, Jonathan A; Soltis, Douglas E; Williams, Tiffani; Cranston, Karen A

    2015-10-13

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips, the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.

  3. Synthesis of phylogeny and taxonomy into a comprehensive tree of life

    PubMed Central

    Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

    2015-01-01

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966

  4. Tree Amigos.

    ERIC Educational Resources Information Center

    Center for Environmental Study, Grand Rapids, MI.

    Tree Amigos is a special cross-cultural program that uses trees as a common bond to bring the people of the Americas together in unique partnerships to preserve and protect the shared global environment. It is a tangible program that embodies the philosophy that individuals, acting together, can make a difference. This resource book contains…

  5. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  6. Uncertainty in homology inferences: Assessing and improving genomic sequence alignment

    PubMed Central

    Lunter, Gerton; Rocco, Andrea; Mimouni, Naila; Heger, Andreas; Caldeira, Alexandre; Hein, Jotun

    2008-01-01

    Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human–mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we identify three types of alignment error, each leading to systematic biases in all algorithms considered. Careful modeling of the evolutionary process improves alignment quality; however, these improvements are modest compared with the remaining alignment errors, even with exact knowledge of the evolutionary model, emphasizing the need for statistical approaches to account for uncertainty. We develop a new algorithm, Marginalized Posterior Decoding (MPD), which explicitly accounts for uncertainties, is less biased and more accurate than other algorithms we consider, and reduces the proportion of misaligned bases by a third compared with the best existing algorithm. To our knowledge, this is the first nonheuristic algorithm for DNA sequence alignment to show robust improvements over the classic Needleman–Wunsch algorithm. Despite this, considerable uncertainty remains even in the improved alignments. We conclude that a probabilistic treatment is essential, both to improve alignment quality and to quantify the remaining uncertainty. This is becoming increasingly relevant with the growing appreciation of the importance of noncoding DNA, whose study relies heavily on alignments. Alignment errors are inevitable, and should be considered when drawing conclusions from alignments. Software and alignments to assist researchers in doing this are provided at http://genserv.anat.ox.ac.uk/grape/. PMID:18073381
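
    For readers unfamiliar with the baseline named in this abstract, the sketch below computes the optimal global alignment score with the classic Needleman-Wunsch dynamic programme. It is not the MPD algorithm, which the abstract describes as a probabilistic alternative, and the scoring values (match +1, mismatch -1, linear gap penalty -1) are arbitrary choices made for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "ACT"))
```

    The dynamic programme returns a single best-scoring alignment and says nothing about how confident one should be in each aligned position, which is the gap that posterior-based methods aim to address.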

  7. Phyloclimatic modeling: combining phylogenetics and bioclimatic modeling.

    PubMed

    Yesson, C; Culham, A

    2006-10-01

    We investigate the impact of past climates on plant diversification by tracking the "footprint" of climate change on a phylogenetic tree. Diversity within the cosmopolitan carnivorous plant genus Drosera (Droseraceae) is focused within Mediterranean climate regions. We explore whether this diversity is temporally linked to Mediterranean-type climatic shifts of the mid-Miocene and whether climate preferences are conservative over phylogenetic timescales. Phyloclimatic modeling combines environmental niche (bioclimatic) modeling with phylogenetics in order to study evolutionary patterns in relation to climate change. We present the largest and most complete such example to date using Drosera. The bioclimatic models of extant species demonstrate clear phylogenetic patterns; this is particularly evident for the tuberous sundews from southwestern Australia (subgenus Ergaleium). We employ a method for establishing confidence intervals of node ages on a phylogeny using replicates from a Bayesian phylogenetic analysis. This chronogram shows that many clades, including subgenus Ergaleium and section Bryastrum, diversified during the establishment of the Mediterranean-type climate. Ancestral reconstructions of bioclimatic models demonstrate a pattern of preference for this climate type within these groups. Ancestral bioclimatic models are projected into palaeo-climate reconstructions for the time periods indicated by the chronogram. We present two such examples that each generate plausible estimates of ancestral lineage distribution, which are similar to their current distributions. This is the first study to attempt bioclimatic projections on evolutionary time scales. The sundews appear to have diversified in response to local climate development. Some groups are specialized for Mediterranean climates, others show wide-ranging generalism. This demonstrates that Phyloclimatic modeling could be repeated for other plant groups and is fundamental to the understanding of

  8. Phyloclimatic modeling: combining phylogenetics and bioclimatic modeling.

    PubMed

    Yesson, C; Culham, A

    2006-10-01

    We investigate the impact of past climates on plant diversification by tracking the "footprint" of climate change on a phylogenetic tree. Diversity within the cosmopolitan carnivorous plant genus Drosera (Droseraceae) is focused within Mediterranean climate regions. We explore whether this diversity is temporally linked to Mediterranean-type climatic shifts of the mid-Miocene and whether climate preferences are conservative over phylogenetic timescales. Phyloclimatic modeling combines environmental niche (bioclimatic) modeling with phylogenetics in order to study evolutionary patterns in relation to climate change. We present the largest and most complete such example to date using Drosera. The bioclimatic models of extant species demonstrate clear phylogenetic patterns; this is particularly evident for the tuberous sundews from southwestern Australia (subgenus Ergaleium). We employ a method for establishing confidence intervals of node ages on a phylogeny using replicates from a Bayesian phylogenetic analysis. This chronogram shows that many clades, including subgenus Ergaleium and section Bryastrum, diversified during the establishment of the Mediterranean-type climate. Ancestral reconstructions of bioclimatic models demonstrate a pattern of preference for this climate type within these groups. Ancestral bioclimatic models are projected into palaeo-climate reconstructions for the time periods indic