Sample records for combining evolutionary sequence

  1. Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

    PubMed Central

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092

  2. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; ) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions: Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  3. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724

  4. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units in genome sequences and play many important roles in plants. To reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analyzing twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of one- to three-nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With increasing SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and as the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with increasing SSR repeat number, different species favored different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom but were also associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provides new insights for the analysis of plant genome evolution.

  5. Beyond Reasonable Doubt: Evolution from DNA Sequences

    PubMed Central

    Penny, David

    2013-01-01

    We demonstrate quantitatively that, as predicted by evolutionary theory, sequences of homologous proteins from different species converge as we go further and further back in time. The converse, a non-evolutionary model, can be expressed as probabilities, and the test works for chloroplast, nuclear and mitochondrial sequences, as well as for sequences that diverged at different time depths. Even on our conservative test, the probability that chance could produce the observed levels of ancestral convergence for just one of the eight datasets of 51 proteins is ≈1×10⁻¹⁹ and combined over 8 datasets is ≈1×10⁻¹³². By comparison, there are about 10⁸⁰ protons in the universe, hence the probability that the sequences could have been produced by a process involving unrelated ancestral sequences is about 10⁵⁰ lower than picking, among all protons, the same proton at random twice in a row. A non-evolutionary control model shows no convergence, and only a small number of parameters are required to account for the observations. It is time that researchers insisted that doubters put up testable alternatives to evolution. PMID:23950906
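
    The orders of magnitude in this comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration and rests on one assumed reading of the abstract: that "the same proton twice in a row" means a second random draw matching the first, which has probability 1 in 10^80. The variable names are illustrative and do not come from the paper.

```python
# Work with base-10 exponents so the tiny probabilities stay exact integers.
LOG10_P_ALL_DATASETS = -132  # combined probability over the 8 datasets, as quoted
LOG10_PROTONS = 80           # about 10^80 protons in the universe, as quoted

# Assumed reading: draw one proton at random, then draw again and hit the same one.
log10_p_same_proton = -LOG10_PROTONS

# Number of orders of magnitude by which the combined probability is smaller.
orders_lower = log10_p_same_proton - LOG10_P_ALL_DATASETS

print(orders_lower)  # 52
```

    Under that reading the gap is 52 orders of magnitude, which agrees with the abstract's rounded figure of "about" 50.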

  6. The Ancient Evolutionary History of Polyomaviruses

    PubMed Central

    Buck, Christopher B.; Van Doorslaer, Koenraad; Peretti, Alberto; Geoghegan, Eileen M.; Tisza, Michael J.; An, Ping; Katz, Joshua P.; Pipas, James M.; McBride, Alison A.; Camus, Alvin C.; McDermott, Alexa J.; Dill, Jennifer A.; Delwart, Eric; Ng, Terry F. F.; Farkas, Kata; Austin, Charlotte; Kraberger, Simona; Davison, William; Pastrana, Diana V.; Varsani, Arvind

    2016-01-01

    Polyomaviruses are a family of DNA tumor viruses that are known to infect mammals and birds. To investigate the deeper evolutionary history of the family, we used a combination of viral metagenomics, bioinformatics, and structural modeling approaches to identify and characterize polyomavirus sequences associated with fish and arthropods. Analyses drawing upon the divergent new sequences indicate that polyomaviruses have been gradually co-evolving with their animal hosts for at least half a billion years. Phylogenetic analyses of individual polyomavirus genes suggest that some modern polyomavirus species arose after ancient recombination events involving distantly related polyomavirus lineages. The improved evolutionary model provides a useful platform for developing a more accurate taxonomic classification system for the viral family Polyomaviridae. PMID:27093155

  7. Determinants of the rate of protein sequence evolution

    PubMed Central

    Zhang, Jianzhi; Yang, Jian-Rong

    2015-01-01

    The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what constitutes functional constraint has remained unclear. The increasing availability of genomic data has allowed for much-needed empirical examinations of the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses has identified multiple mechanisms behind these observations and demonstrated a prominent role that selection against errors in molecular and cellular processes plays in protein evolution. PMID:26055156

  8. A single determinant dominates the rate of yeast protein evolution.

    PubMed

    Drummond, D Allan; Raval, Alpan; Wilke, Claus O

    2006-02-01

    A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
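
    Principal component regression, as described in the abstract, regresses the response on the principal components of the predictors rather than on the correlated predictors themselves. The following is a minimal sketch of that idea, assuming NumPy is available. It is not the authors' code; the function name and the synthetic data (three noisy measurements of one hidden variable) are made up for illustration.

```python
import numpy as np

def principal_component_regression(X, y):
    """Regress y on the principal components of the predictor matrix X.

    Returns (axes, r_squared), where r_squared[k] is the fraction of the
    variance in y explained by principal component k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize predictors so no variable dominates through its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Rows of Vt are the principal axes of the standardized predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T  # component scores; columns are mutually orthogonal
    # With orthogonal scores, each component's R^2 is its squared correlation
    # with y, and the per-component values add up to the full-model R^2.
    r = (scores.T @ yc) / (np.linalg.norm(scores, axis=0) * np.linalg.norm(yc))
    return Vt, r ** 2

# Synthetic data: three correlated predictors driven by one hidden variable.
rng = np.random.default_rng(0)
hidden = rng.normal(size=500)
X = np.column_stack([hidden + 0.5 * rng.normal(size=500) for _ in range(3)])
y = hidden + 0.5 * rng.normal(size=500)

axes, r_squared = principal_component_regression(X, y)
print(np.round(r_squared, 3))  # the first component carries most of the signal
```

    Because the components are uncorrelated, the variance they explain can be attributed to each one separately, which is not possible with the correlated raw predictors.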

  9. Phylogenetic relationships of bears (the Ursidae) inferred from mitochondrial DNA sequences.

    PubMed

    Zhang, Y P; Ryder, O A

    1994-12-01

    The phylogenetic relationships among some bear species are still open questions. We present here mitochondrial DNA sequences of D-loop region, cytochrome b, 12S rRNA, tRNA(Pro), and tRNA(Thr) genes from all bear species and the giant panda. A series of evolutionary trees with concordant topology has been derived based on the combined data set of all of the mitochondrial DNA sequences, which may have resolved the evolutionary relationships of all bear species: the ancestor of the spectacled bear diverged first, followed by the sloth bear; the brown bear and polar bear are sister taxa relative to the Asiatic black bear; the closest relative of the American black bear is the sun bear. Primers for forensic identification of the giant panda and bears are proposed. Analysis of these data, in combination with data from primates and antelopes, suggests that relative substitutional rates between different mitochondrial DNA regions may vary greatly among different taxa of the vertebrates.

  10. Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily

    PubMed Central

    Akiva, Eyal; Copp, Janine N.; Tokuriki, Nobuhiko; Babbitt, Patricia C.

    2017-01-01

    Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold. PMID:29078300

  11. Adaptive evolutionary walks require neutral intermediates in RNA fitness landscapes.

    PubMed

    Rendel, Mark D

    2011-01-01

    In RNA fitness landscapes with interconnected networks of neutral mutations, neutral precursor mutations can play an important role in facilitating the accessibility of epistatic adaptive mutant combinations. I use an exhaustively surveyed fitness landscape model based on short sequence RNA genotypes (and their secondary structure phenotypes) to calculate the minimum rate at which mutants initially appearing as neutral are incorporated into an adaptive evolutionary walk. I show first, that incorporating neutral mutations significantly increases the number of point mutations in a given evolutionary walk when compared to estimates from previous adaptive walk models. Second, that incorporating neutral mutants into such a walk significantly increases the final fitness encountered on that walk - indeed evolutionary walks including neutral steps often reach the global optimum in this model. Third, and perhaps most importantly, evolutionary paths of this kind are often extremely winding in their nature and have the potential to undergo multiple mutations at a given sequence position within a single walk; the potential of these winding paths to mislead phylogenetic reconstruction is briefly considered.

  12. Historian: accurate reconstruction of ancestral sequences and evolutionary rates.

    PubMed

    Holmes, Ian H

    2017-04-15

    Reconstruction of ancestral sequence histories, and estimation of parameters like indel rates, are improved by using explicit evolutionary models and summing over uncertain alignments. The previous best tool for this purpose (according to simulation benchmarks) was ProtPal, but this tool was too slow for practical use. Historian combines an efficient reimplementation of the ProtPal algorithm with performance-improving heuristics from other alignment tools. Simulation results on fidelity of rate estimation via ancestral reconstruction, along with evaluations on the structurally informed alignment dataset BAliBase 3.0, recommend Historian over other alignment tools for evolutionary applications. Historian is available at https://github.com/evoldoers/historian under the Creative Commons Attribution 3.0 US license.

  13. The complex evolutionary dynamics of ancient and recent polyploidy in Leucaena (Leguminosae; Mimosoideae).

    PubMed

    Govindarajulu, Rajanikanth; Hughes, Colin E; Alexander, Patrick J; Bailey, C Donovan

    2011-12-01

    The evolutionary history of Leucaena has been impacted by polyploidy, hybridization, and divergent allopatric species diversification, suggesting that this is an ideal group to investigate the evolutionary tempo of polyploidy and the complexities of reticulation and divergence in plant diversification. Parsimony- and ML-based phylogenetic approaches were applied to 105 accessions sequenced for six sequence characterized amplified region-based nuclear encoded loci, nrDNA ITS, and four cpDNA regions. Hypotheses for the origin of tetraploid species were inferred using results derived from a novel species tree and established gene tree methods and from data on genome sizes and geographic distributions. The combination of comprehensively sampled multilocus DNA sequence data sets and a novel methodology provides strong resolution and support for the origins of all five tetraploid species. A minimum of four allopolyploidization events are required to explain the origins of these species. The origin(s) of one tetraploid pair (L. involucrata/L. pallida) can be equally explained by two unique allopolyploidizations or a single event followed by divergent speciation. Alongside other recent findings, a comprehensive picture of the complex evolutionary dynamics of polyploidy in Leucaena is emerging that includes paleotetraploidization, diploidization of the last common ancestor to Leucaena, allopatric divergence among diploids, and recent allopolyploid origins for tetraploid species likely associated with human translocation of seed. These results provide insights into the role of divergence and reticulation in a well-characterized angiosperm lineage and into traits of diploid parents and derived tetraploids (particularly self-compatibility and year-round flowering) favoring the formation and establishment of novel tetraploid combinations.

  14. LS³: A Method for Improving Phylogenomic Inferences When Evolutionary Rates Are Heterogeneous among Taxa

    PubMed Central

    Rivera-Rivera, Carlos J.; Montoya-Burgos, Juan I.

    2016-01-01

    Phylogenetic inference artifacts can occur when sequence evolution deviates from assumptions made by the models used to analyze them. The combination of strong model assumption violations and highly heterogeneous lineage evolutionary rates can become problematic in phylogenetic inference, and lead to the well-described long-branch attraction (LBA) artifact. Here, we define an objective criterion for assessing lineage evolutionary rate heterogeneity among predefined lineages: the result of a likelihood ratio test between a model in which the lineages evolve at the same rate (homogeneous model) and a model in which different lineage rates are allowed (heterogeneous model). We implement this criterion in the algorithm Locus Specific Sequence Subsampling (LS³), aimed at reducing the effects of LBA in multi-gene datasets. For each gene, LS³ sequentially removes the fastest-evolving taxon of the ingroup and tests for lineage rate homogeneity until all lineages have uniform evolutionary rates. The sequences excluded from the homogeneously evolving taxon subset are flagged as potentially problematic. The software implementation provides the user with the possibility to remove the flagged sequences for generating a new concatenated alignment. We tested LS³ with simulations and two real datasets containing LBA artifacts: a nucleotide dataset regarding the position of Glires within mammals and an amino-acid dataset concerning the position of nematodes within bilaterians. The initially incorrect phylogenies were corrected in all cases upon removing data flagged by LS³. PMID:26912812
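
    The per-gene loop described above (test for rate homogeneity, remove the fastest-evolving ingroup taxon, repeat) can be sketched as follows. This is an illustration under stated assumptions, not the LS³ implementation: the function names are hypothetical, the rates are held fixed instead of being re-estimated after each removal, and the toy run replaces the likelihood ratio test with a simple max/min rate ratio. The critical value 3.84 is the 5% chi-square threshold for one degree of freedom; the appropriate degrees of freedom depend on how many extra rate parameters the heterogeneous model has.

```python
def lrt_rejects_homogeneity(lnl_homogeneous, lnl_heterogeneous, critical_value):
    """Likelihood ratio test: True if the single-rate model is rejected."""
    statistic = 2.0 * (lnl_heterogeneous - lnl_homogeneous)
    return statistic > critical_value

def subsample_gene(rates, rejects_homogeneity, min_taxa=3):
    """Drop the fastest-evolving taxon until the rates look homogeneous.

    rates: dict mapping ingroup taxon name -> estimated evolutionary rate.
    rejects_homogeneity: callable taking the dict of kept taxa and returning
        True while the homogeneity test still fails (hypothetical hook).
    Returns (kept, flagged) as lists of taxon names.
    """
    kept = dict(rates)
    flagged = []
    while len(kept) > min_taxa and rejects_homogeneity(kept):
        fastest = max(kept, key=kept.get)
        flagged.append(fastest)  # flagged as potentially problematic
        del kept[fastest]
    return sorted(kept), flagged

# The test itself, with made-up log-likelihoods: 2 * (-1228.0 + 1234.5) = 13.0.
print(lrt_rejects_homogeneity(-1234.5, -1228.0, critical_value=3.84))  # True

# Toy stand-in for the test (assumption): reject while max/min rate exceeds 2.
def toy_test(kept):
    return max(kept.values()) / min(kept.values()) > 2.0

toy_rates = {"A": 1.0, "B": 1.1, "C": 0.9, "D": 1.2, "E": 4.0, "F": 2.5}
print(subsample_gene(toy_rates, toy_test))
# (['A', 'B', 'C', 'D'], ['E', 'F'])
```

    In a real analysis the hook would fit both rate models to the gene alignment and pass their log-likelihoods to the likelihood ratio test.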

  15. Interchromosomal Duplications on the Bactrocera oleae Y Chromosome Imply a Distinct Evolutionary Origin of the Sex Chromosomes Compared to Drosophila

    PubMed Central

    Gabrieli, Paolo; Gomulski, Ludvik M.; Bonomi, Angelica; Siciliano, Paolo; Scolari, Francesca; Franz, Gerald; Jessup, Andrew; Malacrida, Anna R.; Gasperi, Giuliano

    2011-01-01

    Background: Diptera have an extraordinary variety of sex determination mechanisms, and Drosophila melanogaster is the paradigm for this group. However, the Drosophila sex determination pathway is only partially conserved and the family Tephritidae affords an interesting example. The tephritid Y chromosome is postulated to be necessary to determine male development. Characterization of Y sequences, apart from elucidating the nature of the male determining factor, is also important to understand the evolutionary history of sex chromosomes within the Tephritidae. We studied the Y sequences from the olive fly, Bactrocera oleae. Its Y chromosome is minute and highly heterochromatic, and displays high heteromorphism with the X chromosome. Methodology/Principal Findings: A combined Representational Difference Analysis (RDA) and fluorescence in-situ hybridization (FISH) approach was used to investigate the Y chromosome to derive information on its sequence content. The Y chromosome is strewn with repetitive DNA sequences, the majority of which are also interspersed in the pericentromeric regions of the autosomes. The Y chromosome appears to have accumulated small and large repetitive interchromosomal duplications. The large interchromosomal duplications harbour an importin-4-like gene fragment. Apart from these importin-4-like sequences, the other Y repetitive sequences are not shared with the X chromosome, suggesting molecular differentiation of these two chromosomes. Moreover, as the identified Y sequences were not detected on the Y chromosomes of closely related tephritids, we can infer divergence in the repetitive nature of their sequence contents. Conclusions/Significance: The identification of Y-linked sequences may tell us much about the repetitive nature, the origin and the evolution of Y chromosomes. We hypothesize how these repetitive sequences accumulated and were maintained on the Y chromosome during its evolutionary history. Our data reinforce the idea that the sex chromosomes of the Tephritidae may have distinct evolutionary origins with respect to those of the Drosophilidae and other Dipteran families. PMID:21408187

  16. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

    PubMed Central

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

    2015-01-01

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identified 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166
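
    The average read depth approach can be illustrated with a small filter: a Y-linked scaffold should attract almost no reads from the female and roughly half the autosomal depth from the male. The sketch below only illustrates that reasoning. The abstract does not give the criterion actually used, so the function name, the thresholds, and the scaffold values are all assumptions.

```python
def y_candidate_scaffolds(depths, female_max=0.1, male_min=0.25):
    """Return scaffolds whose depth pattern fits a male-specific sequence.

    depths: dict mapping scaffold name -> (male_depth, female_depth), where
        each depth is already divided by that individual's genome-wide
        average depth, so autosomal scaffolds sit near 1.0.
    female_max, male_min: illustrative thresholds (assumptions).
    """
    candidates = []
    for name, (male, female) in depths.items():
        if female <= female_max and male >= male_min:
            candidates.append(name)
    return sorted(candidates)

# Made-up normalized depths for five scaffolds.
toy_depths = {
    "scaf_auto": (1.02, 0.98),  # autosomal: full depth in both sexes
    "scaf_x": (0.51, 1.01),     # X-linked: half depth in the male
    "scaf_y1": (0.47, 0.01),    # Y-like: male only, about half depth
    "scaf_y2": (0.55, 0.00),
    "scaf_low": (0.02, 0.01),   # poorly covered in both, so not informative
}

print(y_candidate_scaffolds(toy_depths))  # ['scaf_y1', 'scaf_y2']
```

    The lower bound on male depth keeps scaffolds that are simply poorly covered in both individuals from being mistaken for Y-linked sequence.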

  17. On the Statistical Properties of the Lower Main Sequence

    NASA Astrophysics Data System (ADS)

    Angelou, George C.; Bellinger, Earl P.; Hekker, Saskia; Basu, Sarbani

    2017-04-01

    Astronomy is in an era where all-sky surveys are mapping the Galaxy. The plethora of photometric, spectroscopic, asteroseismic, and astrometric data allows us to characterize its constituent stars in detail. Here we quantify to what extent precise stellar observations reveal information about the properties of a star, including properties that are unobserved, or even unobservable. We analyze the diagnostic potential of classical and asteroseismic observations for inferring stellar parameters such as age, mass, and radius from evolutionary tracks of solar-like oscillators on the lower main sequence. We perform rank correlation tests in order to determine the capacity of each observable quantity to probe structural components of stars and infer their evolutionary histories. We also analyze the principal components of classic and asteroseismic observables to highlight the degree of redundancy present in the measured quantities and demonstrate the extent to which information about the model parameters can be extracted. We perform multiple regression using combinations of observable quantities in a grid of evolutionary simulations and appraise the predictive utility of each combination in determining the properties of stars. We identify the combinations that are useful and provide limits to where each type of observable quantity can reveal information about a star. We investigate the accuracy with which targets in the upcoming TESS and PLATO missions can be characterized. We demonstrate that the combination of observations from GAIA and PLATO will allow us to tightly constrain stellar masses, ages, and radii with machine learning for the purposes of Galactic and planetary studies.

  18. Reconstructing evolutionary trees in parallel for massive sequences.

    PubMed

    Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

    2017-12-14

    Building evolutionary trees from massive sets of unaligned DNA sequences is both crucial and challenging, and multiple sequence alignment at that scale is costly in time and memory. The recently developed Hadoop and Spark platforms offer new opportunities for such classical computational biology problems. In this paper, we address multiple sequence alignment and evolutionary tree reconstruction in parallel. We employ the Hadoop and Spark platforms to design HPTree, an evolutionary tree reconstruction tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used to build the tree. HPTree handles large DNA sequence files quickly: it works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could support population evolution research and metagenomic analysis. The software and its source code are available at http://lab.malab.cn/soft/HPtree/ .

  19. Evolutionary Algorithms for Boolean Functions in Diverse Domains of Cryptography.

    PubMed

    Picek, Stjepan; Carlet, Claude; Guilley, Sylvain; Miller, Julian F; Jakobovic, Domagoj

    2016-01-01

    The role of Boolean functions is prominent in several areas including cryptography, sequences, and coding theory. Therefore, various methods for the construction of Boolean functions with desired properties are of direct interest. New motivations on the role of Boolean functions in cryptography with attendant new properties have emerged over the years. There are still many combinations of design criteria left unexplored and in this matter evolutionary computation can play a distinct role. This article concentrates on two scenarios for the use of Boolean functions in cryptography. The first uses Boolean functions as the source of the nonlinearity in filter and combiner generators. Although relatively well explored using evolutionary algorithms, it still presents an interesting goal in terms of the practical sizes of Boolean functions. The second scenario appeared rather recently where the objective is to find Boolean functions that have various orders of the correlation immunity and minimal Hamming weight. In both these scenarios we see that evolutionary algorithms are able to find high-quality solutions where genetic programming performs the best.
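
    Two of the properties named above, nonlinearity and Hamming weight, are the kind of quantity an evolutionary search would score for each candidate function. The sketch below computes both from a truth table using the standard Walsh-Hadamard transform. It is not taken from the article; the function names and the four-variable example are chosen for illustration.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x) for a 0/1 truth table.

    The table length must be a power of two (2^n entries for n variables).
    """
    w = [1 - 2 * bit for bit in truth_table]  # map 0 -> +1, 1 -> -1
    size = len(w)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for k in range(start, start + step):
                a, b = w[k], w[k + step]
                w[k], w[k + step] = a + b, a - b  # butterfly update
        step *= 2
    return w

def nonlinearity(truth_table):
    """Distance to the nearest affine function: (2^n - max|W|) / 2."""
    max_abs = max(abs(v) for v in walsh_spectrum(truth_table))
    return (len(truth_table) - max_abs) // 2

def hamming_weight(truth_table):
    """Number of inputs on which the function outputs 1."""
    return sum(truth_table)

# Example: f(x1, x2, x3, x4) = x1*x2 XOR x3*x4, with x1 as the lowest bit.
f = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
     for x in range(16)]
linear = [x & 1 for x in range(16)]  # f(x) = x1, a linear function

print(hamming_weight(f), nonlinearity(f))  # 6 6
print(nonlinearity(linear))                # 0
```

    The first function reaches the maximum nonlinearity possible for four variables, while the linear function scores zero, which is the contrast a fitness function built on this measure would reward.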


  1. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

    PubMed

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

    2015-05-27

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identified 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
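
    One way to picture the read-depth approach is as a comparison of normalised depth between the two sexes. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name, the median normalisation, and both thresholds are hypothetical. It relies only on the expectation that a Y-linked scaffold has close to zero depth in the female and roughly half the autosomal depth in the male.

```python
from statistics import median
from typing import Dict, List


def y_linked_candidates(male_depth: Dict[str, float],
                        female_depth: Dict[str, float],
                        max_ratio: float = 0.3,
                        min_male_norm: float = 0.2) -> List[str]:
    """Return scaffolds whose female/male normalised depth ratio is near zero.

    Depths are average read depths per scaffold. Each sex is divided by its
    own median depth (assumed to be dominated by autosomal scaffolds) so that
    differences in sequencing effort cancel out.
    """
    male_median = median(male_depth.values())
    female_median = median(female_depth.values())
    candidates = []
    for scaffold, depth in male_depth.items():
        male_norm = depth / male_median
        female_norm = female_depth.get(scaffold, 0.0) / female_median
        if male_norm < min_male_norm:
            continue  # too little male coverage to judge this scaffold
        # Expected ratio: about 1 for autosomes, about 2 for X, about 0 for Y.
        if female_norm / male_norm <= max_ratio:
            candidates.append(scaffold)
    return sorted(candidates)
```

    Candidates found this way would still need the kind of male-specific in vitro amplification described in the abstract before being treated as Y-linked.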

  2. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    PubMed

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  3. Evolution of thermotolerance in hot spring cyanobacteria of the genus Synechococcus

    NASA Technical Reports Server (NTRS)

    Miller, S. R.; Castenholz, R. W.

    2000-01-01

    The extension of ecological tolerance limits may be an important mechanism by which microorganisms adapt to novel environments, but it may come at the evolutionary cost of reduced performance under ancestral conditions. We combined a comparative physiological approach with phylogenetic analyses to study the evolution of thermotolerance in hot spring cyanobacteria of the genus Synechococcus. Among the 20 laboratory clones of Synechococcus isolated from collections made along an Oregon hot spring thermal gradient, four different 16S rRNA gene sequences were identified. Phylogenies constructed by using the sequence data indicated that the clones were polyphyletic but that three of the four sequence groups formed a clade. Differences in thermotolerance were observed for clones with different 16S rRNA gene sequences, and comparison of these physiological differences within a phylogenetic framework provided evidence that more thermotolerant lineages of Synechococcus evolved from less thermotolerant ancestors. The extension of the thermal limit in these bacteria was correlated with a reduction in the breadth of the temperature range for growth, which provides evidence that enhanced thermotolerance has come at the evolutionary cost of increased thermal specialization. This study illustrates the utility of using phylogenetic comparative methods to investigate how evolutionary processes have shaped historical patterns of ecological diversification in microorganisms.

  4. Biophysical models of protein evolution: Understanding the patterns of evolutionary sequence divergence

    PubMed Central

    Echave, Julian; Wilke, Claus O.

    2018-01-01

    For decades, rates of protein evolution have been interpreted in terms of the vague concept of “functional importance”. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating them has large impacts on protein structure and stability. Here, we review the studies of the emergent field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field. PMID:28301766

  5. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
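
    The ratio ω = dN/dS mentioned above is conventionally read against 1: values above 1 suggest positive selection, values below 1 suggest purifying selection. The sketch below only illustrates that reading. The function names and the tolerance are hypothetical, and it is not how CODEML's M2a model identifies sites, which relies on likelihood-based inference rather than a fixed cutoff.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Return dN/dS; infinite if only dS is zero, NaN if both are zero."""
    if ds == 0:
        return math.nan if dn == 0 else math.inf
    return dn / ds


def classify_site(w: float, tolerance: float = 0.05) -> str:
    """Label a codon site by how its omega compares with 1 (illustrative cutoff)."""
    if math.isnan(w):
        return "undetermined"
    if w > 1 + tolerance:
        return "candidate positive selection"
    if w < 1 - tolerance:
        return "purifying selection"
    return "near neutral"
```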

  6. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

    PubMed

    Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press.

  7. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary

    PubMed Central

    Vanneste, Kevin; Baele, Guy; Maere, Steven; Van de Peer, Yves

    2014-01-01

    Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous–Paleogene (K–Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K–Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids. PMID:24835588

  8. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  9. Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions

    PubMed Central

    Laine, Elodie; Carbone, Alessandra

    2015-01-01

    Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach makes it possible to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPI modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684

  10. Phylogeny and Divergence Times of Lemurs Inferred with Recent and Ancient Fossils in the Tree.

    PubMed

    Herrera, James P; Dávalos, Liliana M

    2016-09-01

    Paleontological and neontological systematics seek to answer evolutionary questions with different data sets. Phylogenies inferred for combined extant and extinct taxa provide novel insights into the evolutionary history of life. Primates have an extensive, diverse fossil record and molecular data for living and extinct taxa are rapidly becoming available. We used two models to infer the phylogeny and divergence times for living and fossil primates, the tip-dating (TD) and fossilized birth-death process (FBD). We collected new morphological data, especially on the living and extinct endemic lemurs of Madagascar. We combined the morphological data with published DNA sequences to infer near-complete (88% of lemurs) time-calibrated phylogenies. The results suggest that primates originated around the Cretaceous-Tertiary boundary, slightly earlier than indicated by the fossil record and later than previously inferred from molecular data alone. We infer novel relationships among extinct lemurs, and strong support for relationships that were previously unresolved. Dates inferred with TD were significantly older than those inferred with FBD, most likely related to an assumption of a uniform branching process in the TD compared with a birth-death process assumed in the FBD. This is the first study to combine morphological and DNA sequence data from extinct and extant primates to infer evolutionary relationships and divergence times, and our results shed new light on the tempo of lemur evolution and the efficacy of combined phylogenetic analyses. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  11. On the Statistical Properties of the Lower Main Sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Angelou, George C.; Bellinger, Earl P.; Hekker, Saskia

    Astronomy is in an era where all-sky surveys are mapping the Galaxy. The plethora of photometric, spectroscopic, asteroseismic, and astrometric data allows us to characterize the comprising stars in detail. Here we quantify to what extent precise stellar observations reveal information about the properties of a star, including properties that are unobserved, or even unobservable. We analyze the diagnostic potential of classical and asteroseismic observations for inferring stellar parameters such as age, mass, and radius from evolutionary tracks of solar-like oscillators on the lower main sequence. We perform rank correlation tests in order to determine the capacity of each observable quantity to probe structural components of stars and infer their evolutionary histories. We also analyze the principal components of classic and asteroseismic observables to highlight the degree of redundancy present in the measured quantities and demonstrate the extent to which information of the model parameters can be extracted. We perform multiple regression using combinations of observable quantities in a grid of evolutionary simulations and appraise the predictive utility of each combination in determining the properties of stars. We identify the combinations that are useful and provide limits to where each type of observable quantity can reveal information about a star. We investigate the accuracy with which targets in the upcoming TESS and PLATO missions can be characterized. We demonstrate that the combination of observations from GAIA and PLATO will allow us to tightly constrain stellar masses, ages, and radii with machine learning for the purposes of Galactic and planetary studies.

  12. Time Clustered Sampling Can Inflate the Inferred Substitution Rate in Foot-And-Mouth Disease Virus Analyses.

    PubMed

    Pedersen, Casper-Emil T; Frandsen, Peter; Wekesa, Sabenzia N; Heller, Rasmus; Sangula, Abraham K; Wadsworth, Jemma; Knowles, Nick J; Muwanika, Vincent B; Siegismund, Hans R

    2015-01-01

    With the emergence of analytical software for the inference of viral evolution, a number of studies have focused on estimating important parameters such as the substitution rate and the time to the most recent common ancestor (tMRCA) for rapidly evolving viruses. Coupled with an increasing abundance of sequence data sampled under widely different schemes, an effort to keep results consistent and comparable is needed. This study emphasizes commonly disregarded problems in the inference of evolutionary rates in viral sequence data when sampling is unevenly distributed on a temporal scale through a study of the foot-and-mouth disease (FMD) virus serotypes SAT 1 and SAT 2. Our study shows that clustered temporal sampling in phylogenetic analyses of FMD viruses will strongly bias the inferences of substitution rates and tMRCA because the inferred rates in such data sets reflect a rate closer to the mutation rate rather than the substitution rate. Estimating evolutionary parameters from viral sequences should be performed with due consideration of the differences in short-term and longer-term evolutionary processes occurring within sets of temporally sampled viruses, and studies should carefully consider how samples are combined.

  13. Evolutionary Novelty in a Butterfly Wing Pattern through Enhancer Shuffling

    PubMed Central

    Pardo-Diaz, Carolina; Hanly, Joseph J.; Martin, Simon H.; Mallet, James; Dasmahapatra, Kanchon K.; Salazar, Camilo; Joron, Mathieu; Nadeau, Nicola; McMillan, W. Owen; Jiggins, Chris D.

    2016-01-01

    An important goal in evolutionary biology is to understand the genetic changes underlying novel morphological structures. We investigated the origins of a complex wing pattern found among Amazonian Heliconius butterflies. Genome sequence data from 142 individuals across 17 species identified narrow regions associated with two distinct red colour pattern elements, dennis and ray. We hypothesise that these modules in non-coding sequence represent distinct cis-regulatory loci that control expression of the transcription factor optix, which in turn controls red pattern variation across Heliconius. Phylogenetic analysis of the two elements demonstrated that they have distinct evolutionary histories and that novel adaptive morphological variation was created by shuffling these cis-regulatory modules through recombination between divergent lineages. In addition, recombination of modules into different combinations within species further contributes to diversity. Analysis of the timing of diversification in these two regions supports the hypothesis of introgression moving regulatory modules between species, rather than shared ancestral variation. The dennis phenotype introgressed into Heliconius melpomene at about the same time that ray originated in this group, while ray introgressed back into H. elevatus much more recently. We show that shuffling of existing enhancer elements both within and between species provides a mechanism for rapid diversification and generation of novel morphological combinations during adaptive radiation. PMID:26771987

  14. Evolutionary Novelty in a Butterfly Wing Pattern through Enhancer Shuffling.

    PubMed

    Wallbank, Richard W R; Baxter, Simon W; Pardo-Diaz, Carolina; Hanly, Joseph J; Martin, Simon H; Mallet, James; Dasmahapatra, Kanchon K; Salazar, Camilo; Joron, Mathieu; Nadeau, Nicola; McMillan, W Owen; Jiggins, Chris D

    2016-01-01

    An important goal in evolutionary biology is to understand the genetic changes underlying novel morphological structures. We investigated the origins of a complex wing pattern found among Amazonian Heliconius butterflies. Genome sequence data from 142 individuals across 17 species identified narrow regions associated with two distinct red colour pattern elements, dennis and ray. We hypothesise that these modules in non-coding sequence represent distinct cis-regulatory loci that control expression of the transcription factor optix, which in turn controls red pattern variation across Heliconius. Phylogenetic analysis of the two elements demonstrated that they have distinct evolutionary histories and that novel adaptive morphological variation was created by shuffling these cis-regulatory modules through recombination between divergent lineages. In addition, recombination of modules into different combinations within species further contributes to diversity. Analysis of the timing of diversification in these two regions supports the hypothesis of introgression moving regulatory modules between species, rather than shared ancestral variation. The dennis phenotype introgressed into Heliconius melpomene at about the same time that ray originated in this group, while ray introgressed back into H. elevatus much more recently. We show that shuffling of existing enhancer elements both within and between species provides a mechanism for rapid diversification and generation of novel morphological combinations during adaptive radiation.

  15. The genome of the sea urchin Strongylocentrotus purpuratus.

    PubMed

    Sodergren, Erica; Weinstock, George M; Davidson, Eric H; Cameron, R Andrew; Gibbs, Richard A; Angerer, Robert C; Angerer, Lynne M; Arnone, Maria Ina; Burgess, David R; Burke, Robert D; Coffman, James A; Dean, Michael; Elphick, Maurice R; Ettensohn, Charles A; Foltz, Kathy R; Hamdoun, Amro; Hynes, Richard O; Klein, William H; Marzluff, William; McClay, David R; Morris, Robert L; Mushegian, Arcady; Rast, Jonathan P; Smith, L Courtney; Thorndyke, Michael C; Vacquier, Victor D; Wessel, Gary M; Wray, Greg; Zhang, Lan; Elsik, Christine G; Ermolaeva, Olga; Hlavina, Wratko; Hofmann, Gretchen; Kitts, Paul; Landrum, Melissa J; Mackey, Aaron J; Maglott, Donna; Panopoulou, Georgia; Poustka, Albert J; Pruitt, Kim; Sapojnikov, Victor; Song, Xingzhi; Souvorov, Alexandre; Solovyev, Victor; Wei, Zheng; Whittaker, Charles A; Worley, Kim; Durbin, K James; Shen, Yufeng; Fedrigo, Olivier; Garfield, David; Haygood, Ralph; Primus, Alexander; Satija, Rahul; Severson, Tonya; Gonzalez-Garay, Manuel L; Jackson, Andrew R; Milosavljevic, Aleksandar; Tong, Mark; Killian, Christopher E; Livingston, Brian T; Wilt, Fred H; Adams, Nikki; Bellé, Robert; Carbonneau, Seth; Cheung, Rocky; Cormier, Patrick; Cosson, Bertrand; Croce, Jenifer; Fernandez-Guerra, Antonio; Genevière, Anne-Marie; Goel, Manisha; Kelkar, Hemant; Morales, Julia; Mulner-Lorillon, Odile; Robertson, Anthony J; Goldstone, Jared V; Cole, Bryan; Epel, David; Gold, Bert; Hahn, Mark E; Howard-Ashby, Meredith; Scally, Mark; Stegeman, John J; Allgood, Erin L; Cool, Jonah; Judkins, Kyle M; McCafferty, Shawn S; Musante, Ashlan M; Obar, Robert A; Rawson, Amanda P; Rossetti, Blair J; Gibbons, Ian R; Hoffman, Matthew P; Leone, Andrew; Istrail, Sorin; Materna, Stefan C; Samanta, Manoj P; Stolc, Viktor; Tongprasit, Waraporn; Tu, Qiang; Bergeron, Karl-Frederik; Brandhorst, Bruce P; Whittle, James; Berney, Kevin; Bottjer, David J; Calestani, Cristina; Peterson, Kevin; Chow, Elly; Yuan, Qiu Autumn; Elhaik, Eran; Graur, Dan; Reese, Justin T; 
Bosdet, Ian; Heesun, Shin; Marra, Marco A; Schein, Jacqueline; Anderson, Michele K; Brockton, Virginia; Buckley, Katherine M; Cohen, Avis H; Fugmann, Sebastian D; Hibino, Taku; Loza-Coll, Mariano; Majeske, Audrey J; Messier, Cynthia; Nair, Sham V; Pancer, Zeev; Terwilliger, David P; Agca, Cavit; Arboleda, Enrique; Chen, Nansheng; Churcher, Allison M; Hallböök, F; Humphrey, Glen W; Idris, Mohammed M; Kiyama, Takae; Liang, Shuguang; Mellott, Dan; Mu, Xiuqian; Murray, Greg; Olinski, Robert P; Raible, Florian; Rowe, Matthew; Taylor, John S; Tessmar-Raible, Kristin; Wang, D; Wilson, Karen H; Yaguchi, Shunsuke; Gaasterland, Terry; Galindo, Blanca E; Gunaratne, Herath J; Juliano, Celina; Kinukawa, Masashi; Moy, Gary W; Neill, Anna T; Nomura, Mamoru; Raisch, Michael; Reade, Anna; Roux, Michelle M; Song, Jia L; Su, Yi-Hsien; Townley, Ian K; Voronina, Ekaterina; Wong, Julian L; Amore, Gabriele; Branno, Margherita; Brown, Euan R; Cavalieri, Vincenzo; Duboc, Véronique; Duloquin, Louise; Flytzanis, Constantin; Gache, Christian; Lapraz, François; Lepage, Thierry; Locascio, Annamaria; Martinez, Pedro; Matassi, Giorgio; Matranga, Valeria; Range, Ryan; Rizzo, Francesca; Röttinger, Eric; Beane, Wendy; Bradham, Cynthia; Byrum, Christine; Glenn, Tom; Hussain, Sofia; Manning, Gerard; Miranda, Esther; Thomason, Rebecca; Walton, Katherine; Wikramanayke, Athula; Wu, Shu-Yu; Xu, Ronghui; Brown, C Titus; Chen, Lili; Gray, Rachel F; Lee, Pei Yun; Nam, Jongmin; Oliveri, Paola; Smith, Joel; Muzny, Donna; Bell, Stephanie; Chacko, Joseph; Cree, Andrew; Curry, Stacey; Davis, Clay; Dinh, Huyen; Dugan-Rocha, Shannon; Fowler, Jerry; Gill, Rachel; Hamilton, Cerrissa; Hernandez, Judith; Hines, Sandra; Hume, Jennifer; Jackson, Laronda; Jolivet, Angela; Kovar, Christie; Lee, Sandra; Lewis, Lora; Miner, George; Morgan, Margaret; Nazareth, Lynne V; Okwuonu, Geoffrey; Parker, David; Pu, Ling-Ling; Thorn, Rachel; Wright, Rita

    2006-11-10

    We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.

  16. Exploring Evolutionary Patterns in Genetic Sequence: A Computer Exercise

    ERIC Educational Resources Information Center

    Shumate, Alice M.; Windsor, Aaron J.

    2010-01-01

    The increase in publications presenting molecular evolutionary analyses and the availability of comparative sequence data through resources such as NCBI's GenBank underscore the necessity of providing undergraduates with hands-on sequence analysis skills in an evolutionary context. This need is particularly acute given that students have been…

  17. Influenza A virus evolution and spatio-temporal dynamics in Eurasian wild birds: a phylogenetic and phylogeographical study of whole-genome sequence data

    PubMed Central

    Lewis, Nicola S.; Verhagen, Josanne H.; Javakhishvili, Zurab; Russell, Colin A.; Lexmond, Pascal; Westgeest, Kim B.; Bestebroer, Theo M.; Halpin, Rebecca A.; Lin, Xudong; Ransier, Amy; Fedorova, Nadia B.; Stockwell, Timothy B.; Latorre-Margalef, Neus; Olsen, Björn; Smith, Gavin; Bahl, Justin; Wentworth, David E.; Waldenström, Jonas; Fouchier, Ron A. M.

    2015-01-01

    Low pathogenic avian influenza A viruses (IAVs) have a natural host reservoir in wild waterbirds and the potential to spread to other host species. Here, we investigated the evolutionary, spatial and temporal dynamics of avian IAVs in Eurasian wild birds. We used whole-genome sequences collected as part of an intensive long-term Eurasian wild bird surveillance study, and combined this genetic data with temporal and spatial information to explore the virus evolutionary dynamics. Frequent reassortment and co-circulating lineages were observed for all eight genomic RNA segments over time. There was no apparent species-specific effect on the diversity of the avian IAVs. There was a spatial and temporal relationship between the Eurasian sequences and significant viral migration of avian IAVs from West Eurasia towards Central Eurasia. The observed viral migration patterns differed between segments. Furthermore, we discuss the challenges faced when analysing these surveillance and sequence data, and the caveats to be borne in mind when drawing conclusions from the apparent results of such analyses. PMID:25904147

  18. Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs.

    PubMed

    Várnai, Csilla; Burkoff, Nikolas S; Wild, David L

    2017-01-01

    Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes, by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy related correlated mutation measures (CMMs) such as direct information, decoupling direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy based CMM at the residue level, we develop an interface level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and will likely also improve interface prediction for complexes with large and good-quality MSAs.

  19. Effects of Darwinian Selection and Mutability on Rate of Broadly Neutralizing Antibody Evolution during HIV-1 Infection

    PubMed Central

    Sheng, Zizhang; Schramm, Chaim A.; Connors, Mark; Morris, Lynn; Mascola, John R.; Kwong, Peter D.; Shapiro, Lawrence

    2016-01-01

    Accumulation of somatic mutations in antibody variable regions is critical for antibody affinity maturation, with HIV-1 broadly neutralizing antibodies (bnAbs) generally requiring years to develop. We recently found that the rate at which mutations accumulate decreases over time, but the mechanism governing this slowing is unclear. In this study, we investigated whether natural selection and/or mutability of the antibody variable region contributed significantly to the observed decrease in rate. We used longitudinally sampled sequences of immunoglobulin transcripts of single lineages from each of 3 donors, as determined by next generation sequencing. We estimated the evolutionary rates of the complementarity determining regions (CDRs), which are most significant for functional selection, and found they evolved about 1.5- to 2-fold faster than the framework regions. We also analyzed the presence of AID hotspots and coldspots at different points in lineage development and observed an average decrease in mutability of less than 10 percent over time. Altogether, the correlation between Darwinian selection strength and evolutionary rate trended toward significance, especially for CDRs, but cannot fully explain the observed changes in evolutionary rate. Changes in mutability modulated by AID hotspots and coldspots correlated only weakly with evolutionary rates. The combined effects of Darwinian selection and mutability contribute substantially to, but do not fully explain, evolutionary rate change for HIV-1-targeting bnAb lineages. PMID:27191167

  1. Evolutionary and Functional Relationships in the Truncated Hemoglobin Family

    PubMed Central

    Bustamante, Juan P.; Radusky, Leandro; Boechi, Leonardo; Estrin, Darío A.; ten Have, Arjen; Martí, Marcelo A.

    2016-01-01

    Predicting function from sequence is an important goal in current biological research, and although broad functional assignment is possible when a protein is assigned to a family, predicting functional specificity with accuracy is not straightforward. If function is provided by key structural properties and the relevant properties can be computed using the sequence as the starting point, it should in principle be possible to predict function in detail. The truncated hemoglobin family presents an interesting benchmark study due to its ubiquity, sequence diversity in the context of a conserved fold and the number of characterized members. Their functions are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in-silico based tools. In the present work we have applied a strategy, which combines homology modeling with molecular based energy calculations, to predict and analyze function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow the switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths, which, besides some minor trends, show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our comprehension of the structural basis governing ligand affinity, but also highlight some interesting functional evolutionary trends. PMID:26788940

  2. Identifying the pattern of molecular evolution for Zaire ebolavirus in the 2014 outbreak in West Africa.

    PubMed

    Liu, Si-Qing; Deng, Cheng-Lin; Yuan, Zhi-Ming; Rayner, Simon; Zhang, Bo

    2015-06-01

    The current Ebola virus disease (EVD) epidemic has killed more than all previous Ebola outbreaks combined and, even as efforts appear to be bringing the outbreak under control, the threat of reemergence remains. The availability of new whole-genome sequences from the 2014 outbreak in West Africa, together with those from the earlier outbreaks, provides an opportunity to investigate the genetic characteristics, the epidemiological dynamics and the evolutionary history of Zaire ebolavirus (ZEBOV). To investigate the evolutionary properties of ZEBOV in this outbreak, we examined amino acid mutations, positive selection, and evolutionary rates on the basis of 123 ZEBOV genome sequences. The estimated phylogenetic relationships within ZEBOV revealed that viral sequences from the same period or location formed a distinct cluster. The West Africa viruses probably derived from Middle Africa, consistent with results from previous studies. Analysis of the seven protein regions of ZEBOV revealed evidence of positive selection acting on the GP and L genes. Interestingly, all putatively positively selected sites identified in the GP are located within the mucin-like domain of the solved structure of the protein, suggesting a possible role in the immune evasion properties of ZEBOV. Compared with earlier outbreaks, the evolutionary rate of the GP gene was estimated to have accelerated significantly in the 2014 outbreak, suggesting that more ZEBOV variants are generated for human-to-human transmission during this sweeping epidemic. However, a more balanced sample set and next generation sequencing datasets would help achieve a clearer understanding at the genetic level of how the virus is evolving and adapting to new conditions. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Lifemap: Exploring the Entire Tree of Life.

    PubMed

    de Vienne, Damien M

    2016-12-01

    The Tree of Life (ToL) is meant to be a unique representation of the evolutionary relationships between all species on earth. Huge efforts are made to assemble such a large tree, helped by the decrease of sequencing costs and improved methods to reconstruct and combine phylogenies, but no tool exists today to explore the ToL in its entirety in a satisfying manner. By combining methods used in modern cartography, such as OpenStreetMap, with a new way of representing tree-like structures, I created Lifemap, a tool allowing the exploration of a complete representation of the ToL (between 800,000 and 2.2 million species depending on the data source) in a zoomable interface. A server version of Lifemap also allows users to visualize their own trees. This should help researchers in ecology and evolutionary biology in their everyday work, but may also permit the diffusion to a broader audience of our current knowledge of the evolutionary relationships linking all organisms.

  4. BAYESIAN PROTEIN STRUCTURE ALIGNMENT.

    PubMed

    Rodriguez, Abel; Schmidler, Scott C

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

  5. ORCAN-a web-based meta-server for real-time detection and functional annotation of orthologs.

    PubMed

    Zielezinski, Andrzej; Dziubek, Michal; Sliski, Jan; Karlowski, Wojciech M

    2017-04-15

    ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation utilizes five additional comparisons between the query and identified homologs, including: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases the sensitivity and precision by 1-2 percentage points. The service is available for free at http://www.combio.pl/orcan/. Contact: wmk@amu.edu.pl. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  6. The Genome of the Sea Urchin Strongylocentrotus purpuratus

    PubMed Central

    2011-01-01

    We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes. PMID:17095691

  7. Incorporating evolution of transcription factor binding sites into annotated alignments.

    PubMed

    Bais, Abha S; Grossmann, Stefen; Vingron, Martin

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits) are generated. Moreover, the pair-profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution. Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. Additionally, prediction of conserved binding sites is carried out in a multi-step approach that segregates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interplay at varying evolutionary distances and for various profile qualities.
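
    For orientation, the standard Smith-Waterman local alignment that the abstract names as SimAnn's starting point can be sketched as below. This is a minimal illustration only: the function name, the linear gap penalty and the match/mismatch/gap values are assumptions made for this example, and none of SimAnn's additional profile states or evolutionary models are included.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Plain Smith-Waterman local alignment with a linear gap penalty.

    Scoring values are illustrative assumptions, not parameters from SimAnn.
    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                          # start a new local alignment
                H[i - 1][j - 1] + sub(i, j),  # match or mismatch
                H[i - 1][j] + gap,          # gap in b
                H[i][j - 1] + gap,          # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences: only the shared core "ACGT" should be aligned.
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

    The clamp at zero in the recurrence is what makes the alignment local: any prefix that would lower the score is discarded and the alignment restarts.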

  8. The scope and strength of sex-specific selection in genome evolution.

    PubMed

    Wright, A E; Mank, J E

    2013-09-01

    Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.

  9. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High-throughput technology has driven progress in omics studies, including genomics and transcriptomics. We review the improvements in comparative omics studies that are attributable to high-throughput next generation sequencing technology. Comparative genomics has been successfully applied to evolutionary analysis, while comparative transcriptomics is used to compare expression profiles from two subjects by differential expression or differential coexpression, which enables its application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting morphogenesis during development, and previous work has sought to identify the most conserved stages of embryonic development. Older measurements in these studies were based on morphological similarity at the macro level, whereas new technology enables detection of similarity in molecular mechanisms at the micro level. Evolutionary models of embryonic development, including the "funnel-like" model and the "hourglass" model, have been evaluated by combining these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has brought EVO-DEVO studies into a new era, technological and material limitations still exist, and further investigations require more refined study design and procedures.

  10. Phenotype–genotype correlation in Hirschsprung disease is illuminated by comparative analysis of the RET protein sequence

    PubMed Central

    Kashuk, Carl S.; Stone, Eric A.; Grice, Elizabeth A.; Portnoy, Matthew E.; Green, Eric D.; Sidow, Arend; Chakravarti, Aravinda; McCallion, Andrew S.

    2005-01-01

    The ability to discriminate between deleterious and neutral amino acid substitutions in the genes of patients remains a significant challenge in human genetics. The increasing availability of genomic sequence data from multiple vertebrate species allows sequence conservation and physicochemical properties of residues to be used for functional prediction. In this study, the RET receptor tyrosine kinase serves as a model disease gene in which a broad spectrum (≥116) of disease-associated mutations has been identified among patients with Hirschsprung disease and multiple endocrine neoplasia type 2. We report the alignment of the human RET protein sequence with the orthologous sequences of 12 non-human vertebrates (eight mammalian, one avian, and three teleost species), their comparative analysis, the evolutionary topology of the RET protein, and predicted tolerance for all published missense mutations. We show that, although evolutionary conservation alone provides significant information to predict the effect of a RET mutation, a model that combines comparative sequence data with analysis of physicochemical properties in a quantitative framework provides far greater accuracy. Although the ability to discern the impact of a mutation is imperfect, our analyses permit substantial discrimination between predicted functional classes of RET mutations and disease severity even for a multigenic disease such as Hirschsprung disease. PMID:15956201

  11. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks

    PubMed Central

    2011-01-01

    Background: Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. Results: A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, contrasting significantly with the network peripheral domains. Conclusions: Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a system point of view, as the result of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086
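
    As a rough illustration of the network model described in this abstract, the sketch below builds a domain bigram network from per-protein domain architectures. It assumes that a bigram is an unordered pair of two different domains adjacent in the same protein, which is one reading of "hetero-duplex covalent linkages". The function names and the toy architectures are hypothetical, and the neighbour count shown is only a crude stand-in, not the networking versatility index derived in the paper.

```python
from collections import Counter, defaultdict


def domain_bigrams(architectures):
    """Count unordered pairs of distinct domains that are adjacent in a protein.

    `architectures` is a list of domain-name lists, one list per protein,
    in N- to C-terminal order (toy input format assumed for this sketch).
    """
    counts = Counter()
    for domains in architectures:
        for left, right in zip(domains, domains[1:]):
            if left != right:  # keep hetero pairs only (assumption)
                counts[frozenset((left, right))] += 1
    return counts


def neighbor_counts(bigrams):
    """Number of distinct partner domains per domain (a simple degree)."""
    neighbors = defaultdict(set)
    for pair in bigrams:
        x, y = tuple(pair)
        neighbors[x].add(y)
        neighbors[y].add(x)
    return {domain: len(partners) for domain, partners in neighbors.items()}


if __name__ == "__main__":
    # Hypothetical architectures, for illustration only.
    toy = [
        ["SH3", "SH2", "Kinase"],
        ["PH", "SH2"],
        ["SH2", "Kinase", "Kinase"],
    ]
    edges = domain_bigrams(toy)
    print(neighbor_counts(edges))
```

    In this toy input the domain with the most distinct partners sits at the centre of the network, which is the intuition behind contrasting "core" and "peripheral" domains.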

  12. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks.

    PubMed

    Xie, Xueying; Jin, Jing; Mao, Yongyi

    2011-08-18

    Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, contrasting significantly with the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a system point of view, as the result of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced.

  13. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
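
    The template-modeling (TM) score quoted in this abstract can be illustrated with the commonly used definition, TM = (1/L) Σ 1 / (1 + (d_i/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8, where L is the target length and d_i the distance between the i-th pair of aligned residues. The sketch below evaluates that expression for one given superposition only; the full TM-score is the maximum over superpositions, which is not implemented here. The function name and the rejection of short targets are assumptions of this example.

```python
def tm_score_for_superposition(distances, target_length):
    """Evaluate the TM-score sum for one fixed superposition.

    `distances` holds the distances (in angstroms) between aligned residue
    pairs; unaligned target residues simply contribute nothing to the sum.
    This sketch does not search over superpositions.
    """
    if target_length <= 21:
        # The d0 formula becomes very small or negative for short chains;
        # this sketch rejects them rather than guess a fallback value.
        raise ValueError("target_length must exceed 21 in this sketch")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Perfect superposition of every residue in a 100-residue target.
    print(tm_score_for_superposition([0.0] * 100, 100))
    # Only half of the residues aligned, all at zero distance.
    print(tm_score_for_superposition([0.0] * 50, 100))
```

    Because the sum is normalised by the target length rather than by the number of aligned pairs, partial alignments are penalised even when the aligned part superimposes perfectly.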

  14. Purifying Selection Maintains Dosage-Sensitive Genes during Degeneration of the Threespine Stickleback Y Chromosome

    PubMed Central

    White, Michael A.; Kitano, Jun; Peichel, Catherine L.

    2015-01-01

    Sex chromosomes are subject to unique evolutionary forces that cause suppression of recombination, leading to sequence degeneration and the formation of heteromorphic chromosome pairs (i.e., XY or ZW). Although progress has been made in characterizing the outcomes of these evolutionary processes on vertebrate sex chromosomes, it is still unclear how recombination suppression and sequence divergence typically occur and how gene dosage imbalances are resolved in the heterogametic sex. The threespine stickleback fish (Gasterosteus aculeatus) is a powerful model system to explore vertebrate sex chromosome evolution, as it possesses an XY sex chromosome pair at relatively early stages of differentiation. Using a combination of whole-genome and transcriptome sequencing, we characterized sequence evolution and gene expression across the sex chromosomes. We uncovered two distinct evolutionary strata that correspond with known structural rearrangements on the Y chromosome. In the oldest stratum, only a handful of genes remain, and these genes are under strong purifying selection. By comparing sex-linked gene expression with expression of autosomal orthologs in an outgroup, we show that dosage compensation has not evolved in threespine sticklebacks through upregulation of the X chromosome in males. Instead, in the oldest stratum, the genes that still possess a Y chromosome allele are enriched for genes predicted to be dosage sensitive in mammals and yeast. Our results suggest that dosage imbalances may have been avoided at haploinsufficient genes by retaining function of the Y chromosome allele through strong purifying selection. PMID:25818858

  15. Phylodynamics of the HIV-1 CRF02_AG clade in Cameroon

    PubMed Central

    Faria, Nuno Rodrigues; Suchard, Marc A; Abecasis, Ana; Sousa, J. D.; Ndembi, Nicaise; Camacho, R.J.; Vandamme, Anne-Mieke; Peeters, Martine; Lemey, Philippe

    2015-01-01

    Evolutionary analyses have revealed an origin of pandemic HIV-1 group M in the Congo River basin in the first part of the 20th century, but the patterns of historical viral spread in or around its epicentre remain largely unexplored. Here, we combine epidemiologic and molecular sequence data to investigate the spatiotemporal patterns of the CRF02_AG clade. By explicitly integrating prevalence counts and genetic population size estimates we date the epidemic emergence of CRF02_AG at 1973.1 (95% CI: 1972.1, 1975.3). To infer their phylogeographic signature at a regional scale, we analyze pol and env time-stamped sequence data from 8 countries using a Bayesian phylogeographic approach based on a discrete asymmetric model. Our data confirm a spatial origin of this clade in the Democratic Republic of Congo (DRC) and suggest that viral dissemination to Cameroon occurred at an early stage of the evolutionary history of CRF02_AG. We find considerable support for epidemiological linkage between neighbouring countries. Compilation of ethnographic data suggests that well-supported viral migration was related to chance exportation events rather than to sustained human migratory flows. Finally, using sequence data from 15 locations in Cameroon, we use relaxed random walk models to explore the spatiotemporal dynamics of CRF02_AG at a finer geographical detail. Phylogeographic dispersal in continuous space reveals that at least two distinct CRF02_AG lineages are circulating in overlapping regions and are evolving at different evolutionary and diffusion rates. Altogether, by combining molecular and epidemiological data, our results provide a time scale for CRF02_AG, place its spatial root within the putative root of group-M diversity and propose a scenario for the spatiotemporal patterns of a successful HIV-1 lineage at both regional and country scales. PMID:21565285

  16. The evolutionary time machine: forecasting how populations can adapt to changing environments using dormant propagules

    PubMed Central

    Orsini, Luisa; Schwenk, Klaus; De Meester, Luc; Colbourne, John K.; Pfrender, Michael E.; Weider, Lawrence J.

    2013-01-01

    Evolutionary changes are determined by a complex assortment of ecological, demographic and adaptive histories. Predicting how evolution will shape the genetic structures of populations coping with current (and future) environmental challenges has principally relied on investigations through space, in lieu of time, because long-term phenotypic and molecular data are scarce. Yet, dormant propagules in sediments, soils and permafrost are convenient natural archives of population-histories from which to trace adaptive trajectories along extended time periods. DNA sequence data obtained from these natural archives, combined with pioneering methods for analyzing both ecological and population genomic time-series data, are likely to provide predictive models to forecast evolutionary responses of natural populations to environmental changes resulting from natural and anthropogenic stressors, including climate change. PMID:23395434

  17. Toward a method for tracking virus evolutionary trajectory applied to the pandemic H1N1 2009 influenza virus.

    PubMed

    Squires, R Burke; Pickett, Brett E; Das, Sajal; Scheuermann, Richard H

    2014-12-01

    In 2009 a novel pandemic H1N1 influenza virus (H1N1pdm09) emerged as the first official influenza pandemic of the 21st century. Early genomic sequence analysis pointed to the swine origin of the virus. Here we report a novel computational approach to determine the evolutionary trajectory of viral sequences that uses data-driven estimations of nucleotide substitution rates to track the gradual accumulation of observed sequence alterations over time. Phylogenetic analysis and multiple sequence alignments show that sequences belonging to the resulting evolutionary trajectory of the H1N1pdm09 lineage exhibit a gradual accumulation of sequence variations and tight temporal correlations in the topological structure of the phylogenetic trees. These results suggest that our evolutionary trajectory analysis (ETA) can more effectively pinpoint the evolutionary history of viruses, including the host and geographical location traversed by each segment, when compared against either BLAST or traditional phylogenetic analysis alone. Copyright © 2014 Elsevier B.V. All rights reserved.
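
    The abstract describes tracking the gradual accumulation of sequence changes against estimated substitution rates. The following is a minimal sketch of that general idea only, not the authors' ETA implementation: it compares the observed number of differences between two aligned sequences with the number expected from a rate and an elapsed time. The function names, the `tolerance` factor and the example rate are all hypothetical.

```python
def hamming(a, b):
    """Count mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def expected_substitutions(rate_per_site_per_year, n_sites, years):
    """Expected number of substitutions under a constant rate (assumption)."""
    return rate_per_site_per_year * n_sites * years


def is_plausible_descendant(ancestor, candidate, years, rate, tolerance=2.0):
    """Hypothetical check: observed changes within `tolerance` x expectation."""
    observed = hamming(ancestor, candidate)
    expected = expected_substitutions(rate, len(ancestor), years)
    return observed <= tolerance * expected
```

    A real analysis would estimate the rate from the data and model its uncertainty; the fixed tolerance here is only a placeholder for that step.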

  18. Comparative Transcriptomics of Strawberries (Fragaria spp.) Provides Insights into Evolutionary Patterns.

    PubMed

    Qiao, Qin; Xue, Li; Wang, Qia; Sun, Hang; Zhong, Yang; Huang, Jinling; Lei, Jiajun; Zhang, Ticao

    2016-01-01

    Multiple closely related species with genomic sequences provide an ideal system for studies on comparative and evolutionary genomics, as well as the mechanism of speciation. The whole genome sequences of six strawberry species (Fragaria spp.) have been released, which provide one of the richest genomic resources of any plant genus. In this study, we first generated seven transcriptome sequences of Fragaria species de novo, with a total of 48,557-82,537 unigenes per species. Combined with 13 other species genomes in Rosales, we reconstructed a phylogenetic tree at the genomic level. The phylogenetic tree shows that Fragaria grouped closely with Rubus and the Fragaria clade is divided into three subclades. East Asian species appeared in every subclade, suggesting that the genus originated in this area at ∼7.99 Mya. Four species found in mountains of Southwest China originated at ∼3.98 Mya, suggesting that rapid speciation occurred to adapt to changing environments following the uplift of the Qinghai-Tibet Plateau. Moreover, we identified 510 very significantly positively selected genes in the cultivated species F. × ananassa genome. This set of genes was enriched in functions related to specific agronomic traits, such as carbon metabolism and plant hormone signal transduction processes, which are directly related to fruit quality and flavor. These findings illustrate comprehensive evolutionary patterns in Fragaria and the genetic basis of fruit domestication of cultivated strawberry at the genomic/transcriptomic level.

  19. Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.

    PubMed

    Liu, Bernard A

    2017-01-01

    Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remain a challenge. This chapter describes several methodologies towards analyzing the evolutionary trajectory of SH2 domains including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains from structural changes to alterations in the protein domain content and genome duplication will be discussed. Therefore a better understanding of SH2 domain evolution may enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.

  20. The Plasmodium gaboni genome illuminates allelic dimorphism of immunologically important surface antigens in P. falciparum.

    PubMed

    Roy, Scott William

    2015-12-01

    In the deadly human malaria parasite Plasmodium falciparum, several major merozoite surface proteins (MSPs) show a striking pattern of allelic diversity called allelic dimorphism (AD). In AD, the vast majority of observed alleles fall into two highly divergent allelic classes, with recombinant alleles being rare or not observed, presumably due to repression by natural selection (recombination suppression, or RS). The three AD loci, merozoite surface proteins (MSPs) 1, 2, and 6, along with MSP3, which also exhibits RS among four allelic classes, can be collectively called AD/RS. The causes of AD/RS and the evolutionary history of allelic diversity at these loci remain mysterious. The few available sequences from a single closely related chimpanzee parasite, P. reichenowi, have suggested that for 3/4 loci, AD/RS is an ancient state that has been retained in P. falciparum since well before the P. falciparum-P. reichenowi ancestor. On the other hand, based on comparative sequence analysis, we recently suggested that (i) AD/RS P. falciparum loci have undergone interallelic recombination over longer evolutionary times (on the timescale of recent speciation events), and thus (ii) AD/RS may be a recent phenomenon. The recent publication of genomic sequencing efforts for P. gaboni, an outgroup to P. falciparum and P. reichenowi, allows for improved reconstruction of the evolutionary history of these loci. In this work, I report genic sequence for P. gaboni for all four AD/RS P. falciparum loci (MSP1, 2, 3, and 6). Comparison of these sequences with available P. falciparum and P. reichenowi data strengthens the evidence for interallelic recombination over the evolutionary history of these species and also strengthens the case that AD/RS at these loci is ancient. Combined with previous results, these data provide evidence that AD/RS at different loci has evolved at several different times in the evolutionary history of P. falciparum: (i) before the P. gaboni-P. falciparum divergence, for much of MSP1 and MSP3; (ii) between the P. gaboni-P. falciparum and P. reichenowi-P. falciparum divergences, for the 5' end of the AD region of MSP6 and block 3 of MSP1; (iii) near the P. reichenowi-P. falciparum divergence, for the 3' end of the AD region of MSP6; and (iv) after the P. reichenowi-P. falciparum divergence, for MSP2. Based on these results, I suggest a new hypothesis for long-term evolutionary maintenance of AD/RS by recombination within allelic groups. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites.

    PubMed

    Fang, Chun; Noguchi, Tamotsu; Yamana, Hayato

    2014-10-01

    Evolutionary conservation information included in the position-specific scoring matrix (PSSM) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are linear contextual, some are mingled with highly variable residues, and others seem to be conserved independently. Every value in a PSSM is calculated independently of the others, without carrying the contextual information of residues in the sequence. Therefore, adopting the direct output of PSSM for prediction fails to consider the relationship between conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites have been analyzed. Results suggest that different PSSM-based methods differ in their capability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would largely facilitate the prediction.
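
    The point that each PSSM value is computed per column, with no neighbouring context, can be illustrated with a small sketch. The code below is an assumption-laden toy, not any of the three methods analyzed in the paper: it builds log-odds scores from a gap-free alignment with a uniform background and a pseudocount, then shows one common way to add context by concatenating the rows in a window around a position. The names `pssm` and `windowed_features` are hypothetical.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def pssm(alignment, pseudocount=1.0):
    """Log-odds scores; each column is scored independently of the others."""
    background = 1.0 / len(AMINO)  # uniform background (simplifying assumption)
    matrix = []
    for col in range(len(alignment[0])):
        counts = {a: pseudocount for a in AMINO}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        matrix.append(
            {a: math.log2((counts[a] / total) / background) for a in AMINO}
        )
    return matrix


def windowed_features(matrix, center, half_width=1):
    """Concatenate PSSM rows around `center`, zero-padding past the ends."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(matrix):
            features.extend(matrix[pos][a] for a in AMINO)
        else:
            features.extend(0.0 for _ in AMINO)
    return features
```

    For a three-column alignment such as `["ACD", "ACE", "ACD"]`, `windowed_features(pssm(aln), 0)` yields 60 values: 20 zeros for the missing left neighbour, then the rows for positions 0 and 1.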

  2. Detailed phylogenetic analysis of primate T-lymphotropic virus type 1 (PTLV-1) sequences from orangutans (Pongo pygmaeus) reveals new insights into the evolutionary history of PTLV-1 in Asia.

    PubMed

    Reid, Michael J C; Switzer, William M; Schillaci, Michael A; Ragonnet-Cronin, Manon; Joanisse, Isabelle; Caminiti, Kyna; Lowenberger, Carl A; Galdikas, Birute Mary F; Sandstrom, Paul A; Brooks, James I

    2016-09-01

    While human T-lymphotropic virus type 1 (HTLV-1) originates from ancient cross-species transmission of simian T-lymphotropic virus type 1 (STLV-1) from infected nonhuman primates, much debate exists on whether the first HTLV-1 occurred in Africa, or in Asia during early human evolution and migration. This topic is complicated by a lack of representative Asian STLV-1 to infer PTLV-1 evolutionary histories. In this study we obtained new STLV-1 LTR and tax sequences from a wild-born Bornean orangutan (Pongo pygmaeus) and performed detailed phylogenetic analyses using both maximum likelihood and Bayesian inference of available Asian PTLV-1 and African STLV-1 sequences. Phylogenies, divergence dates and nucleotide substitution rates were co-inferred and compared using six different molecular clock calibrations in a Bayesian framework, including both archaeological and/or nucleotide substitution rate calibrations. We then combined our molecular results with paleobiogeographical and ecological data to infer the most likely evolutionary history of PTLV-1. Based on the preferred models our analyses robustly inferred an Asian source for PTLV-1 with cross-species transmission of STLV-1 likely from a macaque (Macaca sp.) to an orangutan about 37.9-48.9kya, and to humans between 20.3-25.5kya. An orangutan diversification of STLV-1 commenced approximately 6.4-7.3kya. Our analyses also inferred that HTLV-1 was first introduced into Australia ~3.1-3.7kya, corresponding to both genetic and archaeological changes occurring in Australia at that time. Finally, HTLV-1 appears in Melanesia at ~2.3-2.7kya corresponding to the migration of the Lapita peoples into the region. Our results also provide an important future reference for calibrating information essential for PTLV evolutionary timescale inference. Longer sequence data, or full genomes from a greater representation of Asian primates, including gibbons, leaf monkeys, and Sumatran orangutans are needed to fully elucidate these evolutionary dates and relationships using the model criteria suggested herein. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Using ancient protein kinases to unravel a modern cancer drug's mechanism

    DOE PAGES

    Wilson, C.; Agafonov, R. V.; Hoemberger, M.; ...

    2015-02-19

    Macromolecular function is rooted in energy landscapes, where sequence determines not a single structure but an ensemble of conformations. Hence, evolution modifies a protein’s function by altering its energy landscape. Consequently, we recreate the evolutionary pathway between two modern human oncogenes, Src and Abl, by reconstructing their common ancestors. Our evolutionary reconstruction combined with x-ray structures of the common ancestor and pre–steady-state kinetics reveals a detailed atomistic mechanism for selectivity of the successful cancer drug Gleevec. Gleevec affinity is gained during the evolutionary trajectory toward Abl and lost toward Src, primarily by shifting an induced-fit equilibrium that is also disrupted in the clinical T315I resistance mutation. Lastly, this work reveals the mechanism of Gleevec specificity while offering insights into how energy landscapes evolve.

  4. A pronounced evolutionary shift of the pseudoautosomal region boundary in house mice

    PubMed Central

    White, Michael A.; Ikeda, Akihiro; Payseur, Bret A.

    2012-01-01

    The pseudoautosomal region (PAR) is essential for the accurate pairing and segregation of the X and Y chromosomes during meiosis. Despite its functional significance, the PAR shows substantial evolutionary divergence in structure and sequence between mammalian species. An instructive example of PAR evolution is the house mouse Mus musculus domesticus (represented by the C57BL/6J strain), which has the smallest PAR among those that have been mapped. In C57BL/6J, the PAR boundary is located just ~700 kb from the distal end of the X chromosome, whereas the boundary is found at a more proximal position in Mus spretus, a species that diverged from house mice 2–4 million years ago. Here, we use a combination of genetic and physical mapping to document a pronounced shift in the PAR boundary in a second house mouse subspecies, Mus musculus castaneus (represented by the CAST/EiJ strain), ~430 kb proximal of the M. m. domesticus boundary. We demonstrate molecular evolutionary consequences of this shift, including a marked lineage-specific increase in sequence divergence within Mid1, a gene that resides entirely within the M. m. castaneus PAR but straddles the boundary in other subspecies. Our results extend observations of structural divergence in the PAR to closely related subspecies, pointing to major evolutionary changes in this functionally important genomic region over a short time period. PMID:22763584

  5. A pronounced evolutionary shift of the pseudoautosomal region boundary in house mice.

    PubMed

    White, Michael A; Ikeda, Akihiro; Payseur, Bret A

    2012-08-01

    The pseudoautosomal region (PAR) is essential for the accurate pairing and segregation of the X and Y chromosomes during meiosis. Despite its functional significance, the PAR shows substantial evolutionary divergence in structure and sequence between mammalian species. An instructive example of PAR evolution is the house mouse Mus musculus domesticus (represented by the C57BL/6J strain), which has the smallest PAR among those that have been mapped. In C57BL/6J, the PAR boundary is located just ~700 kb from the distal end of the X chromosome, whereas the boundary is found at a more proximal position in Mus spretus, a species that diverged from house mice 2-4 million years ago. In this study we used a combination of genetic and physical mapping to document a pronounced shift in the PAR boundary in a second house mouse subspecies, Mus musculus castaneus (represented by the CAST/EiJ strain), ~430 kb proximal of the M. m. domesticus boundary. We demonstrate molecular evolutionary consequences of this shift, including a marked lineage-specific increase in sequence divergence within Mid1, a gene that resides entirely within the M. m. castaneus PAR but straddles the boundary in other subspecies. Our results extend observations of structural divergence in the PAR to closely related subspecies, pointing to major evolutionary changes in this functionally important genomic region over a short time period.

  6. Rapidly rotating polytropes in general relativity

    NASA Technical Reports Server (NTRS)

    Cook, Gregory B.; Shapiro, Stuart L.; Teukolsky, Saul A.

    1994-01-01

    We construct an extensive set of equilibrium sequences of rotating polytropes in general relativity. We determine a number of important physical parameters of such stars, including maximum mass and maximum spin rate. The stability of the configurations against quasi-radial perturbations is diagnosed. Two classes of evolutionary sequences of fixed rest mass and entropy are explored: normal sequences which behave very much like Newtonian evolutionary sequences, and supramassive sequences which exist solely because of relativistic effects. Dissipation leading to loss of angular momentum causes a star to evolve in a quasi-stationary fashion along an evolutionary sequence. Supramassive sequences evolve towards eventual catastrophic collapse to a black hole. Prior to collapse, the star must spin up as it loses angular momentum, an effect which may provide an observational precursor to gravitational collapse to a black hole.

  7. Theoretical Foundation of the RelTime Method for Estimating Divergence Times from Variable Evolutionary Rates

    PubMed Central

    Tamura, Koichiro; Tao, Qiqing; Kumar, Sudhir

    2018-01-01

    RelTime estimates divergence times by relaxing the assumption of a strict molecular clock in a phylogeny. It shows excellent performance in estimating divergence times for both simulated and empirical molecular sequence data sets in which evolutionary rates varied extensively throughout the tree. RelTime is computationally efficient and scales well with increasing size of data sets. Until now, however, RelTime has not had a formal mathematical foundation. Here, we show that the basis of the RelTime approach is a relative rate framework (RRF) that combines comparisons of evolutionary rates in sister lineages with the principle of minimum rate change between evolutionary lineages and their respective descendants. We present analytical solutions for estimating relative lineage rates and divergence times under RRF. We also discuss the relationship of RRF with other approaches, including the Bayesian framework. We conclude that RelTime will be useful for phylogenies with branch lengths derived not only from molecular data, but also morphological and biochemical traits. PMID:29893954

  8. Diminishing-returns epistasis decreases adaptability along an evolutionary trajectory.

    PubMed

    Wünsche, Andrea; Dinh, Duy M; Satterwhite, Rebecca S; Arenas, Carolina Diaz; Stoebel, Daniel M; Cooper, Tim F

    2017-03-01

    Populations evolving in constant environments exhibit declining adaptability. Understanding the basis of this pattern could reveal underlying processes determining the repeatability of evolutionary outcomes. In principle, declining adaptability can be due to a decrease in the effect size of beneficial mutations, a decrease in the rate at which they occur, or some combination of both. By evolving Escherichia coli populations started from different steps along a single evolutionary trajectory, we show that declining adaptability is best explained by a decrease in the size of available beneficial mutations. This pattern reflected the dominant influence of negative genetic interactions that caused new beneficial mutations to confer smaller benefits in fitter genotypes. Genome sequencing revealed that starting genotypes that were more similar to one another did not exhibit greater similarity in terms of new beneficial mutations, supporting the view that epistasis acts globally, having a greater influence on the effect than on the identity of available mutations along an adaptive trajectory. Our findings provide support for a general mechanism that leads to predictable phenotypic evolutionary trajectories.

  9. A high density physical map of chromosome 1BL supports evolutionary studies, map-based cloning and sequencing in wheat

    PubMed Central

    2013-01-01

    Background As for other major crops, achieving a complete wheat genome sequence is essential for the application of genomics to breeding new and improved varieties. To overcome the complexities of the large, highly repetitive and hexaploid wheat genome, the International Wheat Genome Sequencing Consortium established a chromosome-based strategy that was validated by the construction of the physical map of chromosome 3B. Here, we present improved strategies for the construction of highly integrated and ordered wheat physical maps, using chromosome 1BL as a template, and illustrate their potential for evolutionary studies and map-based cloning. Results Using a combination of novel high throughput marker assays and an assembly program, we developed a high quality physical map representing 93% of wheat chromosome 1BL, anchored and ordered with 5,489 markers including 1,161 genes. Analysis of the gene space organization and evolution revealed that gene distribution and conservation along the chromosome results from the superimposition of the ancestral grass and recent wheat evolutionary patterns, leading to a peak of synteny in the central part of the chromosome arm and an increased density of non-collinear genes towards the telomere. With a density of about 11 markers per Mb, the 1BL physical map provides 916 markers, including 193 genes, for fine mapping the 40 QTLs mapped on this chromosome. Conclusions Here, we demonstrate that high marker density physical maps can be developed in complex genomes such as wheat to accelerate map-based cloning, gain new insights into genome evolution, and provide a foundation for reference sequencing. PMID:23800011

  10. Hidden long evolutionary memory in a model biochemical network

    NASA Astrophysics Data System (ADS)

    Ali, Md. Zulfikar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2018-04-01

    We introduce a minimal model for the evolution of functional protein-interaction networks using a sequence-based mutational algorithm, and apply the model to study neutral drift in networks that yield oscillatory dynamics. Starting with a functional core module, random evolutionary drift increases network complexity even in the absence of specific selective pressures. Surprisingly, we uncover a hidden order in sequence space that gives rise to long-term evolutionary memory, implying strong constraints on network evolution due to the topology of accessible sequence space.

  11. Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

    PubMed

    Walker, Michael B; King, Benjamin L; Paigen, Kenneth

    2012-01-01

    Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.

  12. Integrative View of α2,3-Sialyltransferases (ST3Gal) Molecular and Functional Evolution in Deuterostomes: Significance of Lineage-Specific Losses

    PubMed Central

    Petit, Daniel; Teppa, Elin; Mir, Anne-Marie; Vicogne, Dorothée; Thisse, Christine; Thisse, Bernard; Filloux, Cyril; Harduin-Lepers, Anne

    2015-01-01

    Sialyltransferases are responsible for the synthesis of a diverse range of sialoglycoconjugates predicted to be pivotal to deuterostomes’ evolution. In this work, we reconstructed the evolutionary history of the metazoan α2,3-sialyltransferases family (ST3Gal), a subset of sialyltransferases encompassing six subfamilies (ST3Gal I–ST3Gal VI) functionally characterized in mammals. Exploration of genomic and expressed sequence tag databases and search of conserved sialylmotifs led to the identification of a large data set of st3gal-related gene sequences. Molecular phylogeny and large scale sequence similarity network analysis identified four new vertebrate subfamilies called ST3Gal III-r, ST3Gal VII, ST3Gal VIII, and ST3Gal IX. To address the issue of the origin and evolutionary relationships of the st3gal-related genes, we performed comparative syntenic mapping of st3gal gene loci combined with ancestral genome reconstruction. The ten vertebrate ST3Gal subfamilies originated from genome duplication events at the base of vertebrates and are organized in three distinct and ancient groups of genes predating the early deuterostomes. Inferring st3gal gene family history also identified several lineage-specific gene losses, the significance of which was explored in a functional context. Toward this aim, spatiotemporal distribution of st3gal genes was analyzed in zebrafish and bovine tissues. In addition, molecular evolutionary analyses using specificity determining position and coevolved amino acid predictions led to the identification of amino acid residues with potential implication in functional divergence of vertebrate ST3Gal. We propose a detailed scenario of the evolutionary relationships of st3gal genes coupled to a conceptual framework of the evolution of ST3Gal functions. PMID:25534026

  13. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
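
    The sensitivity, specificity and accuracy figures quoted above are standard confusion-matrix ratios. As a brief sketch of how such figures are derived, the function below computes them from true/false positive and negative counts. The counts in the usage note are hypothetical and are not taken from the LIDC-IDRI evaluation.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of diseased cases detected
    specificity = tn / (tn + fp)  # fraction of healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all cases correct
    return sensitivity, specificity, accuracy
```

    With hypothetical counts of 90 true positives, 10 false negatives, 80 true negatives and 20 false positives, `diagnostic_metrics(90, 10, 80, 20)` gives a sensitivity of 0.9, a specificity of 0.8 and an accuracy of 0.85.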

  14. Centromere and telomere sequence alterations reflect the rapid genome evolution within the carnivorous plant genus Genlisea.

    PubMed

    Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Neumann, Pavel; Novák, Petr; Fojtová, Miloslava; Vu, Giang T H; Macas, Jiří; Fajkus, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-12-01

    Linear chromosomes of eukaryotic organisms invariably possess centromeres and telomeres to ensure proper chromosome segregation during nuclear divisions and to protect the chromosome ends from deterioration and fusion, respectively. While centromeric sequences may differ between species, with arrays of tandemly repeated sequences and retrotransposons being the most abundant sequence types in plant centromeres, telomeric sequences are usually highly conserved among plants and other organisms. The genome size of the carnivorous genus Genlisea (Lentibulariaceae) is highly variable. Here we study evolutionary sequence plasticity of these chromosomal domains at an intrageneric level. We show that Genlisea nigrocaulis (1C = 86 Mbp; 2n = 40) and G. hispidula (1C = 1550 Mbp; 2n = 40) differ as to their DNA composition at centromeres and telomeres. G. nigrocaulis and its close relative G. pygmaea revealed mainly 161 bp tandem repeats, while G. hispidula and its close relative G. subglabra displayed a combination of four retroelements at centromeric positions. G. nigrocaulis and G. pygmaea chromosome ends are characterized by the Arabidopsis-type telomeric repeats (TTTAGGG); G. hispidula and G. subglabra instead revealed two intermingled sequence variants (TTCAGG and TTTCAGG). These differences in centromeric and, surprisingly, also in telomeric DNA sequences, uncovered between groups with on average a > 9-fold genome size difference, emphasize the fast genome evolution within this genus. Such intrageneric evolutionary alteration of telomeric repeats with cytosine in the guanine-rich strand, not yet known for plants, might impact the epigenetic telomere chromatin modification. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
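
    The contrast between a single telomeric repeat (TTTAGGG) and two intermingled variants (TTCAGG and TTTCAGG) can be made concrete with a short sketch. The function below, whose name and interface are assumptions for illustration, returns the length in base pairs of the longest uninterrupted array built from any of the given motifs. It is a toy string search, not the method used in the study.

```python
import re


def longest_repeat_array(sequence, motifs):
    """Length (bp) of the longest back-to-back array made of the given motifs."""
    unit = "|".join(re.escape(m) for m in motifs)
    array_pattern = re.compile(f"(?:{unit})+")
    # Keep the longest contiguous stretch composed only of motif copies.
    best = max(
        (m.group(0) for m in array_pattern.finditer(sequence)),
        key=len,
        default="",
    )
    return len(best)
```

    Searching a mixed array for only one variant splits it into shorter pieces, whereas passing both variants recovers the full array; this is why intermingled repeats need to be searched for jointly.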

  15. The scope and strength of sex-specific selection in genome evolution

    PubMed Central

    Wright, A E; Mank, J E

    2013-01-01

    Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. PMID:23848139

  16. Meaningful call combinations and compositional processing in the southern pied babbler

    PubMed Central

    Engesser, Sabrina; Ridley, Amanda R.; Townsend, Simon W.

    2016-01-01

    Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought. PMID:27155011

  17. Meaningful call combinations and compositional processing in the southern pied babbler.

    PubMed

    Engesser, Sabrina; Ridley, Amanda R; Townsend, Simon W

    2016-05-24

    Language's expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a "mobbing sequence," potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought.

  18. Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

    PubMed

    Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

    2014-01-01

    A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
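
    The oligonucleotide compositions that serve as input to such alignment-free clustering can be sketched as follows. This is an illustrative example only: it computes a normalized k-mer frequency vector for one sequence fragment and does not implement BLSOM itself; the function name `kmer_frequencies` is hypothetical, and windows containing ambiguous bases are simply skipped as a simplifying assumption.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return a dict mapping every A/C/G/T k-mer to its relative frequency."""
    seq = seq.upper()
    # Count all overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


freqs = kmer_frequencies("ACGTACGT", k=4)
print(len(freqs))      # number of possible tetranucleotides
print(freqs["ACGT"])   # relative frequency of ACGT in the toy fragment
```

    Each fragment then becomes a fixed-length vector (256 entries for tetranucleotides, 1,024 for pentanucleotides) that a clustering method such as a self-organizing map can consume.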

  19. Comparative Transcriptomics of Strawberries (Fragaria spp.) Provides Insights into Evolutionary Patterns

    PubMed Central

    Qiao, Qin; Xue, Li; Wang, Qia; Sun, Hang; Zhong, Yang; Huang, Jinling; Lei, Jiajun; Zhang, Ticao

    2016-01-01

    Multiple closely related species with genomic sequences provide an ideal system for studies on comparative and evolutionary genomics, as well as the mechanism of speciation. The whole genome sequences of six strawberry species (Fragaria spp.) have been released, which provide one of the richest genomic resources of any plant genus. In this study, we first generated seven transcriptome sequences of Fragaria species de novo, with a total of 48,557–82,537 unigenes per species. Combined with 13 other species genomes in Rosales, we reconstructed a phylogenetic tree at the genomic level. The phylogenetic tree shows that Fragaria grouped closely with Rubus and that the Fragaria clade is divided into three subclades. East Asian species appeared in every subclade, suggesting that the genus originated in this area at ∼7.99 Mya. Four species found in mountains of Southwest China originated at ∼3.98 Mya, suggesting that rapid speciation occurred to adapt to changing environments following the uplift of the Qinghai–Tibet Plateau. Moreover, we identified 510 very significantly positively selected genes in the cultivated species F. × ananassa genome. This set of genes was enriched in functions related to specific agronomic traits, such as carbon metabolism and plant hormone signal transduction processes, which are directly related to fruit quality and flavor. These findings illustrate comprehensive evolutionary patterns in Fragaria and the genetic basis of fruit domestication of cultivated strawberry at the genomic/transcriptomic level. PMID:28018379

  20. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.

  1. Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing

    PubMed Central

    Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.

    2016-01-01

    ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. 
    Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634
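
    The idea that repeated passes over the same molecule reduce the error rate can be illustrated with a simple per-position majority vote. This is a minimal sketch under strong assumptions, not the consensus algorithm used by the sequencing platform: the passes are assumed to be already aligned and of equal length, only substitution errors are modelled, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across aligned, equal-length passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same aligned length")
    # zip(*passes) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))


# Toy example: 7 passes, each carrying one substitution at a different position.
truth = "ACGTTGCA"
passes = [
    truth[:i] + ("A" if truth[i] != "A" else "C") + truth[i + 1:]
    for i in range(7)
]
print(majority_consensus(passes) == truth)
```

    Because each error in the toy data affects a different column, every column still has six correct bases out of seven and the vote recovers the original sequence. Insertions and deletions, which the abstract notes required alignment against a sample-specific reference, are outside the scope of this sketch.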

  2. Evolution of high-mass star-forming regions.

    NASA Astrophysics Data System (ADS)

    Giannetti, A.; Leurini, S.; Wyrowski, F.; Urquhart, J.; König, C.; Csengeri, T.; Güsten, R.; Menten, K. M.

    Observational identification of a coherent evolutionary sequence for high-mass star-forming regions is still missing. We use the progressive heating of the gas caused by the feedback of high-mass young stellar objects to prove the statistical validity of the most common schemes used to observationally define an evolutionary sequence for high-mass clumps, and identify which physical process dominates in the different phases. From the spectroscopic follow-ups carried out towards the TOP100 sample between 84 and 365 GHz, we selected several multiplets of CH3CN, CH3CCH, and CH3OH lines to derive the physical properties of the gas in the clumps along the evolutionary sequence. We demonstrate that the evolutionary sequence is statistically valid, and we define intervals in L/M separating the compression, collapse and accretion, and disruption phases. The first hot cores and ZAMS stars appear at L/M ≈ 10 L_⊙ M_⊙^-1.

  3. Time to split Salvia s.l. (Lamiaceae) - New insights from Old World Salvia phylogeny.

    PubMed

    Will, Maria; Claßen-Bockhoff, Regine

    2017-04-01

    Salvia L. is widely known as the largest genus in the mint family. A morphological modification of the androecium (lever-like stamens) was used to support this genus. However, molecular data revealed that Salvia is polyphyletic. Since phylogenetic studies largely underrepresented Old World Salvia species, we filled this gap and combined new data with existing sequences. The aim of our study was the identification of well-supported clades that provide the basis for evolutionary and taxonomic conclusions. We included ITS data (internal transcribed spacer) from 220 Salvia species, 86 of which were sequenced for the first time. Additionally, the highly variable plastid marker rpl32-trnL was sequenced, providing new data for 100 Salvia species. These sequences were combined with the accessions available from GenBank. Old World Salvia is represented herein with 57% of its species. The two datasets were analyzed separately using BI and ML approaches. Our data confirm that Salvia is polyphyletic with four distinct evolutionary lineages (Clade I-IV), including five additional genera. The clades strongly reflect the geographical distribution, i.e., Clade IV (East Asia), Clade III (Southwest Asia to Northern Africa), and Clade II (America). The origin of Salvia s.s. (Clade I) is most likely Southwest Asia. A high degree of parallel character evolution was identified in most of the Old World sections. Based on our results, we reconstructed the evolution and biogeography of Salvia s.l. and propose to split this large group into six genera, each supported by geographical distribution, morphology, and karyology. Salvia s.l. is a polyphyletic group that was originally regarded as a genus because its species share a derived stamen structure. However, phylogenetic data clearly indicate that this floral trait and other morphological characters evolved in parallel. 
    Our study illustrates that the combination of different data sets allows a comprehensive reconstruction of taxa and characteristic evolution, both of which are a precondition for future revision. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Engineering of a target site-specific recombinase by a combined evolution- and structure-guided approach

    PubMed Central

    Abi-Ghanem, Josephine; Chusainow, Janet; Karimova, Madina; Spiegel, Christopher; Hofmann-Sieber, Helga; Hauber, Joachim; Buchholz, Frank; Pisabarro, M. Teresa

    2013-01-01

    Site-specific recombinases (SSRs) can perform DNA rearrangements, including deletions, inversions and translocations when their naive target sequences are placed strategically into the genome of an organism. Hence, in order to employ SSRs in heterologous hosts, their target sites have to be introduced into the genome of an organism before the enzyme can be practically employed. Engineered SSRs hold great promise for biotechnology and advanced biomedical applications, as they promise to extend the usefulness of SSRs to allow efficient and specific recombination of pre-existing, natural genomic sequences. However, the generation of enzymes with desired properties remains challenging. Here, we use substrate-linked directed evolution in combination with molecular modeling to rationally engineer an efficient and specific recombinase (sTre) that readily and specifically recombines a sequence present in the HIV-1 genome. We elucidate the role of key residues implicated in the molecular recognition mechanism and we present a rationale for sTre’s enhanced specificity. Combining evolutionary and rational approaches should help in accelerating the generation of enzymes with desired properties for use in biotechnology and biomedicine. PMID:23275541

  5. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  6. rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I

    2004-01-28

    Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Here we present the newly developed version 2.0 of the rVISTA tool that can process alignments generated by both zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. rVISTA tool is publicly available at http://rvista.dcode.org/.
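
    The general idea of combining a consensus-sequence search with a conservation filter can be sketched in a few lines. This is an illustration only and does not reproduce rVISTA's algorithm, scoring, or interfaces: the function name `conserved_sites`, the identity threshold, and the toy alignment are all assumptions, and sites interrupted by alignment gaps in the reference are ignored for simplicity.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def conserved_sites(ref_aln, other_aln, consensus, min_identity=0.8):
    """Return (column, site) pairs where the consensus matches the reference
    row of a pairwise alignment and the aligned columns are conserved."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("alignment rows must have equal length")
    ref, other = ref_aln.upper(), other_aln.upper()
    core = "".join(IUPAC[c] for c in consensus.upper())
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?=({core}))")
    hits = []
    for m in pattern.finditer(ref):
        start, end = m.start(), m.start() + len(consensus)
        same = sum(a == b for a, b in zip(ref[start:end], other[start:end]))
        if same / len(consensus) >= min_identity:
            hits.append((start, m.group(1)))
    return hits


# Toy pairwise alignment (hypothetical sequences) and an example IUPAC consensus.
print(conserved_sites("AATGACTCATTT", "AATGACTCATTT", "TGASTCA"))
print(conserved_sites("AATGACTCATTT", "AACCCCCCCTTT", "TGASTCA"))
```

    The second call returns no hits because the matching site in the reference is not conserved in the aligned sequence, which is the filtering step the abstract describes in general terms.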

  7. Structural genomics reveals EVE as a new ASCH/PUA-related domain

    PubMed Central

    Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard

    2014-01-01

    Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links. PMID:19191354

  8. Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertonati, C.; Punta, M; Fischer, M

    2008-01-01

    We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links.

  9. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    PubMed Central

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. Methodology/Principal Findings We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a Fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  10. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    PubMed

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a Fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  11. TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS

    PubMed Central

    Jones, Matthew R.; Good, Jeffrey M.

    2016-01-01

    The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993

  12. Versatility and Invariance in the Evolution of Homologous Heteromeric Interfaces

    PubMed Central

    Andreani, Jessica; Faure, Guilhem; Guerois, Raphaël

    2012-01-01

    Evolutionary pressures act on protein complex interfaces so that they preserve their complementarity. Nonetheless, the elementary interactions which compose the interface are highly versatile throughout evolution. Understanding and characterizing interface plasticity across evolution is a fundamental issue which could provide new insights into protein-protein interaction prediction. Using a database of 1,024 couples of close and remote heteromeric structural interologs, we studied protein-protein interactions from a structural and evolutionary point of view. We systematically and quantitatively analyzed the conservation of different types of interface contacts. Our study highlights astonishing plasticity regarding polar contacts at complex interfaces. It also reveals that up to a quarter of the residues switch out of the interface when comparing two homologous complexes. Despite such versatility, we identify two important interface descriptors which correlate with an increased conservation in the evolution of interfaces: apolar patches and contacts surrounding anchor residues. These observations hold true even when restricting the dataset to transiently formed complexes. We show that a combination of six features related either to sequence or to geometric properties of interfaces can be used to rank positions likely to share similar contacts between two interologs. Altogether, our analysis provides important tracks for extracting meaningful information from multiple sequence alignments of conserved binding partners and for discriminating near-native interfaces using evolutionary information. PMID:22952442

  13. Phylogeography above the species level for perennial species in a composite genus

    PubMed Central

    Tremetsberger, Karin; Ortiz, María Ángeles; Terrab, Anass; Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Talavera, Salvador

    2016-01-01

    In phylogeography, DNA sequence and fingerprint data at the population level are used to infer evolutionary histories of species. Phylogeography above the species level is concerned with the genealogical aspects of divergent lineages. Here, we present a phylogeographic study to examine the evolutionary history of a western Mediterranean composite, focusing on the perennial species of Helminthotheca (Asteraceae, Cichorieae). We used molecular markers (amplified fragment length polymorphism (AFLP), internal transcribed spacer and plastid DNA sequences) to infer relationships among populations throughout the distributional range of the group. Interpretation is aided by biogeographic and molecular clock analyses. Four coherent entities are revealed by Bayesian mixture clustering of AFLP data, which correspond to taxa previously recognized at the rank of subspecies. The origin of the group was in western North Africa, from where it expanded across the Strait of Gibraltar to the Iberian Peninsula and across the Strait of Sicily to Sicily. Pleistocene lineage divergence is inferred within western North Africa as well as within the western Iberian region. The existence of the four entities as discrete evolutionary lineages suggests that they should be elevated to the rank of species, yielding H. aculeata, H. comosa, H. maroccana and H. spinosa, whereby the latter two necessitate new combinations. PMID:26644340

  14. A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model

    PubMed Central

    Beldade, Patrícia; Rudd, Stephen; Gruber, Jonathan D; Long, Anthony D

    2006-01-01

    Background Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economic and educational value of butterflies they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expressed Sequence Tag (EST) project for Bicyclus anynana that has identified the largest available collection to date of expressed genes for any butterfly. Results By targeting cDNAs from developing wings at the stages when pattern is specified, we biased gene discovery towards genes potentially involved in pattern formation. Assembly of 9,903 ESTs from a subtracted library allowed us to identify 4,251 genes of which 2,461 were annotated based on BLAST analyses against relevant gene collections. Gene prediction software identified 2,202 peptides, of which 215 longer than 100 amino acids had no homology to any known proteins and, thus, potentially represent novel or highly diverged butterfly genes. We combined gene and Single Nucleotide Polymorphism (SNP) identification by constructing cDNA libraries from pools of outbred individuals, and by sequencing clones from the 3' end to maximize alignment depth. Alignments of multi-member contigs allowed us to identify over 14,000 putative SNPs, with 316 genes having at least one high confidence double-hit SNP. We furthermore identified 320 microsatellites in transcribed genes that can potentially be used as genetic markers. Conclusion Our project was designed to combine gene and sequence polymorphism discovery and has generated the largest gene collection available for any butterfly and many potential markers in expressed genes. These resources will be invaluable for exploring the potential of B. anynana in particular, and butterflies in general, as models in ecological, evolutionary, and developmental genetics. PMID:16737530
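
    Calling putative SNPs from the aligned members of a contig can be sketched as a column-wise allele count. This is a minimal illustration, not the pipeline used in the study: the function name `putative_snps` and the toy reads are hypothetical, and "double-hit" is interpreted here, as an assumption, to mean that each reported allele is seen in at least two reads.

```python
from collections import Counter


def putative_snps(reads, min_allele_count=2):
    """Return (column, allele counts) for alignment columns where at least two
    different bases are each supported by `min_allele_count` or more reads."""
    snps = []
    for pos, column in enumerate(zip(*reads)):
        # Ignore gaps and ambiguous characters when counting alleles.
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        alleles = {b: n for b, n in counts.items() if n >= min_allele_count}
        if len(alleles) >= 2:
            snps.append((pos, alleles))
    return snps


# Toy contig of five aligned reads (hypothetical data).
reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT"]
print(putative_snps(reads))                      # stricter, double-hit style call
print(putative_snps(reads, min_allele_count=1))  # any observed variant
```

    Requiring the minor allele in two or more reads filters out the single mismatch in the last column, which is more likely to be a sequencing error than a true polymorphism.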

  15. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  16. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    Wenger, Yvan; Galliot, Brigitte

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  17. Rapid genome-wide evolution in Brassica rapa populations following drought revealed by sequencing of ancestral and descendant gene pools.

    PubMed

    Franks, Steven J; Kane, Nolan C; O'Hara, Niamh B; Tittes, Silas; Rest, Joshua S

    2016-08-01

    There is increasing evidence that evolution can occur rapidly in response to selection. Recent advances in sequencing suggest the possibility of documenting genetic changes as they occur in populations, thus uncovering the genetic basis of evolution, particularly if samples are available from both before and after selection. Here, we had a unique opportunity to directly assess genetic changes in natural populations following an evolutionary response to a fluctuation in climate. We analysed genome-wide differences between ancestors and descendants of natural populations of Brassica rapa plants from two locations that rapidly evolved changes in multiple phenotypic traits, including flowering time, following a multiyear late-season drought in California. These ancestor-descendant comparisons revealed evolutionary shifts in allele frequencies in many genes. Some genes showing evolutionary shifts have functions related to drought stress and flowering time, consistent with an adaptive response to selection. Loci differentiated between ancestors and descendants (FST outliers) were generally different from those showing signatures of selection based on site frequency spectrum analysis (Tajima's D), indicating that the loci that evolved in response to the recent drought and those under historical selection were generally distinct. Very few genes showed similar evolutionary responses between two geographically distinct populations, suggesting independent genetic trajectories of evolution yielding parallel phenotypic changes. The results show that selection can result in rapid genome-wide evolutionary shifts in allele frequencies in natural populations, and highlight the usefulness of combining resurrection experiments in natural populations with genomics for studying the genetic basis of adaptive evolution. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
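
    The ancestor-descendant differentiation measured by FST in the record above can be illustrated for a single biallelic locus. The following is a minimal sketch, assuming Wright's heterozygosity-based definition, FST = (HT - HS) / HT, with the two samples weighted equally; the abstract does not say which estimator the authors used, and the function names `heterozygosity` and `fst` are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(p_anc, p_desc):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted samples.

    p_anc and p_desc are the frequencies of the same allele in the
    ancestral and descendant samples (illustrative inputs, not study data).
    """
    p_bar = (p_anc + p_desc) / 2.0
    h_t = heterozygosity(p_bar)  # heterozygosity of the pooled sample
    if h_t == 0.0:
        return 0.0  # both samples are fixed for the same allele
    h_s = (heterozygosity(p_anc) + heterozygosity(p_desc)) / 2.0  # mean within-sample value
    return (h_t - h_s) / h_t
```

    A locus whose allele frequency did not change gives 0, and a locus fixed for different alleles before and after selection gives 1; outlier scans rank loci by values in between.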

  18. Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses.

    PubMed

    Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

    2016-11-26

    Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics.

  19. Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses

    PubMed Central

    Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

    2016-01-01

    Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics. PMID:27898037

  20. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules

    PubMed Central

    Ashkenazy, Haim; Abadi, Shiran; Martz, Eric; Chay, Ofer; Mayrose, Itay; Pupko, Tal; Ben-Tal, Nir

    2016-01-01

    The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree. PMID:27166375
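
    The idea of scoring each alignment position for conservation can be sketched with per-column Shannon entropy. This is a deliberately simpler stand-in and not ConSurf's method: the abstract describes evolutionary rates estimated on a phylogenetic tree within a probabilistic framework, whereas the hypothetical helpers below ignore the tree and only count residues.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def conservation_profile(alignment):
    """Entropy per column for equal-length aligned sequences; lower means more conserved."""
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment (made-up sequences, for illustration only).
profile = conservation_profile(["ACG", "ACT", "AGT", "A-T"])
```

    In the toy alignment the first column is invariant and scores 0 bits, while the other two columns are variable and score higher.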

  1. Evolution of Enzyme Superfamilies: Comprehensive Exploration of Sequence-Function Relationships.

    PubMed

    Baier, F; Copp, J N; Tokuriki, N

    2016-11-22

    The sequence and functional diversity of enzyme superfamilies have expanded through billions of years of evolution from a common ancestor. Understanding how protein sequence and functional "space" have expanded, at both the evolutionary and molecular level, is central to biochemistry, molecular biology, and evolutionary biology. Integrative approaches that examine protein sequence, structure, and function have begun to provide comprehensive views of the functional diversity and evolutionary relationships within enzyme superfamilies. In this review, we outline the recent advances in our understanding of enzyme evolution and superfamily functional diversity. We describe the tools that have been used to comprehensively analyze sequence relationships and to characterize sequence and function relationships. We also highlight recent large-scale experimental approaches that systematically determine the activity profiles across enzyme superfamilies. We identify several intriguing insights from this recent body of work. First, promiscuous activities are prevalent among extant enzymes. Second, many divergent proteins retain "function connectivity" via enzyme promiscuity, which can be used to probe the evolutionary potential and history of enzyme superfamilies. Finally, we discuss open questions regarding the intricacies of enzyme divergence, as well as potential research directions that will deepen our understanding of enzyme superfamily evolution.

  2. Evolutionary dynamics of retrotransposons assessed by high-throughput sequencing in wild relatives of wheat.

    PubMed

    Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian

    2013-01-01

    Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach that bypasses the need for a complete reference genome and assesses the evolutionary trajectories of high copy number TE families from genome snapshots obtained with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. By comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes, we assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, making it possible to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.

  3. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
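
    Extracting open reading frames is the first step of the analysis described above. The following is a minimal sketch of an ORF scan, assuming the standard ATG start codon and the three standard stop codons, forward strand only; it is not the service's implementation, and `find_orfs` and its `min_codons` parameter are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) half-open index pairs of forward-strand ORFs.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon; min_codons counts both the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


# Made-up sequence: one ORF, ATG AAA TAG, in the third reading frame.
orfs = find_orfs("CCATGAAATAGGG")
```

    The extracted ORF sequences, rather than whole genomes, would then be aligned and passed to tree building.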

  4. Evolutionary distances in the twilight zone--a rational kernel approach.

    PubMed

    Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian

    2010-12-31

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
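
    For contrast with the transducer-based metric proposed above, the kind of alignment-free "abstract string comparison" the abstract refers to can be sketched as a k-mer set distance. This is an illustrative baseline, not the authors' rational kernel: it models neither substitutions nor indels, and the function names are hypothetical.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard distance between the k-mer sets of two sequences (0 = same set, 1 = disjoint)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0  # both sequences are shorter than k
    return 1.0 - len(ka & kb) / len(union)
```

    Such a score needs no alignment, but it treats every k-mer mismatch alike, which is the lack of biological motivation the abstract sets out to address.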

  5. Detecting and Analyzing Genetic Recombination Using RDP4.

    PubMed

    Martin, Darren P; Murrell, Ben; Khoosal, Arjun; Muhire, Brejnev

    2017-01-01

    Recombination between nucleotide sequences is a major process influencing the evolution of most species on Earth. The evolutionary value of recombination has been widely debated and so too has its influence on evolutionary analysis methods that assume nucleotide sequences replicate without recombining. When nucleic acids recombine, the evolution of the daughter or recombinant molecule cannot be accurately described by a single phylogeny. This simple fact can seriously undermine the accuracy of any phylogenetics-based analytical approach which assumes that the evolutionary history of a set of recombining sequences can be adequately described by a single phylogenetic tree. There are presently a large number of available methods and associated computer programs for analyzing and characterizing recombination in various classes of nucleotide sequence datasets. Here we examine the use of some of these methods to derive and test recombination hypotheses using multiple sequence alignments.

  6. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Evolutionary tree reconstruction

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Kanefsky, Bob

    1990-01-01

    This report describes how Minimum Description Length (MDL) can be applied to the problem of DNA and protein evolutionary tree reconstruction. If a set of mutations that transforms a common ancestor into the known sequences can be described in fewer bits than are needed to encode the known sequences directly, then strong evidence for an evolutionary relationship has been found. A heuristic algorithm is described that searches for the simplest tree (smallest MDL) and finds close-to-optimal trees on the test data. Various ways of extending the MDL theory to more complex evolutionary relationships are discussed.
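
    The MDL comparison described above can be illustrated with a toy bit count. This is a minimal sketch under strong simplifying assumptions that are not taken from the report: DNA only, equal-length sequences, a star-shaped tree with a given ancestor, 2 bits per base, and each point mutation costing log2(L) bits for its position plus log2(3) bits for the new base. Both function names are hypothetical.

```python
import math


def direct_bits(seqs):
    """Bits needed to encode DNA sequences independently at 2 bits per base."""
    return 2.0 * sum(len(s) for s in seqs)


def ancestor_bits(ancestor, seqs):
    """Bits needed to encode one ancestor plus the point mutations leading to each sequence.

    Simplification: the cost of stating how many mutations each
    sequence carries is ignored.
    """
    length = len(ancestor)
    per_mutation = math.log2(length) + math.log2(3)  # position + replacement base
    mutations = sum(
        sum(a != b for a, b in zip(ancestor, s)) for s in seqs
    )
    return 2.0 * length + mutations * per_mutation


# Made-up sequences: three close relatives of the ancestor, then one unrelated sequence.
related = ["ACGTACGT", "ACGTACGA", "TCGTACGT"]
unrelated = ["CATGCATG"]
```

    With the related set the ancestor-plus-mutations description is shorter than direct encoding, which under MDL counts as evidence for common ancestry; with the unrelated sequence it is longer.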

  8. The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome.

    PubMed

    Kim, K S; Lee, S E; Jeong, H W; Ha, J H

    1998-10-01

    The complete nucleotide sequence of the mitochondrial genome of the domestic dog, Canis familiaris, was determined. The length of the sequence was 16,728 bp; however, the length was not absolute due to the variation (heteroplasmy) caused by differing numbers of the repetitive motif, 5'-GTACACGT(A/G)C-3', in the control region. The genome organization, gene contents, and codon usage conformed to those of other mammalian mitochondrial genomes. Although its features were unknown, the "CTAGA" duplication event which followed the translational stop codon of the COII gene was not observed in other mammalian mitochondrial genomes. In order to determine the possible differences between mtDNAs in carnivores, two rRNA and 13 protein-coding genes from the cat, dog, and seal were compared. The combined molecular differences, in two rRNA genes as well as in the inferred amino acid sequences of the mitochondrial 13 protein-coding genes, suggested that there is a closer relationship between the dog and the seal than there is between either of these species and the cat. Based on the molecular differences of the mtDNA, the evolutionary divergence between the cat, the dog, and the seal was dated to approximately 50 +/- 4 million years ago. The degree of difference between carnivore mtDNAs varied according to the individual protein-coding gene applied, showing that the evolutionary relationships of distantly related species should be presented in an extended study based on ample sequence data like complete mtDNA molecules. Copyright 1998 Academic Press.

  9. 3D RNA and functional interactions from evolutionary couplings

    PubMed Central

    Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.

    2016-01-01

    Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444

  10. Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

    PubMed

    Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

    2017-01-23

    Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species. Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in allohexaploid species of Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species has seemingly been achieved by different molecular mechanisms.

  11. Targeted sequencing for high-resolution evolutionary analyses following genome duplication in salmonid fish: Proof of concept for key components of the insulin-like growth factor axis.

    PubMed

    Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J

    2016-12-01

    High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth factor (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
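
    The "observed correlations" that the abstract above starts from can be illustrated with the mutual information between two alignment columns. This is a minimal sketch of that raw pairwise statistic only; it is not the maximum entropy model, which the authors use precisely because such local correlations mix direct couplings with indirect, transitive ones. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two equal-length alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

    Two columns that always change together (for example "AACC" and "GGTT") score 1 bit, while columns that vary independently ("AACC" and "GTGT") score 0.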

  13. Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing.

    PubMed

    Kamneva, Olga K; Syring, John; Liston, Aaron; Rosenberg, Noah A

    2017-08-04

    Hybridization is observed in many eukaryotic lineages and can lead to the formation of polyploid species. The study of hybridization and polyploidization faces challenges both in data generation and in accounting for population-level phenomena such as coalescence processes in phylogenetic analysis. Genus Fragaria is one example of a set of plant taxa in which a range of ploidy levels is observed across species, but phylogenetic origins are unknown. Here, using 20 diploid and polyploid Fragaria species, we combine approaches from NGS data analysis and phylogenetics to infer evolutionary origins of polyploid strawberries, taking into account coalescence processes. We generate haplotype sequences for 257 low-copy nuclear markers assembled from Illumina target capture sequence data. We then identify putative hybridization events by analyzing gene tree topologies, and further test predicted hybridizations in a coalescence framework. This approach confirms the allopolyploid ancestry of F. chiloensis and F. virginiana, and provides new allopolyploid ancestry hypotheses for F. iturupensis, F. moschata, and F. orientalis. Evidence of gene flow between diploids F. bucharica and F. vesca is also detected, suggesting that it might be appropriate to consider these groups as conspecifics. This study is one of the first in which target capture sequencing followed by computational deconvolution of individual haplotypes is used for tracing origins of polyploid taxa. The study also provides new perspectives on the evolutionary history of Fragaria.

  14. Corallimorpharians are not “naked corals”: insights into relationships between Scleractinia and Corallimorpharia from phylogenomic analyses

    PubMed Central

    Lin, Mei Fang; Chou, Wen Hwa; Kitahara, Marcelo V.; Chen, Chao Lun Allen

    2016-01-01

    Calcification is one of the most distinctive traits of scleractinian corals. Their hard skeletons form the substratum of reef ecosystems and confer on corals their remarkable diversity of shapes. Corallimorpharians are non-calcifying, close relatives of scleractinian corals, and the evolutionary relationship between these two groups is key to understanding the evolution of calcification in the coral lineage. One pivotal question is whether scleractinians are a monophyletic group, paraphyly being an alternative possibility if corallimorpharians are corals that have lost their ability to calcify, as is implied by the “naked-coral” hypothesis. Despite major efforts, relationships between scleractinians and corallimorpharians remain equivocal and controversial. Although the complete mitochondrial genomes of a range of scleractinians and corallimorpharians have been obtained, heterogeneity in composition and evolutionary rates means that mitochondrial sequences are insufficient to understand the relationship between these two groups. To overcome these limitations, transcriptome data were generated for three representative corallimorpharians. These were used in combination with sequences available for a representative range of scleractinians to identify 291 orthologous single copy protein-coding nuclear markers. Unlike the mitochondrial sequences, these nuclear markers do not display any distinct compositional bias in their nucleotide or amino-acid sequences. A range of phylogenomic approaches congruently reveal a topology consistent with scleractinian monophyly and corallimorpharians as the sister clade of scleractinians. PMID:27761308

  15. HIV Maintains an Evolving and Dispersed Population in Multiple Tissues during Suppressive Combined Antiretroviral Therapy in Individuals with Cancer

    PubMed Central

    Rose, Rebecca; Lamers, Susanna L.; Nolan, David J.; Maidji, Ekaterina; Faria, N. R.; Pybus, Oliver G.; Dollar, James J.; Maruniak, Samuel A.; McAvoy, Andrew C.; Salemi, Marco; Stoddart, Cheryl A.; Singer, Elyse J.

    2016-01-01

    ABSTRACT While combined antiretroviral therapy (cART) can result in undetectable plasma viral loads, it does not eradicate HIV infection. Furthermore, HIV-infected individuals while on cART remain at an increased risk of developing serious comorbidities, such as cancer, neurological disease, and atherosclerosis, suggesting that during cART, tissue-based HIV may contribute to such pathologies. We obtained DNA and RNA env, nef, and pol sequences using single-genome sequencing from postmortem tissues of three HIV+ cART-treated (cART+) individuals with undetectable viral load and metastatic cancer at death and performed time-scaled Bayesian evolutionary analyses. We used a sensitive in situ hybridization technique to visualize HIV gag-pol mRNA transcripts in cerebellum and lymph node tissues from one patient. Tissue-associated virus evolved at similar rates in cART+ and cART-naive (cART−) patients. Phylogenetic trees were characterized by two distinct features: (i) branching patterns consistent with constant viral evolution and dispersal among tissues and (ii) very recently derived clades containing both DNA and RNA sequences from multiple tissues. Rapid expansion of virus near death corresponded to wide-spread metastasis. HIV RNA+ cells clustered in cerebellum tissue but were dispersed in lymph node tissue, mirroring the evolutionary patterns observed for that patient. Activated, infiltrating macrophages were associated with HIV RNA. Our data provide evidence that tissues serve as a sanctuary for wild-type HIV during cART and suggest the importance of macrophages as an alternative reservoir and mechanism of virus spread. IMPORTANCE Combined antiretroviral therapy (cART) reduces plasma HIV to undetectable levels; however, removal of cART results in plasma HIV rebound, thus highlighting its inability to entirely rid the body of infection. Additionally, HIV-infected individuals on cART remain at high risk of serious diseases, which suggests a contribution from residual HIV. In this study, we isolated and sequenced HIV from postmortem tissues from three HIV+ cART+ individuals who died with metastatic cancer and had no detectable plasma viral load. Using high-resolution evolutionary analyses, we found that tissue-based HIV continues to replicate, evolve, and migrate among tissues during cART. Furthermore, cancer onset and metastasis coincided with increased HIV expansion, suggesting a linked mechanism. HIV-expressing cells were associated with tissue macrophages, a target of HIV infection. Our results suggest the importance of tissues, and macrophages in particular, as a target for novel anti-HIV therapies. PMID:27466425

  16. HIV Maintains an Evolving and Dispersed Population in Multiple Tissues during Suppressive Combined Antiretroviral Therapy in Individuals with Cancer.

    PubMed

    Rose, Rebecca; Lamers, Susanna L; Nolan, David J; Maidji, Ekaterina; Faria, N R; Pybus, Oliver G; Dollar, James J; Maruniak, Samuel A; McAvoy, Andrew C; Salemi, Marco; Stoddart, Cheryl A; Singer, Elyse J; McGrath, Michael S

    2016-10-15

    While combined antiretroviral therapy (cART) can result in undetectable plasma viral loads, it does not eradicate HIV infection. Furthermore, HIV-infected individuals while on cART remain at an increased risk of developing serious comorbidities, such as cancer, neurological disease, and atherosclerosis, suggesting that during cART, tissue-based HIV may contribute to such pathologies. We obtained DNA and RNA env, nef, and pol sequences using single-genome sequencing from postmortem tissues of three HIV(+) cART-treated (cART(+)) individuals with undetectable viral load and metastatic cancer at death and performed time-scaled Bayesian evolutionary analyses. We used a sensitive in situ hybridization technique to visualize HIV gag-pol mRNA transcripts in cerebellum and lymph node tissues from one patient. Tissue-associated virus evolved at similar rates in cART(+) and cART-naive (cART(-)) patients. Phylogenetic trees were characterized by two distinct features: (i) branching patterns consistent with constant viral evolution and dispersal among tissues and (ii) very recently derived clades containing both DNA and RNA sequences from multiple tissues. Rapid expansion of virus near death corresponded to wide-spread metastasis. HIV RNA(+) cells clustered in cerebellum tissue but were dispersed in lymph node tissue, mirroring the evolutionary patterns observed for that patient. Activated, infiltrating macrophages were associated with HIV RNA. Our data provide evidence that tissues serve as a sanctuary for wild-type HIV during cART and suggest the importance of macrophages as an alternative reservoir and mechanism of virus spread. Combined antiretroviral therapy (cART) reduces plasma HIV to undetectable levels; however, removal of cART results in plasma HIV rebound, thus highlighting its inability to entirely rid the body of infection. Additionally, HIV-infected individuals on cART remain at high risk of serious diseases, which suggests a contribution from residual HIV. In this study, we isolated and sequenced HIV from postmortem tissues from three HIV(+) cART(+) individuals who died with metastatic cancer and had no detectable plasma viral load. Using high-resolution evolutionary analyses, we found that tissue-based HIV continues to replicate, evolve, and migrate among tissues during cART. Furthermore, cancer onset and metastasis coincided with increased HIV expansion, suggesting a linked mechanism. HIV-expressing cells were associated with tissue macrophages, a target of HIV infection. Our results suggest the importance of tissues, and macrophages in particular, as a target for novel anti-HIV therapies. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  17. Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.

    The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient these visualization algorithms must support the ability to accommodate consistently a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. The combination of multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu.

  18. Natural Selection in Large Populations

    NASA Astrophysics Data System (ADS)

    Desai, Michael

    2011-03-01

    I will discuss theoretical and experimental approaches to the evolutionary dynamics and population genetics of natural selection in large populations. In these populations, many mutations are often present simultaneously, and because recombination is limited, selection cannot act on them all independently. Rather, it can only affect whole combinations of mutations linked together on the same chromosome. Methods common in theoretical population genetics have been of limited utility in analyzing this coupling between the fates of different mutations. In the past few years it has become increasingly clear that this is a crucial gap in our understanding, as sequence data has begun to show that selection appears to act pervasively on many linked sites in a wide range of populations, including viruses, microbes, Drosophila, and humans. I will describe approaches that combine analytical tools drawn from statistical physics and dynamical systems with traditional methods in theoretical population genetics to address this problem, and describe how experiments in budding yeast can help us directly observe these evolutionary dynamics.

  19. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    PubMed Central

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353

  20. Molecular Evolution of a Type 1 Wild-Vaccine Poliovirus Recombinant during Widespread Circulation in China

    PubMed Central

    Liu, Hong-Mei; Zheng, Du-Ping; Zhang, Li-Bi; Oberste, M. Steven; Pallansch, Mark A.; Kew, Olen M.

    2000-01-01

    Type 1 wild-vaccine recombinant polioviruses were isolated from poliomyelitis patients in China from 1991 to 1993. We compared the sequences of 34 recombinant isolates over the 1,353-nucleotide (nt) genomic interval (nt 2480 to 3832) encoding the major capsid protein, VP1, and the protease, 2A. All recombinants had a 367-nt block of sequence (nt 3271 to 3637) derived from the Sabin 1 oral poliovirus vaccine strain spanning the 3′-terminal sequences of VP1 (115 nt) and the 5′ half of 2A (252 nt). The remaining VP1 sequences were closely (up to 99.5%) related to those of a major genotype of wild type 1 poliovirus endemic to China up to 1994. In contrast, the non-vaccine-derived sequences at the 3′ half of 2A were more distantly related (<90% nucleotide sequence match) to those of other contemporary wild polioviruses from China. The vaccine-derived sequences of the earliest (April 1991) isolates completely matched those of Sabin 1. Later isolates diverged from the early isolates primarily by accumulation of synonymous base substitutions (at a rate of ∼3.7 × 10^-2 substitutions per synonymous site per year) over the entire VP1-2A interval. Distinct evolutionary lineages were found in different Chinese provinces. From the combined epidemiologic and evolutionary analyses, we propose that the recombinant virus arose during mixed infection of a single individual in northern China in early 1991 and that its progeny spread by multiple independent chains of transmission into some of the most populous areas of China within a year of the initiating infection. PMID:11070012

  1. Rapid Multi-Locus Sequence Typing Using Microfluidic Biochips

    DTIC Science & Technology

    2010-05-12

    Sequence Types. The evolutionary history of all the B. cereus MLST concatenated Sequence Types (545 taxa, 2,394 nucleotide positions) was inferred using...the Neighbor-Joining method [28]. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the... Chlamydia (manuscript in preparation) and performed pilot studies on Staphylococcus aureus and Streptococcus pneumoniae (Data S4 and Text S2). Another potential

  2. Combining fossil and molecular data to date the diversification of New World Primates.

    PubMed

    Schrago, C G; Mello, B; Soares, A E R

    2013-11-01

    Recent methodological advances in molecular dating associated with the growing availability of sequence data have prompted the study of the evolution of New World Anthropoidea in recent years. Motivated by questions regarding historical biogeography or the mode of evolution, these works aimed to obtain a clearer scenario of Platyrrhini origins and diversification. Although some consensus was found, disputed issues, especially those relating to the evolutionary affinities of fossil taxa, remain. The use of fossil taxa for divergence time analysis is traditionally restricted to the provision of calibration priors. However, new analytical approaches have been developed that incorporate fossils as terminals and, thus, directly assign ages to the fossil tips. In this study, we conducted a combined analysis of molecular and morphological data, including fossils, to derive the timescale of New World anthropoids. Differently from previous studies that conducted total-evidence analysis of molecules and morphology, our approach investigated the morphological clock alone. Our results corroborate the hypothesis that living platyrrhines diversified in the last 20 Ma and that Miocene Patagonian fossils compose an independent evolutionary radiation that diversified in the late Oligocene. When compared to the node ages inferred from the molecular timescale, the inclusion of fossils augmented the precision of the estimates for nodes constrained by the fossil tips. We show that morphological data can be analysed using the same methodological framework applied in relaxed molecular clock studies. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.

  3. Endogenous Retroviruses in the Genomics Era.

    PubMed

    Johnson, Welkin E

    2015-11-01

    Endogenous retroviruses comprise millions of discrete genetic loci distributed within the genomes of extant vertebrates. These sequences, which are clearly related to exogenous retroviruses, represent retroviral infections of the deep past, and their abundance suggests that retroviruses were a near-constant presence throughout the evolutionary history of modern vertebrates. Endogenous retroviruses contribute in myriad ways to the evolution of host genomes, as mutagens and as sources of genetic novelty (both coding and regulatory) to be acted upon by the twin engines of random genetic drift and natural selection. Importantly, the richness and complexity of endogenous retrovirus data can be used to understand how viruses spread and adapt on evolutionary timescales by combining population genetics and evolutionary theory with a detailed understanding of retrovirus biology (gleaned from the study of extant retroviruses). In addition to revealing the impact of viruses on organismal evolution, such studies can help us better understand, by looking back in time, how life-history traits, as well as ecological and geological events, influence the movement of viruses within and between populations.

  4. Epigenetics of Ancient DNA.

    PubMed

    Zhenilo, S V; Sokolov, A S; Prokhortchouk, E B

    2016-01-01

    Initially, the study of DNA isolated from ancient specimens was based on the analysis of the primary nucleotide sequence. This approach has allowed researchers to study the evolutionary changes that occur in different populations and determine the influence of the environment on genetic selection. However, the improvement of methodological approaches to genome-wide analysis has opened up new possibilities in the search for the epigenetic mechanisms involved in the regulation of gene expression. It was discovered recently that the methylation status of the regulatory elements of the HOXD cluster and the MEIS1 gene changed during human evolution. Epigenetic changes in these genes played a key role in the evolution of the limbs of modern humans. Recent works have demonstrated that it is possible to determine the transcriptional activity of genes in ancient DNA samples by combining information on DNA methylation and the DNase I hypersensitive sequences located at the transcription start sites of genes. In the near future, if a preserved fossil brain is found, it will be possible to identify the evolutionary changes in the higher nervous system associated with epigenetic differences.

  5. SCARF: maximizing next-generation EST assemblies for evolutionary and population genomic analyses.

    PubMed

    Barker, Michael S; Dlugosch, Katrina M; Reddy, A Chaitanya C; Amyotte, Sarah N; Rieseberg, Loren H

    2009-02-15

    Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. The program was created to knit together 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is freely available at http://msbarker.com/software.htm, and is released under the open source GPLv3 license (http://www.opensource.org/licenses/gpl-3.0.html).

  6. Comparison of the theoretical and real-world evolutionary potential of a genetic circuit

    NASA Astrophysics Data System (ADS)

    Razo-Mejia, M.; Boedicker, J. Q.; Jones, D.; DeLuna, A.; Kinney, J. B.; Phillips, R.

    2014-04-01

    With the development of next-generation sequencing technologies, many large scale experimental efforts aim to map genotypic variability among individuals. This natural variability in populations fuels many fundamental biological processes, ranging from evolutionary adaptation and speciation to the spread of genetic diseases and drug resistance. An interesting and important component of this variability is present within the regulatory regions of genes. As these regions evolve, accumulated mutations lead to modulation of gene expression, which may have consequences for the phenotype. A simple model system where the link between genetic variability, gene regulation and function can be studied in detail is missing. In this article we develop a model to explore how the sequence of the wild-type lac promoter dictates the fold-change in gene expression. The model combines single-base pair resolution maps of transcription factor and RNA polymerase binding energies with a comprehensive thermodynamic model of gene regulation. The model was validated by predicting and then measuring the variability of lac operon regulation in a collection of natural isolates. We then implement the model to analyze the sensitivity of the promoter sequence to the regulatory output, and predict the potential for regulation to evolve due to point mutations in the promoter region.
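
    The idea of linking binding energies to fold-change through a thermodynamic model can be sketched with the textbook simple-repression expression, fold-change = 1 / (1 + (R / N_NS) * exp(-Δε / kBT)). This is a deliberately reduced stand-in, not the full lac promoter model of the abstract (which also accounts for RNA polymerase binding). The function name, the default number of non-specific sites, and the example energies are all assumptions chosen for illustration.

```python
import math


def simple_repression_fold_change(repressors, binding_energy_kt,
                                  nonspecific_sites=4.6e6):
    """Fold-change in expression under simple repression.

    repressors        -- repressor copies per cell (hypothetical input)
    binding_energy_kt -- repressor-operator binding energy in units of kBT
                         (more negative means tighter binding)
    nonspecific_sites -- assumed number of non-specific genomic binding sites
    """
    weight = (repressors / nonspecific_sites) * math.exp(-binding_energy_kt)
    return 1.0 / (1.0 + weight)


# Illustrative comparison: a hypothetical point mutation that weakens
# repressor binding by 2 kBT raises the predicted fold-change.
wild_type = simple_repression_fold_change(repressors=10, binding_energy_kt=-15.0)
mutant = simple_repression_fold_change(repressors=10, binding_energy_kt=-13.0)
```

    Scanning the binding energy over the range reachable by single mutations gives a rough picture of how far regulation could shift, which is the kind of question the abstract poses for the real promoter.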

  7. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  8. Genome Size, Molecular Phylogeny, and Evolutionary History of the Tribe Aquilarieae (Thymelaeaceae), the Natural Source of Agarwood

    PubMed Central

    Farah, Azman H.; Lee, Shiou Yih; Gao, Zhihui; Yao, Tze Leong; Madon, Maria; Mohamed, Rozi

    2018-01-01

    The tribe Aquilarieae of the family Thymelaeaceae consists of two genera, Aquilaria and Gyrinops, with a total of 30 species, distributed from northeast India, through southeast Asia and the south of China, to Papua New Guinea. They are an important botanical resource for fragrant agarwood, a prized product derived from injured or infected stems of these species. The aim of this study was to estimate the genome size of selected Aquilaria species and comprehend the evolutionary history of Aquilarieae speciation through molecular phylogeny. Five non-coding chloroplast DNA regions and a nuclear region were sequenced from 12 Aquilaria and three Gyrinops species. Phylogenetic trees constructed using combined chloroplast DNA sequences revealed relationships of the studied 15 members in Aquilarieae, while nuclear ribosomal DNA internal transcribed spacer (ITS) sequences showed a paraphyletic relationship between Aquilaria species from Indochina and Malesian. We exposed, for the first time, the estimated divergence time for Aquilarieae speciation, which was speculated to happen during the Miocene Epoch. The ancestral split and biogeographic pattern of studied species were discussed. Results showed no large variation in the 2C-values for the five Aquilaria species (1.35–2.23 pg). Further investigation into the genome size may provide additional information regarding ancestral traits and its evolution history. PMID:29896211

  9. Mitogenome Sequencing in the Genus Camelus Reveals Evidence for Purifying Selection and Long-term Divergence between Wild and Domestic Bactrian Camels.

    PubMed

    Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A

    2017-08-30

    The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.

  10. Sex in a test tube: testing the benefits of in vitro recombination.

    PubMed

    Pesce, Diego; Lehman, Niles; de Visser, J Arjan G M

    2016-10-19

    The origin and evolution of sex, and the associated role of recombination, present a major problem in biology. Sex typically involves recombination of closely related DNA or RNA sequences, which is fundamentally a random process that creates but also breaks up beneficial allele combinations. Directed evolution experiments, which combine in vitro mutation and recombination protocols with in vitro or in vivo selection, have proved to be an effective approach for improving functionality of nucleic acids and enzymes. As this approach allows extreme control over evolutionary conditions and parameters, it also facilitates the detection of small or position-specific recombination benefits and benefits associated with recombination between highly divergent genotypes. Yet, in vitro approaches have been largely exploratory and motivated by obtaining improved end products rather than testing hypotheses of recombination benefits. Here, we review the various experimental systems and approaches used by in vitro studies of recombination, discuss what they say about the evolutionary role of recombination, and sketch their potential for addressing extant questions about the evolutionary role of sex and recombination, in particular on complex fitness landscapes. We also review recent insights into the role of 'extracellular recombination' during the origin of life. This article is part of the themed issue 'Weird sex: the underappreciated diversity of sexual reproduction'. © 2016 The Author(s).

  11. Rapid radiation events in the family Ursidae indicated by likelihood phylogenetic estimation from multiple fragments of mtDNA.

    PubMed

    Waits, L P; Sullivan, J; O'Brien, S J; Ward, R H

    1999-10-01

    The bear family (Ursidae) presents a number of phylogenetic ambiguities as the evolutionary relationships of the six youngest members (ursine bears) are largely unresolved. Recent mitochondrial DNA analyses have produced conflicting results with respect to the phylogeny of ursine bears. In an attempt to resolve these issues, we obtained 1916 nucleotides of mitochondrial DNA sequence data from six gene segments for all eight bear species and conducted maximum likelihood and maximum parsimony analyses on all fragments separately and combined. All six single-region gene trees gave different phylogenetic estimates; however, only for control region data was this significantly incongruent with the results from the combined data. The optimal phylogeny for the combined data set suggests that the giant panda is most basal followed by the spectacled bear. The sloth bear is the basal ursine bear, and there is weak support for a sister taxon relationship of the American and Asiatic black bears. The sun bear is sister taxon to the youngest clade containing brown bears and polar bears. Statistical analyses of alternate hypotheses revealed a lack of strong support for many of the relationships. We suggest that the difficulties surrounding the resolution of the evolutionary relationships of the Ursidae are linked to the existence of sequential rapid radiation events in bear evolution. Thus, unresolved branching orders during these time periods may represent an accurate representation of the evolutionary history of bear species. Copyright 1999 Academic Press.

  12. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2008-12-01

    Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the number of occurrences of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many tasks in computational biology, such as sequence alignment, prediction of domain boundaries, designation of knowledge-based potentials and prediction of protein binding sites.
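
    The profile-to-feature-vector step described above can be illustrated with a minimal sketch. It assumes that a Top-n-gram is the string of the n most frequent amino acids in one column of the frequency profile, ordered by decreasing frequency; that reading, the tie-breaking rule, the dict-based profile layout and the names top_n_gram and top_n_gram_vector are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def top_n_gram(column, n):
    """Return the n most frequent residues of one profile column.

    `column` maps residue -> frequency (hypothetical layout); ties are
    broken alphabetically, which is an assumption of this sketch.
    """
    ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
    return "".join(ranked[:n])

def top_n_gram_vector(profile, n):
    """Count how often each possible Top-n-gram occurs along the profile.

    The vector has a fixed length of 20**n regardless of sequence length.
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = Counter(top_n_gram(column, n) for column in profile)
    return [counts[gram] for gram in vocabulary]

# Toy three-column profile (made-up frequencies, for illustration only).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]
vector = top_n_gram_vector(profile, n=2)
```

    A vector of this kind could then be passed to any SVM implementation; the LSA step mentioned in the abstract is not shown here.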

  13. Protein interface classification by evolutionary analysis

    PubMed Central

    2012-01-01

    Background Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between interface core and rim or rest of surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctively better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly we demonstrate that a more conservative selection of homolog sequences - with relatively high sequence identities to the query - is able to produce a clearer signal than previous attempts. Conclusions An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833
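
    To make the idea of an entropy-based core versus rim indicator concrete, here is a minimal sketch. The abstract does not give the scoring formula, so the ratio of mean column entropies used below, the gap-free toy alignment and the names column_entropy and core_rim_ratio are all assumptions made for illustration; this is not the EPPIC implementation.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_entropy(entropies, positions):
    """Average entropy over the given column indices."""
    return sum(entropies[i] for i in positions) / len(positions)

def core_rim_ratio(alignment, core, rim):
    """Ratio of mean core entropy to mean rim entropy (assumed indicator).

    A value below 1 would mean the core columns vary less than the rim
    columns, i.e. the core looks more conserved. Raises ZeroDivisionError
    if the rim columns are perfectly conserved.
    """
    columns = list(zip(*alignment))
    entropies = [column_entropy(col) for col in columns]
    return mean_entropy(entropies, core) / mean_entropy(entropies, rim)

# Toy alignment of four homologs (invented sequences, no gaps).
alignment = ["ACDK", "ACEK", "ACDR", "ACER"]
ratio = core_rim_ratio(alignment, core=[0, 2], rim=[3])
```

    In this toy case the core columns have a mean entropy of 0.5 bits and the rim column 1 bit, so the ratio is 0.5.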

  14. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

  15. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  16. Complete Chloroplast Genome of the Multifunctional Crop Globe Artichoke and Comparison with Other Asteraceae

    PubMed Central

    Curci, Pasquale L.; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G.; Sonnante, Gabriella

    2015-01-01

    With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed placement of a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for “specific barcode” purposes in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants. PMID:25774672
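
    The region lengths quoted above are mutually consistent: one large single-copy region, one small single-copy region and two copies of the inverted repeat add up to the reported genome length. The short check below uses only the figures from the abstract; the function name quadripartite_length is a hypothetical helper introduced for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

# Region lengths in bp, as reported in the abstract.
LSC_BP = 83_578
SSC_BP = 18_641
IR_BP = 25_155

total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # 152529, matching the reported 152,529 bp
```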

  17. Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae.

    PubMed

    Curci, Pasquale L; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G; Sonnante, Gabriella

    2015-01-01

    With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed placement of a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for "specific barcode" purposes in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants.

  18. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    PubMed Central

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is the largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783

  19. The Role of Major Gas-rich Mergers on the Evolution of Galaxies from the Blue Cloud to the Red Sequence

    NASA Astrophysics Data System (ADS)

    Guo, Rui; Hao, Cai-Na; Xia, X. Y.; Mao, Shude; Shi, Yong

    2016-07-01

    With the aim of exploring the fast evolutionary path from the blue cloud of star-forming galaxies to the red sequence of quiescent galaxies in the local universe, we select a local advanced merging infrared luminous and ultraluminous galaxy (adv-merger (U)LIRGs) sample and perform careful dust extinction corrections to investigate their positions in the star formation rate-M*, u - r, and NUV - r color-mass diagrams. The sample consists of 89 (U)LIRGs at the late merger stage, obtained from cross-correlating the Infrared Astronomical Satellite Point Source Catalog Redshift Survey and 1 Jy ULIRGs samples with the Sloan Digital Sky Survey DR7 database. Our results show that 74% ± 5% of adv-merger (U)LIRGs are localized above the 1σ line of the local star-forming galaxy main sequence. We also find that all adv-merger (U)LIRGs are more massive than and as blue as the blue cloud galaxies after corrections for Galactic and internal dust extinctions, with 95% ± 2% and 81% ± 4% of them outside the blue cloud on the u - r and NUV - r color-mass diagrams, respectively. These results, combined with the short timescale for exhausting the molecular gas reservoir in adv-merger (U)LIRGs (3 × 10^7 to 3 × 10^8 years), imply that the adv-merger (U)LIRGs are likely at the starting point of the fast evolutionary track previously proposed by several groups. While the number density of adv-merger (U)LIRGs is only ~0.1% of the blue cloud star-forming galaxies in the local universe, this evolutionary track may play a more important role at high redshift.

  20. Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

    PubMed Central

    2011-01-01

    Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398

  1. Complete genome sequence and integrated protein localization and interaction map for alfalfa dwarf virus, which combines properties of both cytoplasmic and nuclear plant rhabdoviruses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bejerman, Nicolás, E-mail: n.bejerman@uq.edu.au; Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072; Giolitti, Fabián

    Summary: We have determined the full-length 14,491-nucleotide genome sequence of a new plant rhabdovirus, alfalfa dwarf virus (ADV). Seven open reading frames (ORFs) were identified in the antigenomic orientation of the negative-sense, single-stranded viral RNA, in the order 3′-N-P-P3-M-G-P6-L-5′. The ORFs are separated by conserved intergenic regions and the genome coding region is flanked by complementary 3′ leader and 5′ trailer sequences. Phylogenetic analysis of the nucleoprotein amino acid sequence indicated that this alfalfa-infecting rhabdovirus is related to viruses in the genus Cytorhabdovirus. When transiently expressed as GFP fusions in Nicotiana benthamiana leaves, most ADV proteins accumulated in the cell periphery, but unexpectedly P protein was localized exclusively in the nucleus. ADV P protein was shown to have homotypic and heterotypic nuclear interactions with N, P3 and M proteins by bimolecular fluorescence complementation. ADV appears unique in that it combines properties of both cytoplasmic and nuclear plant rhabdoviruses. - Highlights: • The complete genome of alfalfa dwarf virus is obtained. • An integrated localization and interaction map for ADV is determined. • ADV has genome sequence similarity and evolutionary links with cytorhabdoviruses. • ADV protein localization and interaction data show an association with the nucleus. • ADV combines properties of both cytoplasmic and nuclear plant rhabdoviruses.

  2. Evolution in the block: common elements of 5S rDNA organization and evolutionary patterns in distant fish genera.

    PubMed

    Campo, Daniel; García-Vázquez, Eva

    2012-01-01

    The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that has arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).

  3. Graph Model of Coalescence with Recombinations

    NASA Astrophysics Data System (ADS)

    Parida, Laxmi

    One of the primary genetic events shaping an autosomal chromosome is recombination. This is a process that occurs during meiosis, in eukaryotes, that results in the offspring having different combinations of (homologous) genes, or chromosomal segments, of the two parents. The presence of these recombination events in the evolutionary history of each chromosome complicates the genetic landscape of a population, and understanding the manifestations of these genetic exchanges in the chromosome sequences has been a subject of intense curiosity (see [Hud83, Gri99, HSW05] and citations therein).

  4. Metabolic network prediction through pairwise rational kernels.

    PubMed

    Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

    2014-09-26

    Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values improve, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also showed that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
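
    The notion of a pairwise kernel built on a sequence kernel can be sketched as follows. This is not the authors' transducer-based PRK: it substitutes a plain n-gram count kernel (a simple member of the rational kernel family) computed with dictionaries, and lifts it to pairs with the commonly used tensor-product construction K((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c). The function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter

def ngram_counts(seq, n=3):
    """Count every contiguous n-gram in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_kernel(s, t, n=3):
    """Base kernel: dot product of the n-gram count vectors of s and t."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(cs[g] * ct[g] for g in cs.keys() & ct.keys())

def pairwise_kernel(pair1, pair2, n=3):
    """Tensor-product pairwise kernel between two pairs of sequences.

    Both pairings are summed so the value does not depend on the order
    of the two sequences inside a pair.
    """
    (a, b), (c, d) = pair1, pair2
    return (ngram_kernel(a, c, n) * ngram_kernel(b, d, n)
            + ngram_kernel(a, d, n) * ngram_kernel(b, c, n))

# Toy sequences (invented); real input would be gene or protein sequences.
value = pairwise_kernel(("ABCD", "AAAA"), ("ABCE", "AAAA"))
```

    A Gram matrix of such values over all training pairs could be supplied to an SVM that accepts precomputed kernels; that step is omitted here.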

  5. Comparative functional pan-genome analyses to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon metabolism in the genus Mycobacterium.

    PubMed

    Kweon, Ohgew; Kim, Seong-Jae; Blom, Jochen; Kim, Sung-Kwan; Kim, Bong-Soo; Baek, Dong-Heon; Park, Su Inn; Sutherland, John B; Cerniglia, Carl E

    2015-02-14

    The bacterial genus Mycobacterium is of great interest in the medical and biotechnological fields. Despite a flood of genome sequencing and functional genomics data, significant gaps in knowledge between genome and phenome seriously hinder efforts toward the treatment of mycobacterial diseases and practical biotechnological applications. In this study, we propose the use of systematic, comparative functional pan-genomic analysis to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon (PAH) metabolism in the genus Mycobacterium. Phylogenetic, phenotypic, and genomic information for 27 completely genome-sequenced mycobacteria was systematically integrated to reconstruct a mycobacterial phenotype network (MPN) with a pan-genomic concept at a network level. In the MPN, mycobacterial phenotypes show typical scale-free relationships. PAH degradation is an isolated phenotype with the lowest connection degree, consistent with phylogenetic and environmental isolation of PAH degraders. A series of functional pan-genomic analyses provide conserved and unique types of genomic evidence for strong epistatic and pleiotropic impacts on evolutionary trajectories of the PAH-degrading phenotype. Under strong natural selection, the detailed gene gain/loss patterns from horizontal gene transfer (HGT)/deletion events suggest a plausible evolutionary path, an epistasis-based birth and pleiotropy-dependent death, for PAH metabolism in the genus Mycobacterium. This study generated a practical mycobacterial compendium of phenotypic and genomic changes, focusing on the PAH-degrading phenotype, with a pan-genomic perspective of the evolutionary events and the environmental challenges. Our findings suggest that when selection acts on PAH metabolism, only a small fraction of possible trajectories is likely to be observed, owing mainly to a combination of the ambiguous phenotypic effects of PAHs and the corresponding pleiotropy- and epistasis-dependent evolutionary adaptation. Evolutionary constraints on the selection of trajectories, like those seen in PAH-degrading phenotypes, are likely to apply to the evolution of other phenotypes in the genus Mycobacterium.

  6. Pleistocene evolutionary history of the Clouded Apollo (Parnassius mnemosyne): genetic signatures of climate cycles and a 'time-dependent' mitochondrial substitution rate.

    PubMed

    Gratton, P; Konopiński, M K; Sbordoni, V

    2008-10-01

    Genetic data are currently providing a large amount of new information on past distribution of species and are contributing to a new vision of Pleistocene ice ages. Nonetheless, an increasing number of studies on the 'time dependency' of mutation rates suggest that date assessments for evolutionary events of the Pleistocene might be overestimated. We analysed mitochondrial (mt) DNA (COI) sequence variation in 225 Parnassius mnemosyne individuals sampled across central and eastern Europe in order to assess (i) the existence of genetic signatures of Pleistocene climate shifts; and (ii) the timescale of demographic and evolutionary events. Our analyses reveal a phylogeographical pattern markedly influenced by the Pleistocene/Holocene climate shifts. Eastern Alpine and Balkan populations display comparatively high mtDNA diversity, suggesting multiple glacial refugia. On the other hand, three widely distributed and spatially segregated lineages occupy most of northern and eastern Europe, indicating postglacial recolonization from different refugial areas. We show that a conventional 'phylogenetic' substitution rate cannot account for the present distribution of genetic variation in this species, and we combine phylogeographical pattern and palaeoecological information in order to determine a suitable intraspecific rate through a Bayesian coalescent approach. We argue that our calibrated 'time-dependent' rate (0.096 substitutions/million years) offers the most convincing time frame for the evolutionary events inferred from sequence data. When scaled by the new rate, estimates of divergence between Balkan and Alpine lineages point to c. 19 000 years before present (last glacial maximum), and parameters of demographic expansion for northern lineages are consistent with postglacial warming (5-11 000 years before present).

  7. Comparative Analysis of Evolutionary Mechanisms of the Hemagglutinin and Three Internal Protein Genes of Influenza B Virus: Multiple Cocirculating Lineages and Frequent Reassortment of the NP, M, and NS Genes

    PubMed Central

    Lindstrom, Stephen E.; Hiromoto, Yasuaki; Nishimura, Hidekazu; Saito, Takehiko; Nerome, Reiko; Nerome, Kuniaki

    1999-01-01

    Phylogenetic profiles of the genes coding for the hemagglutinin (HA) protein, nucleoprotein (NP), matrix (M) protein, and nonstructural (NS) proteins of influenza B viruses isolated from 1940 to 1998 were analyzed in a parallel manner in order to understand the evolutionary mechanisms of these viruses. Unlike human influenza A (H3N2) viruses, the evolutionary pathways of all four genes of recent influenza B viruses revealed similar patterns of genetic divergence into two major lineages. Although evolutionary rates of the HA, NP, M, and NS genes of influenza B viruses were estimated to be generally lower than those of human influenza A viruses, genes of influenza B viruses demonstrated complex phylogenetic patterns, indicating alternative mechanisms for generation of virus variability. Topologies of the evolutionary trees of each gene were determined to be quite distinct from one another, showing that these genes were evolving in an independent manner. Furthermore, variable topologies were apparently the result of frequent genetic exchange among cocirculating epidemic viruses. Evolutionary analysis done in the present study provided further evidence for cocirculation of multiple lineages as well as sequestering and reemergence of phylogenetic lineages of the internal genes. In addition, comparison of deduced amino acid sequences revealed a novel amino acid deletion in the HA1 domain of the HA protein of recent isolates from 1998 belonging to the B/Yamagata/16/88-like lineage. It thus became apparent that, despite lower evolutionary rates, influenza B viruses were able to generate genetic diversity among circulating viruses through a combination of evolutionary mechanisms involving cocirculating lineages and genetic reassortment by which new variants with distinct gene constellations emerged. PMID:10196339

  8. Determination of evolutionary relationships of outbreak-associated Listeria monocytogenes strains of serotypes 1/2a and 1/2b by whole-genome sequencing

    USDA-ARS's Scientific Manuscript database

    We used whole-genome sequencing to determine evolutionary relationships among 20 outbreak-associated clinical isolates of Listeria monocytogenes serotypes 1/2a and 1/2b. Isolates from 6 of 11 outbreaks fell outside the clonal groups or “epidemic clones” that have been previously associated with outb...

  9. A Case-by-Case Evolutionary Analysis of Four Imprinted Retrogenes

    PubMed Central

    McCole, Ruth B; Loughran, Noeleen B; Chahal, Mandeep; Fernandes, Luis P; Roberts, Roland G; Fraternali, Franca; O'Connell, Mary J; Oakey, Rebecca J

    2011-01-01

    Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event: mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follows very varied evolutionary paths. PMID:21166792

  10. ECOD: An Evolutionary Classification of Protein Domains

    PubMed Central

    Kinch, Lisa N.; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V.

    2014-01-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies. PMID:25474468

  11. ECOD: an evolutionary classification of protein domains.

    PubMed

    Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V

    2014-12-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

  12. Using single cell sequencing data to model the evolutionary history of a tumor.

    PubMed

    Kim, Kyung In; Simon, Richard

    2014-01-24

    The introduction of next-generation sequencing (NGS) technology has made it possible to detect genomic alterations within tumor cells on a large scale. However, most applications of NGS show the genetic content of mixtures of cells. Recently developed single cell sequencing technology can identify variation within a single cell. Characterization of multiple samples from a tumor using single cell sequencing can potentially provide information on the evolutionary history of that tumor. This may facilitate understanding how key mutations accumulate and evolve in lineages to form a heterogeneous tumor. We provide a computational method to infer an evolutionary mutation tree based on single cell sequencing data. Our approach differs from traditional phylogenetic tree approaches in that our mutation tree directly describes temporal order relationships among mutation sites. Our method also accommodates sequencing errors. Furthermore, we provide a method for estimating the proportion of time from the earliest mutation event of the sample to the most recent common ancestor of the sample of cells. Finally, we discuss current limitations on modeling with single cell sequencing data and possible improvements under those limitations. Inferring the temporal ordering of mutational sites using current single cell sequencing data is a challenge. Our proposed method may help elucidate relationships among key mutations and their role in tumor progression.

  13. Rapid evolution of the env gene leader sequence in cats naturally infected with feline immunodeficiency virus

    PubMed Central

    Hughes, Joseph; Biek, Roman; Litster, Annette; Willett, Brian J.; Hosie, Margaret J.

    2015-01-01

    Analysing the evolution of feline immunodeficiency virus (FIV) at the intra-host level is important in order to address whether the diversity and composition of viral quasispecies affect disease progression. We examined the intra-host diversity and the evolutionary rates of the entire env and structural fragments of the env sequences obtained from sequential blood samples in 43 naturally infected domestic cats that displayed different clinical outcomes. We observed in the majority of cats that FIV env showed very low levels of intra-host diversity. We estimated that env evolved at a rate of 1.16×10⁻³ substitutions per site per year and demonstrated that recombinant sequences evolved faster than non-recombinant sequences. It was evident that the V3–V5 fragment of FIV env displayed higher evolutionary rates in healthy cats than in those with terminal illness. Our study provided the first evidence that the leader sequence of env, rather than the V3–V5 sequence, had the highest intra-host diversity and the highest evolutionary rate of all env fragments, consistent with this region being under a strong selective pressure for genetic variation. Overall, FIV env displayed relatively low intra-host diversity and evolved slowly in naturally infected cats. The maximum evolutionary rate was observed in the leader sequence of env. Although genetic stability is not necessarily a prerequisite for clinical stability, the higher genetic stability of FIV compared with human immunodeficiency virus might explain why many naturally infected cats do not progress rapidly to AIDS. PMID:25535323

  14. Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.

    PubMed

    Janecek, S

    1994-09-01

    Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.

  15. Evolutionary genetics of insect innate immunity.

    PubMed

    Viljakainen, Lumi

    2015-11-01

    Patterns of evolution in immune defense genes help to understand the evolutionary dynamics between hosts and pathogens. Multiple insect genomes have been sequenced, with many of them having annotated immune genes, which paves the way for a comparative genomic analysis of insect immunity. In this review, I summarize the current state of comparative and evolutionary genomics of insect innate immune defense. The focus is on the conserved and divergent components of immunity with an emphasis on gene family evolution and evolution at the sequence level; both population genetics and molecular evolution frameworks are considered. © The Author 2015. Published by Oxford University Press.

  16. Ancient DNA from marine mammals: studying long-lived species over ecological and evolutionary timescales.

    PubMed

    Foote, Andrew D; Hofreiter, Michael; Morin, Phillip A

    2012-01-20

    Marine mammals have long generation times and broad, difficult to sample distributions, which makes inferring evolutionary and demographic changes using field studies of extant populations challenging. However, molecular analyses from sub-fossil or historical materials of marine mammals such as bone, tooth, baleen, skin, fur, whiskers and scrimshaw using ancient DNA (aDNA) approaches provide an opportunity for investigating such changes over evolutionary and ecological timescales. Here, we review the application of aDNA techniques to the study of marine mammals. Most of the studies have focused on detecting changes in genetic diversity following periods of exploitation and environmental change. To date, these studies have shown that even small sample sizes can provide useful information on historical genetic diversity. Ancient DNA has also been used in investigations of changes in distribution and range of marine mammal species; we review these studies and discuss the limitations of such 'presence only' studies. Combining aDNA data with stable isotopes can provide further insights into changes in ecology and we review past studies and suggest future potential applications. We also discuss studies reconstructing inter- and intra-specific phylogenies from aDNA sequences and discuss how aDNA sequences could be used to estimate mutation rates. Finally, we highlight some of the problems of aDNA studies on marine mammals, such as obtaining sufficient sample sizes and calibrating for the marine reservoir effect when radiocarbon-dating such wide-ranging species. Copyright © 2011 Elsevier GmbH. All rights reserved.

  17. spa Typing and Multilocus Sequence Typing Show Comparable Performance in a Macroepidemiologic Study of Staphylococcus aureus in the United States

    PubMed Central

    O'Hara, F. Patrick; Suaya, Jose A.; Ray, G. Thomas; Baxter, Roger; Brown, Megan L.; Mera, Robertino M.; Close, Nicole M.; Thomas, Elizabeth

    2016-01-01

    A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants. PMID:26669861
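
    The abstract above compares typing methods by "indices of diversity" without naming the index. A minimal sketch, assuming Simpson's index of diversity (a common choice for typing schemes, but an assumption here), is given below; the function name and the type labels are hypothetical.

```python
from collections import Counter

def simpson_diversity(type_labels):
    """Simpson's index of diversity: the probability that two isolates
    drawn without replacement are assigned to different types.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))

# Hypothetical type assignments for four isolates (not data from the study).
example = ["A", "A", "B", "B"]
print(round(simpson_diversity(example), 4))
```

    A value near 1 means the method splits the isolates into many small types; a value near 0 means most isolates share one type. Computing it for each typing method on the same isolates gives one basis for comparison.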

  18. spa Typing and Multilocus Sequence Typing Show Comparable Performance in a Macroepidemiologic Study of Staphylococcus aureus in the United States.

    PubMed

    O'Hara, F Patrick; Suaya, Jose A; Ray, G Thomas; Baxter, Roger; Brown, Megan L; Mera, Robertino M; Close, Nicole M; Thomas, Elizabeth; Amrine-Madsen, Heather

    2016-01-01

    A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants.

  19. What Is Peromyscus? Evidence from nuclear and mitochondrial DNA sequences suggests the need for a new classification

    PubMed Central

    Platt, Roy N.; Amman, Brian R.; Keith, Megan S.; Thompson, Cody W.; Bradley, Robert D.

    2015-01-01

    The evolutionary relationships between Peromyscus, Habromys, Isthmomys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are poorly understood. In order to further explore the evolutionary boundaries of Peromyscus and compare potential taxonomic solutions for this diverse group and its relatives, we conducted phylogenetic analyses of DNA sequence data from alcohol dehydrogenase (Adh1-I2), beta fibrinogen (Fgb-I7), interphotoreceptor retinoid-binding protein (Rbp3), and cytochrome-b (Cytb). Phylogenetic analyses of mitochondrial and nuclear genes produced similar topologies although levels of nodal support varied. The best-supported topology was obtained by combining nuclear and mitochondrial sequences. No monophyletic Peromyscus clade was supported. Instead, support was found for a clade containing Habromys, Megadontomys, Neotomodon, Osgoodomys, Podomys, and Peromyscus suggesting paraphyly of Peromyscus and confirming previous observations. Our analyses indicated an early divergence of Isthmomys from Peromyscus (approximately 8 million years ago), whereas most other peromyscine taxa emerged within the last 6 million years. To recover a monophyletic taxonomy from Peromyscus and affiliated lineages, we detail 3 taxonomic options in which Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are retained as genera, subsumed as subgenera, or subsumed as species groups within Peromyscus. Each option presents distinct taxonomic challenges, and the appropriate taxonomy must reflect the substantial levels of morphological divergence that characterize this group while maintaining the monophyletic relationships obtained from genetic data. PMID:26937047

  20. What Is Peromyscus? Evidence from nuclear and mitochondrial DNA sequences suggests the need for a new classification.

    PubMed

    Platt, Roy N; Amman, Brian R; Keith, Megan S; Thompson, Cody W; Bradley, Robert D

    2015-08-03

    The evolutionary relationships between Peromyscus, Habromys, Isthmomys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are poorly understood. In order to further explore the evolutionary boundaries of Peromyscus and compare potential taxonomic solutions for this diverse group and its relatives, we conducted phylogenetic analyses of DNA sequence data from alcohol dehydrogenase (Adh1-I2), beta fibrinogen (Fgb-I7), interphotoreceptor retinoid-binding protein (Rbp3), and cytochrome-b (Cytb). Phylogenetic analyses of mitochondrial and nuclear genes produced similar topologies although levels of nodal support varied. The best-supported topology was obtained by combining nuclear and mitochondrial sequences. No monophyletic Peromyscus clade was supported. Instead, support was found for a clade containing Habromys, Megadontomys, Neotomodon, Osgoodomys, Podomys, and Peromyscus suggesting paraphyly of Peromyscus and confirming previous observations. Our analyses indicated an early divergence of Isthmomys from Peromyscus (approximately 8 million years ago), whereas most other peromyscine taxa emerged within the last 6 million years. To recover a monophyletic taxonomy from Peromyscus and affiliated lineages, we detail 3 taxonomic options in which Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are retained as genera, subsumed as subgenera, or subsumed as species groups within Peromyscus. Each option presents distinct taxonomic challenges, and the appropriate taxonomy must reflect the substantial levels of morphological divergence that characterize this group while maintaining the monophyletic relationships obtained from genetic data.

  1. Concordance and discordance of sequence survey methods for molecular epidemiology

    PubMed Central

    Hasan, Nur A.; Cebula, Thomas A.; Colwell, Rita R.; Robison, Richard A.; Johnson, W. Evan; Crandall, Keith A.

    2015-01-01

    The post-genomic era is characterized by the direct acquisition and analysis of genomic data with many applications, including the enhancement of the understanding of microbial epidemiology and pathology. However, there are a number of molecular approaches to survey pathogen diversity, and the impact of these different approaches on parameter estimation and inference is not entirely clear. We sequenced whole genomes of bacterial pathogens, Burkholderia pseudomallei, Yersinia pestis, and Brucella spp. (60 new genomes), and combined them with 55 genomes from GenBank to address how different molecular survey approaches (whole genomes, SNPs, and MLST) impact downstream inferences on molecular evolutionary parameters, evolutionary relationships, and trait character associations. We selected isolates for sequencing to represent temporal, geographic origin, and host range variability. We found that substitution rate estimates vary widely among approaches, and that SNP and genomic datasets yielded different but strongly supported phylogenies. MLST yielded poorly supported phylogenies, especially in our low diversity dataset, i.e., Y. pestis. Trait associations showed that B. pseudomallei and Y. pestis phylogenies are significantly associated with geography, irrespective of the molecular survey approach used, while Brucella spp. phylogeny appears to be strongly associated with geography and host origin. We contrast inferences made among monomorphic (clonal) and non-monomorphic bacteria, and between intra- and inter-specific datasets. We also discuss our results in light of underlying assumptions of different approaches. PMID:25737810

  2. Closed-loop optimization of chromatography column sizing strategies in biopharmaceutical manufacture.

    PubMed

    Allmendinger, Richard; Simaria, Ana S; Turner, Richard; Farid, Suzanne S

    2014-10-01

    This paper considers a real-world optimization problem involving the identification of cost-effective equipment sizing strategies for the sequence of chromatography steps employed to purify biopharmaceuticals. Tackling this problem requires solving a combinatorial optimization problem subject to multiple constraints, uncertain parameters, and time-consuming fitness evaluations. An industrially-relevant case study is used to illustrate that evolutionary algorithms can identify chromatography sizing strategies with significant improvements in performance criteria related to process cost, time and product waste over the base case. The results demonstrate also that evolutionary algorithms perform best when infeasible solutions are repaired intelligently, the population size is set appropriately, and elitism is combined with a low number of Monte Carlo trials (needed to account for uncertainty). Adopting this setup turns out to be more important for scenarios where less time is available for the purification process. Finally, a data-visualization tool is employed to illustrate how user preferences can be accounted for when it comes to selecting a sizing strategy to be implemented in a real industrial setting. This work demonstrates that closed-loop evolutionary optimization, when tuned properly and combined with a detailed manufacturing cost model, acts as a powerful decisional tool for the identification of cost-effective purification strategies. © 2013 The Authors. Journal of Chemical Technology & Biotechnology published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
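
    The ingredients named in the abstract above (repair of infeasible solutions, elitism, and a small number of Monte Carlo trials per fitness evaluation) can be illustrated with a toy evolutionary loop. This is only a sketch under stated assumptions: the size set, the capacity constraint, the quadratic cost and its noise model are all hypothetical stand-ins, not the authors' manufacturing cost model.

```python
import random

SIZES = [1, 2, 3, 4, 5]   # hypothetical discrete column sizes
N_STEPS = 3               # hypothetical number of chromatography steps
MIN_TOTAL = 6             # hypothetical constraint: total capacity required

def repair(sol):
    """Make an infeasible solution feasible by enlarging its smallest column."""
    sol = list(sol)
    while sum(sol) < MIN_TOTAL:
        sol[sol.index(min(sol))] += 1
    return sol

def true_cost(sol):
    """Noise-free toy cost (larger columns cost more)."""
    return sum(s * s for s in sol)

def noisy_cost(sol, rng, trials):
    """Average the uncertain cost over a few Monte Carlo trials."""
    return sum(true_cost(sol) + rng.gauss(0.0, 0.5) for _ in range(trials)) / trials

def evolve(generations=30, pop_size=12, trials=3, seed=0):
    rng = random.Random(seed)
    pop = [repair([rng.choice(SIZES) for _ in range(N_STEPS)])
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: noisy_cost(s, rng, trials))
        children = [ranked[0]]                       # elitism: keep the best as is
        while len(children) < pop_size:
            child = list(rng.choice(ranked[: pop_size // 2]))
            child[rng.randrange(N_STEPS)] = rng.choice(SIZES)  # point mutation
            children.append(repair(child))           # repair instead of discarding
        pop = children
    return min(pop, key=true_cost)

best = evolve()
print(best, true_cost(best))
```

    Repairing rather than rejecting infeasible children keeps every fitness evaluation useful, which matters when each evaluation is expensive; the number of trials trades evaluation cost against the reliability of the ranking.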

  3. Closed-loop optimization of chromatography column sizing strategies in biopharmaceutical manufacture

    PubMed Central

    Allmendinger, Richard; Simaria, Ana S; Turner, Richard; Farid, Suzanne S

    2014-01-01

    BACKGROUND This paper considers a real-world optimization problem involving the identification of cost-effective equipment sizing strategies for the sequence of chromatography steps employed to purify biopharmaceuticals. Tackling this problem requires solving a combinatorial optimization problem subject to multiple constraints, uncertain parameters, and time-consuming fitness evaluations. RESULTS An industrially-relevant case study is used to illustrate that evolutionary algorithms can identify chromatography sizing strategies with significant improvements in performance criteria related to process cost, time and product waste over the base case. The results demonstrate also that evolutionary algorithms perform best when infeasible solutions are repaired intelligently, the population size is set appropriately, and elitism is combined with a low number of Monte Carlo trials (needed to account for uncertainty). Adopting this setup turns out to be more important for scenarios where less time is available for the purification process. Finally, a data-visualization tool is employed to illustrate how user preferences can be accounted for when it comes to selecting a sizing strategy to be implemented in a real industrial setting. CONCLUSION This work demonstrates that closed-loop evolutionary optimization, when tuned properly and combined with a detailed manufacturing cost model, acts as a powerful decisional tool for the identification of cost-effective purification strategies. © 2013 The Authors. Journal of Chemical Technology & Biotechnology published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry. PMID:25506115

  4. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles.

    PubMed

    Orozco-terWengel, Pablo; Kapun, Martin; Nolte, Viola; Kofler, Robert; Flatt, Thomas; Schlötterer, Christian

    2012-10-01

    The genomic basis of adaptation to novel environments is a fundamental problem in evolutionary biology that has gained additional importance in the light of the recent global change discussion. Here, we combined laboratory natural selection (experimental evolution) in Drosophila melanogaster with genome-wide next generation sequencing of DNA pools (Pool-Seq) to identify alleles that are favourable in a novel laboratory environment and traced their trajectories during the adaptive process. Already after 15 generations, we identified a pronounced genomic response to selection, with almost 5000 single nucleotide polymorphisms (SNP; genome-wide false discovery rates < 0.005%) deviating from neutral expectation. Importantly, the evolutionary trajectories of the selected alleles were heterogeneous, with the alleles falling into two distinct classes: (i) alleles that continuously rise in frequency; and (ii) alleles that at first increase rapidly but whose frequencies then reach a plateau. Our data thus suggest that the genomic response to selection can involve a large number of selected SNPs that show unexpectedly complex evolutionary trajectories, possibly due to nonadditive effects. © 2012 Blackwell Publishing Ltd.

  5. The evolution of transcriptional regulation in eukaryotes

    NASA Technical Reports Server (NTRS)

    Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.

    2003-01-01

    Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.

  6. Evolutionary profiles from the QR factorization of multiple sequence alignments

    PubMed Central

    Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

    2005-01-01

    We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
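
    The ordering idea in the abstract above (ranking aligned sequences by increasing linear dependence so that redundant ones fall to the end) can be sketched with a plain column-pivoted Gram-Schmidt procedure. This is not the authors' multidimensional QR method: the one-hot encoding, the alphabet string and the function names are assumptions made for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed encoding alphabet (20 residues + gap)

def encode(seq):
    """One-hot encode an aligned sequence into a flat numeric vector."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec

def pivoted_order(seqs):
    """Order sequences by greedy column pivoting: at each step pick the
    sequence with the largest component not yet spanned by earlier picks."""
    residual = [encode(s) for s in seqs]
    remaining = list(range(len(seqs)))
    order = []
    while remaining:
        best = max(remaining, key=lambda i: sum(x * x for x in residual[i]))
        norm = math.sqrt(sum(x * x for x in residual[best]))
        order.append(best)
        remaining.remove(best)
        if norm < 1e-12:
            continue  # fully redundant: nothing left to project out
        q = [x / norm for x in residual[best]]
        for i in remaining:
            dot = sum(a * b for a, b in zip(residual[i], q))
            residual[i] = [a - dot * b for a, b in zip(residual[i], q)]
    return order

# Hypothetical toy alignment: index 1 duplicates index 0, index 3 is a near copy.
toy = ["ACDE", "ACDE", "WYWY", "ACDF"]
print(pivoted_order(toy))
```

    Truncating the returned order after the first few indices gives a small, non-redundant subset in the spirit of the "minimal basis set" the abstract describes; where to truncate is left open here.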

  7. Tempo and mode of genomic mutations unveil human evolutionary history.

    PubMed

    Hara, Yuichiro

    2015-01-01

    Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
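
    The link between a halved mutation rate and the revised timescale can be made explicit with a short worked relation. This is only a sketch under a strict molecular clock. The symbols d (pairwise divergence per site), μ (substitutions per site per unit time) and T (time since two lineages split) are notation introduced here and do not appear in the abstract.

```latex
% Two lineages each accumulate substitutions over time T, so d = 2 \mu T.
\[
  T = \frac{d}{2\mu},
  \qquad
  \mu_{\text{pedigree}} = \tfrac{1}{2}\,\mu_{\text{phylo}}
  \;\Longrightarrow\;
  T_{\text{pedigree}}
    = \frac{d}{2\mu_{\text{pedigree}}}
    = \frac{d}{\mu_{\text{phylo}}}
    = 2\,T_{\text{phylo}}.
\]
```

    Under these assumptions, the same observed divergence d implies split times twice as old when the pedigree-based rate is used.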

  8. Functionally essential, invariant glutamate near the C-terminus of strand beta 5 in various (alpha/beta)8-barrel enzymes as a possible indicator of their evolutionary relatedness.

    PubMed

    Janecek, S; Baláz, S

    1995-08-01

    Twelve different (alpha/beta)8-barrel enzymes belonging to three structurally distinct families were found to contain, near the C-terminus of their strand beta 5, a conserved invariant glutamic acid residue that plays an important functional role in each of these enzymes. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif owing to their mutual evolutionary relatedness. For this purpose, the sequence region around the well conserved fifth beta-strand of alpha-amylase containing catalytic glutamate (Glu230, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The isolated sequence stretches of the 12 (alpha/beta)8-barrels are discussed from both the sequence-structural and the evolutionary point of view, the invariant glutamate residue being proposed to be a joining feature of the studied group of enzymes remaining from their ancestral (alpha/beta)8-barrel.

  9. OncoNEM: inferring tumor evolution from single-cell sequencing data.

    PubMed

    Ross, Edith M; Markowetz, Florian

    2016-04-15

    Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.

  10. Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.

    PubMed

    Sharma, S; Raina, S N

    2005-01-01

    A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes, are reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.

  11. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  12. The evolutionary sequence: origin and emergences.

    PubMed

    Fox, S W

    1986-03-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of protocell and variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  13. The evolutionary sequence: origin and emergences

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of protocell and variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  14. Does sex speed up evolutionary rate and increase biodiversity?

    PubMed

    Melián, Carlos J; Alonso, David; Allesina, Stefano; Condit, Richard S; Etienne, Rampal S

    2012-01-01

    Most empirical and theoretical studies have shown that sex increases the rate of evolution, although evidence of sex constraining genomic and epigenetic variation and slowing down evolution also exists. Faster rates with sex have been attributed to new gene combinations, removal of deleterious mutations, and adaptation to heterogeneous environments. Slower rates with sex have been attributed to removal of major genetic rearrangements, the cost of finding a mate, vulnerability to predation, and exposure to sexually transmitted diseases. Whether sex speeds or slows evolution, the connection between reproductive mode, the evolutionary rate, and species diversity remains largely unexplored. Here we present a spatially explicit model of ecological and evolutionary dynamics based on DNA sequence change to study the connection between mutation, speciation, and the resulting biodiversity in sexual and asexual populations. We show that faster speciation can decrease the abundance of newly formed species and thus decrease long-term biodiversity. In this way, sex can reduce diversity relative to asexual populations, because it leads to a higher rate of production of new species, but with lower abundances. Our results show that reproductive mode and the mechanisms underlying it can alter the link between mutation, evolutionary rate, speciation and biodiversity and we suggest that a high rate of evolution may not be required to yield high biodiversity.

  15. Novel Insights on Hantavirus Evolution: The Dichotomy in Evolutionary Pressures Acting on Different Hantavirus Segments.

    PubMed

    Sankar, Sathish; Upadhyay, Mohita; Ramamurthy, Mageshbabu; Vadivel, Kumaran; Sagadevan, Kalaiselvan; Nandagopal, Balaji; Vivekanandan, Perumal; Sridharan, Gopalan

    2015-01-01

    Hantaviruses are important emerging zoonotic pathogens. The current understanding of hantavirus evolution is complicated by the lack of consensus on co-divergence of hantaviruses with their animal hosts. In addition, hantaviruses have long-term associations with their reservoir hosts. Analyzing the relative abundance of dinucleotides may shed new light on hantavirus evolution. We studied the relative abundance of dinucleotides and the evolutionary pressures shaping different hantavirus segments. A total of 118 sequences were analyzed; this includes 51 sequences of the S segment, 43 sequences of the M segment and 23 sequences of the L segment. The relative abundance of dinucleotides, effective codon number (ENC), and codon usage biases were analyzed. Standard methods were used to investigate the relative roles of mutational pressure and translational selection on the three hantavirus segments. All three segments of hantaviruses are CpG depleted. Mutational pressure is the predominant evolutionary force leading to CpG depletion among hantaviruses. Interestingly, the S segment of hantaviruses is GpU depleted and, in contrast to CpG depletion, the depletion of GpU dinucleotides from the S segment is driven by translational selection. Our findings also suggest that mutational pressure is the primary evolutionary pressure acting on the S and the M segments of hantaviruses, while translational selection plays a key role in shaping the evolution of the L segment. Our findings highlight how different evolutionary pressures may contribute disproportionally to the evolution of the three hantavirus segments. These findings provide new insights on the current understanding of hantavirus evolution. There is a dichotomy among evolutionary pressures shaping a) the relative abundance of different dinucleotides in hantavirus genomes and b) the evolution of the three hantavirus segments.
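
    The abstract does not say how relative dinucleotide abundance was computed. A common definition, assumed here, is the odds ratio rho(xy) = f(xy) / (f(x) f(y)), where values well below 1 indicate depletion. The Python sketch below illustrates that definition only. The function name and the toy sequence are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return rho[xy] = f(xy) / (f(x) * f(y)) for each dinucleotide observed in seq."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)                                      # single-nucleotide counts
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # overlapping pairs
    n_mono = len(seq)
    n_di = len(seq) - 1
    rho = {}
    for pair, count in di.items():
        f_xy = count / n_di
        f_x = mono[pair[0]] / n_mono
        f_y = mono[pair[1]] / n_mono
        rho[pair] = f_xy / (f_x * f_y)
    return rho


# Toy RNA sequence, invented for illustration; not hantavirus data.
toy = "AUGCAUGCAUAUGGCAUAAUGC"
rho = dinucleotide_odds_ratios(toy)
```

    Only dinucleotides that occur at least once appear in the result; an absent pair corresponds to an odds ratio of 0.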

  16. Principles of regulatory information conservation between mouse and human.

    PubMed

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; Wu, Weisheng; Cayting, Philip; Boyle, Alan P; Sundaram, Vasavi; Xing, Xiaoyun; Dogan, Nergiz; Li, Jingjing; Euskirchen, Ghia; Lin, Shin; Lin, Yiing; Visel, Axel; Kawli, Trupti; Yang, Xinqiong; Patacsil, Dorrelyn; Keller, Cheryl A; Giardine, Belinda; Kundaje, Anshul; Wang, Ting; Pennacchio, Len A; Weng, Zhiping; Hardison, Ross C; Snyder, Michael P

    2014-11-20

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

  17. Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

    PubMed

    Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

    2008-07-01

    MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.

  18. Arthropod phylogenetics in light of three novel millipede (Myriapoda: Diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    PubMed

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

  19. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    PubMed

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially all Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects.

  20. Rapid Sequencing of the Bamboo Mitochondrial Genome Using Illumina Technology and Parallel Episodic Evolution of Organelle Genomes in Grasses

    PubMed Central

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Background Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially all Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. Methodology/Principal Findings We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Conclusions/Significance Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects. PMID:22272330

  1. Sequence co-evolution gives 3D contacts and structures of protein complexes

    PubMed Central

    Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

    2014-01-01

    Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213

  2. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  3. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population

    PubMed Central

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-01-01

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. PMID:27189991

  4. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    PubMed

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
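
    The profile-and-distance step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it counts words with a plain sliding window instead of a generalized suffix tree, it assumes relative rather than raw frequencies, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def lword_profile(seq, L):
    """Relative frequency of every overlapping word of length L in seq."""
    total = len(seq) - L + 1
    if total <= 0:
        raise ValueError("sequence shorter than word length")
    counts = Counter(seq[i:i + L] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise profile distances, keyed by sequence name."""
    profiles = {name: lword_profile(s, L) for name, s in seqs.items()}
    names = list(seqs)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = euclidean(profiles[a], profiles[b])
    return dist


# Toy sequences, invented for illustration only.
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTGGGGCC"}
matrix = distance_matrix(toy, L=2)
```

    The resulting matrix could then be passed to any neighbor-joining or multidimensional scaling routine. The sliding-window count is quadratic in the worst case over many word lengths, which is the cost the suffix tree in the abstract is meant to avoid.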

  5. Using Evolutionary Data in Developing Phylogenetic Trees: A Scaffolded Approach with Authentic Data

    ERIC Educational Resources Information Center

    Davenport, K. D.; Milks, Kirstin Jane; Van Tassell, Rebecca

    2015-01-01

    Analyzing evolutionary relationships requires that students have a thorough understanding of evidence and of how scientists use evidence to develop these relationships. In this lesson sequence, students work in groups to process many different lines of evidence of evolutionary relationships between ungulates, then construct a scientific argument…

  6. Evolution of sparsity and modularity in a model of protein allostery

    NASA Astrophysics Data System (ADS)

    Hemery, Mathieu; Rivoire, Olivier

    2015-04-01

    The sequence of a protein is not only constrained by its physical and biochemical properties under current selection, but also by features of its past evolutionary history. Understanding the extent and the form that these evolutionary constraints may take is important to interpret the information in protein sequences. To study this problem, we introduce a simple but physical model of protein evolution where selection targets allostery, the functional coupling of distal sites on protein surfaces. This model shows how the geometrical organization of couplings between amino acids within a protein structure can depend crucially on its evolutionary history. In particular, two scenarios are found to generate a spatial concentration of functional constraints: high mutation rates and fluctuating selective pressures. This second scenario offers a plausible explanation for the high tolerance of natural proteins to mutations and for the spatial organization of their least tolerant amino acids, as revealed by sequence analysis and mutagenesis experiments. It also implies a faculty to adapt to new selective pressures that is consistent with observations. The model illustrates how several independent functional modules may emerge within the same protein structure, depending on the nature of past environmental fluctuations. Our model thus relates the evolutionary history of proteins to the geometry of their functional constraints, with implications for decoding and engineering protein sequences.

  7. Rooting the archaebacterial tree: the pivotal role of Thermococcus celer in archaebacterial evolution

    NASA Technical Reports Server (NTRS)

    Achenbach-Richter, L.; Gupta, R.; Zillig, W.; Woese, C. R.

    1988-01-01

    The sequence of the 16S ribosomal RNA gene from the archaebacterium Thermococcus celer shows the organism to be related to the methanogenic archaebacteria rather than to its phenotypic counterparts, the extremely thermophilic archaebacteria. This conclusion turns on the position of the root of the archaebacterial phylogenetic tree, however. The problems encountered in rooting this tree are analyzed in detail. Under conditions that suppress evolutionary noise both the parsimony and evolutionary distance methods yield a root location (using a number of eubacterial or eukaryotic outgroup sequences) that is consistent with that determined by an "internal rooting" method, based upon an (approximate) determination of relative evolutionary rates.

  8. Statistical distribution of amino acid sequences: a proof of Darwinian evolution.

    PubMed

    Eitner, Krystian; Koch, Uwe; Gaweda, Tomasz; Marciniak, Jedrzej

    2010-12-01

    The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT-TREMBL database and the listing for selected species and enzymes. UNIPROT-TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical, which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented. Contact: eitner@amu.edu.pl. Supplementary data are available at Bioinformatics online.

  9. Combined analysis of fourteen nuclear genes refines the Ursidae phylogeny.

    PubMed

    Pagès, Marie; Calvignac, Sébastien; Klein, Catherine; Paris, Mathilde; Hughes, Sandrine; Hänni, Catherine

    2008-04-01

    Despite numerous studies, questions remain about the evolutionary history of Ursidae and additional independent genetic markers were needed to elucidate these ambiguities. For this purpose, we sequenced ten nuclear genes for all the eight extant bear species. By combining these new sequences with those of four other recently published nuclear markers, we provide new insights into the phylogenetic relationships of the Ursidae family members. The hypothesis that the giant panda was the first species to diverge among ursids is definitively confirmed and the precise branching order within the Ursus genus is clarified for the first time. Moreover, our analyses indicate that the American and the Asiatic black bears do not cluster as sister taxa, as had been previously hypothesised. Sun and sloth bears clearly appear as the most basal ursine species but uncertainties about their exact relationships remain. Since our larger dataset did not enable us to clarify this last question, identifying rare genomic changes in bear genomes could be a promising solution for further studies.

  10. Inferring the mode of origin of polyploid species from next-generation sequence data.

    PubMed

    Roux, Camille; Pannell, John R

    2015-03-01

    Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels. © 2015 John Wiley & Sons Ltd.
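
    For readers unfamiliar with approximate Bayesian computation, the rejection form of the idea can be sketched in a few lines of Python. This is a toy illustration only: the binomial simulator, the uniform prior, the tolerance, and all names are assumptions made here, and none of it reproduces the polyploid demographic models used in the study.

```python
import random


def simulate_count(p, n, rng):
    """Toy simulator: number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def abc_rejection(observed_count, n, draws, tolerance, seed=0):
    """Return parameter values whose simulated data fall close to the observation.

    Each draw samples p from a uniform prior on [0, 1], simulates a data set,
    and keeps p only if the simulated summary statistic (the success count)
    lies within `tolerance` of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # draw from the uniform prior
        if abs(simulate_count(p, n, rng) - observed_count) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    posterior_sample = abc_rejection(observed_count=30, n=100, draws=5000, tolerance=2)
    mean = sum(posterior_sample) / len(posterior_sample)
    print(len(posterior_sample), round(mean, 3))
```

    The accepted values approximate the posterior distribution of the parameter; in a real application the simulator would be a population-genetic model and the summary statistics would be computed from sequence data.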

  11. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    PubMed Central

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  12. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
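
    The search criterion quoted above (100% identity over at least 100 bp) can be sketched as a search for maximal exact matches between two sequences. The Python below is a minimal sketch under assumptions made here: it compares two plain strings on the forward strand only, uses hypothetical function names, and is not the authors' genome-scale procedure.

```python
from collections import defaultdict


def shared_exact_blocks(a, b, min_len=100):
    """Return (start_in_a, start_in_b, length) for each maximal exact match
    between sequences a and b that is at least min_len long."""
    # Index every min_len-mer of a by its start position.
    index = defaultdict(list)
    for i in range(len(a) - min_len + 1):
        index[a[i:i + min_len]].append(i)

    blocks = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # Skip seeds that merely continue a match starting one base earlier,
            # so each maximal block is reported exactly once.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            blocks.append((i, j, length))
    return blocks


if __name__ == "__main__":
    # Toy sequences with a shared 8-base block; min_len is lowered for the demo.
    print(shared_exact_blocks("CCACGTGATCGG", "TTACGTGATCAA", min_len=5))
```

    Whole-genome comparisons would also need reverse-complement handling and a more memory-efficient index, both of which are left out of this sketch.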

  13. Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

    PubMed Central

    Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

    2012-01-01

    High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
    In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309

  14. Evaluating hypotheses of basal animal phylogeny using complete sequences of large and small subunit rRNA.

    PubMed

    Medina, M; Collins, A G; Silberman, J D; Sogin, M L

    2001-08-14

    We studied the evolutionary relationships among basal metazoan lineages by using complete large subunit (LSU) and small subunit (SSU) ribosomal RNA sequences for 23 taxa. After identifying competing hypotheses, we performed maximum likelihood searches for trees conforming to each hypothesis. Kishino-Hasegawa tests were used to determine whether the data (LSU, SSU, and combined) reject any of the competing hypotheses. We also conducted unconstrained tree searches, compared the resulting topologies, and calculated bootstrap indices. Shimodaira-Hasegawa tests were applied to determine whether the data reject any of the topologies resulting from the constrained and unconstrained tree searches. LSU, SSU, and the combined data strongly contradict two assertions pertaining to sponge phylogeny. Hexactinellid sponges are not likely to be the basal lineage of a monophyletic Porifera or the sister group to all other animals. Instead, Hexactinellida and Demospongia form a well-supported clade of siliceous sponges, Silicea. It remains unclear, on the basis of these data alone, whether the calcarean sponges are more closely related to Silicea or to nonsponge animals. The SSU and combined data reject the hypothesis that Bilateria is more closely related to Ctenophora than it is to Cnidaria, whereas LSU data alone do not refute either hypothesis. LSU and SSU data agree in supporting the monophyly of Bilateria, Cnidaria, Ctenophora, and Metazoa. LSU sequence data reveal phylogenetic structure in a data set with limited taxon sampling. Continued accumulation of LSU sequences should increase our understanding of animal phylogeny.

  15. The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses

    PubMed Central

    Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying

    2016-01-01

    Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We here sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly. Combined with the previously reported cp genome sequence of E. koreanum, this is the first comprehensive cp genome analysis of Epimedium. The five Epimedium cp genomes exhibited a typical quadripartite and circular structure that was rather conserved in genomic structure and the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes, which was not found in the other basal eudicotyledons. The rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSRs) and repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were, on the whole, in accordance with the updated system of the genus, but indicated that the evolutionary relationships and the divisions of the genus need further investigation with more evidence. The availability of these cp genomes provides valuable genetic information for accurate species identification, taxonomy, and phylogenetic resolution and evolution of Epimedium, and will assist in the exploration and utilization of Epimedium plants. PMID:27014326

  16. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

    PubMed

    Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J

    2017-01-18

    The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

  17. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  18. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    PubMed Central

    2012-01-01

    Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net. PMID:22568821
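
    The minimum spanning tree idea mentioned in this record can be sketched as follows. This Python example is a plain Prim's algorithm over Hamming distances between allelic profiles; it does not reproduce goeBURST's tie-breaking rules or PHYLOViZ's actual implementation, and the profile names and allele numbers are made up for illustration.

```python
def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for x, y in zip(p, q) if x != y)


def minimum_spanning_tree(profiles):
    """Prim's algorithm over allelic profiles.

    profiles: dict mapping a sequence-type name to a tuple of allele numbers
    Returns a list of (name_in_tree, name_added, distance) edges.
    Ties are broken by distance first, then alphabetically by name.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the current tree to a profile outside it.
        distance, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, distance))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    # Hypothetical seven-locus profiles, as in a typical MLST scheme.
    toy_profiles = {
        "ST1": (1, 1, 1, 1, 1, 1, 1),
        "ST2": (1, 1, 1, 1, 1, 1, 2),
        "ST3": (1, 1, 1, 1, 1, 3, 2),
        "ST4": (5, 5, 5, 1, 1, 1, 1),
    }
    for edge in minimum_spanning_tree(toy_profiles):
        print(edge)
```

    The quadratic edge scan keeps the sketch short; a priority queue would be the usual choice for larger collections of profiles.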

  19. Identification of an ancient endogenous retrovirus, predating the divergence of the placental mammals.

    PubMed

    Lee, Adam; Nolan, Alison; Watson, Jason; Tristem, Michael

    2013-09-19

    The evolutionary arms race between mammals and retroviruses has long been recognized as one of the oldest host-parasite interactions. Rapid evolution rates in exogenous retroviruses have often made accurate viral age estimations highly problematic. Endogenous retroviruses (ERVs), however, integrate into the germline of their hosts, and are subjected to their evolutionary rates. This study describes, for the first time, a retroviral orthologue predating the divergence of placental mammals, giving it a minimum age of 104-110 Myr. Simultaneously, other orthologous selfish genetic elements (SGEs), inserted into the ERV sequence, provide evidence for the oldest individual mammalian-wide interspersed repeat and medium-reiteration frequency interspersed repeat mammalian repeats, with the same minimum age. The combined use of shared SGEs and reconstruction of viral orthologies defines new limits and increases maximum 'lookback' times, with subsequent implications for the field of paleovirology.

  20. Phylogenetic relationships in Cortinarius, section Calochroi, inferred from nuclear DNA sequences

    PubMed Central

    Garnica, Sigisfredo; Weiß, Michael; Oertel, Bernhard; Ammirati, Joseph; Oberwinkler, Franz

    2009-01-01

    Background Section Calochroi is one of the most species-rich lineages in the genus Cortinarius (Agaricales, Basidiomycota) and is widely distributed across boreo-nemoral areas, with some extensions into meridional zones. Previous phylogenetic studies of Calochroi (incl. section Fulvi) have been geographically restricted; therefore, phylogenetic and biogeographic relationships within this lineage at a global scale have been largely unknown. In this study, we obtained DNA sequences from a nearly complete taxon sampling of known species from Europe, Central America and North America. We inferred intra- and interspecific phylogenetic relationships as well as major morphological evolutionary trends within section Calochroi based on 576 ITS sequences, 230 ITS + 5.8S + D1/D2 sequences, and a combined dataset of ITS + 5.8S + D1/D2 and RPB1 sequences of a representative subsampling of 58 species. Results More than 100 species were identified by integrating DNA sequences with morphological, macrochemical and ecological data. Cortinarius section Calochroi was consistently resolved with high branch support into at least seven major lineages: Calochroi, Caroviolacei, Dibaphi, Elegantiores, Napi, Pseudoglaucopodes and Splendentes; whereas Rufoolivacei and Sulfurini appeared polyphyletic. A close relationship between Dibaphi, Elegantiores, Napi and Splendentes was consistently supported. Combinations of specific morphological, pigmentation and molecular characters appear useful in circumscribing clades. Conclusion Our analyses demonstrate that Calochroi is an exclusively northern hemispheric lineage, where species follow their host trees throughout their natural ranges within and across continents. Results of this study contribute substantially to defining European species in this group and will help to either identify or to name new species occurring across the northern hemisphere. 
    Major groupings are in partial agreement with earlier morphology-based and molecular phylogenetic hypotheses, but some relationships were unexpected, based on external morphology. In such cases, their true affinities appear to have been obscured by the repeated appearance of similar features among distantly related species. Therefore, further taxonomic studies are needed to evaluate the consistency of species concepts and interpretations of morphological features in a more global context. Reconstruction of ancestral states yielded two major evolutionary trends within section Calochroi: (1) the development of bright pigments evolved independently multiple times, and (2) the evolution of abruptly marginate to flattened stipe bulbs represents an autapomorphy of the Calochroi clade. PMID:19121213

  1. Evolutionary characterization of the West Nile Virus complete genome.

    PubMed

    Gray, R R; Veras, N M C; Santos, L A; Salemi, M

    2010-07-01

    The spatial dynamics of the West Nile Virus epidemic in North America are largely unknown. Previous studies that investigated the evolutionary history of the virus used sequence data from the structural genes (prM and E); however, these regions may lack phylogenetic information and obscure true evolutionary relationships. This study systematically evaluated the evolutionary patterns in the eleven genes of the WNV genome in order to determine which region(s) were most phylogenetically informative. We found that while the E region lacks resolution and can potentially result in misleading conclusions, the full NS3 or NS5 regions have strong phylogenetic signal. Furthermore, we show that geographic structure of WNV infection within the US is more pronounced than previously reported in studies that used the structural genes. We conclude that future evolutionary studies should focus on NS3 and NS5 in order to maximize the available sequences while retaining maximal interpretative power to infer temporal and geographic trends among WNV strains. Copyright 2010 Elsevier Inc. All rights reserved.

  2. The impact of age, biogenesis, and genomic clustering on Drosophila microRNA evolution

    PubMed Central

    Mohammed, Jaaved; Flynt, Alex S.; Siepel, Adam; Lai, Eric C.

    2013-01-01

    The molecular evolutionary signatures of miRNAs inform our understanding of their emergence, biogenesis, and function. The known signatures of miRNA evolution have derived mostly from the analysis of deeply conserved, canonical loci. In this study, we examine the impact of age, biogenesis pathway, and genomic arrangement on the evolutionary properties of Drosophila miRNAs. Crucial to the accuracy of our results was our curation of high-quality miRNA alignments, which included nearly 150 corrections to ortholog calls and nucleotide sequences of the global 12-way Drosophilid alignments currently available. Using these data, we studied primary sequence conservation, normalized free-energy values, and types of structure-preserving substitutions. We expand upon common miRNA evolutionary patterns that reflect fundamental features of miRNAs that are under functional selection. We observe that melanogaster-subgroup-specific miRNAs, although recently emerged and rapidly evolving, nonetheless exhibit evolutionary signatures that are similar to well-conserved miRNAs and distinct from other structured noncoding RNAs and bulk conserved non-miRNA hairpins. This provides evidence that even young miRNAs may be selected for regulatory activities. More strikingly, we observe that mirtrons and clustered miRNAs both exhibit distinct evolutionary properties relative to solo, well-conserved miRNAs, even after controlling for sequence depth. These studies highlight the previously unappreciated impact of biogenesis strategy and genomic location on the evolutionary dynamics of miRNAs, and affirm that miRNAs do not evolve as a unitary class. PMID:23882112

  3. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

    PubMed Central

    Dröge, J.; Gregor, I.; McHardy, A. C.

    2015-01-01

    Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150

  4. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-10-16

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.

  5. A Phylogenomic Approach Based on PCR Target Enrichment and High Throughput Sequencing: Resolving the Diversity within the South American Species of Bartsia L. (Orobanchaceae)

    PubMed Central

    Tank, David C.

    2016-01-01

    Advances in high-throughput sequencing (HTS) have allowed researchers to obtain large amounts of biological sequence information at speeds and costs unimaginable only a decade ago. Phylogenetics, and the study of evolution in general, is quickly migrating towards using HTS to generate larger and more complex molecular datasets. In this paper, we present a method that utilizes microfluidic PCR and HTS to generate large amounts of sequence data suitable for phylogenetic analyses. The approach uses the Fluidigm Access Array System (Fluidigm, San Francisco, CA, USA) and two sets of PCR primers to simultaneously amplify 48 target regions across 48 samples, incorporating sample-specific barcodes and HTS adapters (2,304 unique amplicons per Access Array). The final product is a pooled set of amplicons ready to be sequenced, and thus, there is no need to construct separate, costly genomic libraries for each sample. Further, we present a bioinformatics pipeline to process the raw HTS reads to either generate consensus sequences (with or without ambiguities) for every locus in every sample or—more importantly—recover the separate alleles from heterozygous target regions in each sample. This is important because it adds allelic information that is well suited for coalescent-based phylogenetic analyses that are becoming very common in conservation and evolutionary biology. To test our approach and bioinformatics pipeline, we sequenced 576 samples across 96 target regions belonging to the South American clade of the genus Bartsia L. in the plant family Orobanchaceae. After sequencing cleanup and alignment, the experiment resulted in ~25,300bp across 486 samples for a set of 48 primer pairs targeting the plastome, and ~13,500bp for 363 samples for a set of primers targeting regions in the nuclear genome. Finally, we constructed a combined concatenated matrix from all 96 primer combinations, resulting in a combined aligned length of ~40,500bp for 349 samples. PMID:26828929

  6. MultiSeq: unifying sequence and structure data for evolutionary analysis

    PubMed Central

    Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

    2006-01-01

    Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. 
Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: PMID:16914055

  7. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  8. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  9. Genomic investigations of evolutionary dynamics and epistasis in microbial evolution experiments.

    PubMed

    Jerison, Elizabeth R; Desai, Michael M

    2015-12-01

    Microbial evolution experiments enable us to watch adaptation in real time, and to quantify the repeatability and predictability of evolution by comparing identical replicate populations. Further, we can resurrect ancestral types to examine changes over evolutionary time. Until recently, experimental evolution has been limited to measuring phenotypic changes, or to tracking a few genetic markers over time. However, recent advances in sequencing technology now make it possible to extensively sequence clones or whole-population samples from microbial evolution experiments. Here, we review recent work exploiting these techniques to understand the genomic basis of evolutionary change in experimental systems. We first focus on studies that analyze the dynamics of genome evolution in microbial systems. We then survey work that uses observations of sequence evolution to infer aspects of the underlying fitness landscape, concentrating on the epistatic interactions between mutations and the constraints these interactions impose on adaptation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  11. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.

    PubMed

    Spielman, Stephanie J; Wilke, Claus O

    2015-01-01

    We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.
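
    To make the idea of simulating sequences along a phylogeny with a continuous-time Markov model concrete, here is a minimal standard-library sketch. It deliberately does not use Pyvolve's API. The tree encoding as nested `(name, branch_length, children)` tuples, the function names, and the choice of the Jukes-Cantor (JC69) model are assumptions made for illustration. Under JC69, a site differs from its parent after a branch of length t (expected substitutions per site) with probability 3/4 (1 - exp(-4t/3)), and each of the other three nucleotides is equally likely.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def evolve_branch(seq, branch_length, rng):
    """Evolve a sequence along one branch under JC69 (illustrative model choice)."""
    # Probability that a site ends the branch in a different state.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([n for n in NUCLEOTIDES if n != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, parent_seq, rng):
    """Recursively evolve down a tree; returns {leaf_name: sequence}.

    tree is a hypothetical encoding: (name, branch_length, [child trees]).
    """
    name, branch_length, children = tree
    seq = evolve_branch(parent_seq, branch_length, rng)
    if not children:
        return {name: seq}
    leaves = {}
    for child in children:
        leaves.update(simulate(child, seq, rng))
    return leaves


rng = random.Random(1)  # fixed seed so that runs are repeatable
root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(60))
tree = ("root", 0.0, [
    ("A", 0.1, []),
    ("anc", 0.05, [("B", 0.2, []), ("C", 0.2, [])]),
])
alignment = simulate(tree, root_seq, rng)
for leaf, seq in sorted(alignment.items()):
    print(leaf, seq)
```

    Sampling each branch directly from the transition probabilities keeps the sketch short. A general-purpose simulator would instead accept an arbitrary rate matrix and state space, as the abstract says Pyvolve does.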

  12. The development of the red giant branch. I - Theoretical evolutionary sequences

    NASA Technical Reports Server (NTRS)

    Sweigart, Allen V.; Greggio, Laura; Renzini, Alvio

    1989-01-01

    A grid of 100 evolutionary sequences extending from the zero-age main sequence to the onset of helium burning has been computed for stellar masses between 1.4 and 3.4 solar masses, helium abundances of 0.20 and 0.30, and heavy-element abundances of 0.004, 0.01, and 0.04. Using these computations the transition in the morphology of the red giant branch (RGB) between low-mass stars, which have an extended and luminous first RGB phase prior to helium ignition, and intermediate-mass stars, which do not, is investigated. Extensive tabulations of the numerical results are provided to aid in applying these sequences. The effects of the first dredge-up on the surface helium and CNO abundances of the sequences are discussed.

  13. Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation.

    PubMed

    Schlötterer, C; Kofler, R; Versace, E; Tobler, R; Franssen, S U

    2015-05-01

    Evolve and resequence (E&R) is a new approach to investigate the genomic responses to selection during experimental evolution. By using whole genome sequencing of pools of individuals (Pool-Seq), this method can identify selected variants in controlled and replicable experimental settings. Reviewing the current state of the field, we show that E&R can be powerful enough to identify causative genes and possibly even single-nucleotide polymorphisms. We also discuss how the experimental design and the complexity of the trait could result in a large number of false positive candidates. We suggest experimental and analytical strategies to maximize the power of E&R to uncover the genotype-phenotype link and serve as an important research tool for a broad range of evolutionary questions.

  14. Advances in understanding tumour evolution through single-cell sequencing.

    PubMed

    Kuipers, Jack; Jahn, Katharina; Beerenwinkel, Niko

    2017-04-01

    The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state-of-the-art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  15. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  16. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  17. Evolution of microbes and viruses: a paradigm shift in evolutionary biology?

    PubMed Central

    Koonin, Eugene V.; Wolf, Yuri I.

    2012-01-01

    When Charles Darwin formulated the central principles of evolutionary biology in the Origin of Species in 1859 and the architects of the Modern Synthesis integrated these principles with population genetics almost a century later, the principal if not the sole objects of evolutionary biology were multicellular eukaryotes, primarily animals and plants. Before the advent of efficient gene sequencing, all attempts to extend evolutionary studies to bacteria have been futile. Sequencing of the rRNA genes in thousands of microbes allowed the construction of the three-domain “ribosomal Tree of Life” that was widely thought to have resolved the evolutionary relationships between the cellular life forms. However, subsequent massive sequencing of numerous, complete microbial genomes revealed novel evolutionary phenomena, the most fundamental of these being: (1) pervasive horizontal gene transfer (HGT), in large part mediated by viruses and plasmids, that shapes the genomes of archaea and bacteria and calls for a radical revision (if not abandonment) of the Tree of Life concept, (2) Lamarckian-type inheritance that appears to be critical for antivirus defense and other forms of adaptation in prokaryotes, and (3) evolution of evolvability, i.e., dedicated mechanisms for evolution such as vehicles for HGT and stress-induced mutagenesis systems. In the non-cellular part of the microbial world, phylogenomics and metagenomics of viruses and related selfish genetic elements revealed enormous genetic and molecular diversity and extremely high abundance of viruses that come across as the dominant biological entities on earth. Furthermore, the perennial arms race between viruses and their hosts is one of the defining factors of evolution. Thus, microbial phylogenomics adds new dimensions to the fundamental picture of evolution even as the principle of descent with modification discovered by Darwin and the laws of population genetics remain at the core of evolutionary biology. PMID:22993722

  18. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  19. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information.

    PubMed

    An, Ji-Yong; Zhang, Lei; Zhou, Yong; Zhao, Yu-Jun; Wang, Da-Fu

    2017-08-18

    Self-interacting proteins (SIPs) are important for biological activity owing to the inherent interactions among their secondary structures or domains. However, because experimental detection of self-interactions is limited, one major challenge in SIP prediction is how to exploit computational approaches that detect SIPs from the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs from protein sequence. The major improvement of our method lies in an effective feature extraction method that represents candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position-specific scoring matrix (PSSM), followed by a reliable and robust WELM classifier that carries out the classification. In addition, Principal Component Analysis (PCA) is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94% and 96.74% on yeast and human datasets, respectively. We also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.

  20. Testing Models of Stellar Structure and Evolution I. Comparison with Detached Eclipsing Binaries

    NASA Astrophysics Data System (ADS)

    del Burgo, C.; Allende Prieto, C.

    2018-05-01

    We present the results of an analysis aimed at testing the accuracy and precision of the PARSEC v1.2S library of stellar evolution models, combined with a Bayesian approach, to infer stellar parameters. We mainly employ the online DEBCat catalogue by Southworth, a compilation of detached eclipsing binary systems with published measurements of masses and radii to ~2 per cent precision. We select a sample of 318 binary components, with masses between 0.10 and 14.5 solar units, and distances between 1.3 pc and ~8 kpc for Galactic objects and ~44-68 kpc for the extragalactic ones. The Bayesian analysis applied takes as input effective temperature, radius, and [Fe/H], and their uncertainties, returning theoretical predictions for other stellar parameters. From the comparison with dynamical masses, we conclude inferred masses are precisely derived for stars on the main-sequence and in the core-helium-burning phase, with respective uncertainties of 4 per cent and 7 per cent, on average. Subgiant and red giant masses are predicted within 14 per cent, and early asymptotic giant branch stars within 24 per cent. These results are helpful to further improve the models, in particular for advanced evolutionary stages for which our understanding is limited. We obtain distances and ages for the binary systems and compare them, whenever possible, with precise literature estimates, finding excellent agreement. We discuss evolutionary effects and the challenges associated with the inference of stellar ages from evolutionary models. We also provide useful polynomial fittings to theoretical zero-age main-sequence relations.

  1. Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches

    PubMed Central

    Krobath, Heinrich; Chan, Hue Sun

    2016-01-01

    Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches. PMID:27253392

  2. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  3. Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

    PubMed Central

    Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

    2010-01-01

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655

  4. Genome Diversity and Evolution in the Budding Yeasts (Saccharomycotina)

    PubMed Central

    Dujon, Bernard A.; Louis, Edward J.

    2017-01-01

    Considerable progress in our understanding of yeast genomes and their evolution has been made over the last decade with the sequencing, analysis, and comparisons of numerous species, strains, or isolates of diverse origins. The role played by yeasts in natural environments as well as in artificial manufactures, combined with the importance of some species as model experimental systems sustained this effort. At the same time, their enormous evolutionary diversity (there are yeast species in every subphylum of Dikarya) sparked curiosity but necessitated further efforts to obtain appropriate reference genomes. Today, yeast genomes have been very informative about basic mechanisms of evolution, speciation, hybridization, domestication, as well as about the molecular machineries underlying them. They are also irreplaceable to investigate in detail the complex relationship between genotypes and phenotypes with both theoretical and practical implications. This review examines these questions at two distinct levels offered by the broad evolutionary range of yeasts: inside the best-studied Saccharomyces species complex, and across the entire and diversified subphylum of Saccharomycotina. While obviously revealing evolutionary histories at different scales, data converge to a remarkably coherent picture in which one can estimate the relative importance of intrinsic genome dynamics, including gene birth and loss, vs. horizontal genetic accidents in the making of populations. The facility with which novel yeast genomes can now be studied, combined with the already numerous available reference genomes, offers privileged perspectives to further examine these fundamental biological questions using yeasts both as eukaryotic models and as fungi of practical importance. PMID:28592505

  5. A Natural View of Microbial Biodiversity within Hot Spring Cyanobacterial Mat Communities

    PubMed Central

    Ward, David M.; Ferris, Michael J.; Nold, Stephen C.; Bateson, Mary M.

    1998-01-01

    This review summarizes a decade of research in which we have used molecular methods, in conjunction with more traditional approaches, to study hot spring cyanobacterial mats as models for understanding principles of microbial community ecology. Molecular methods reveal that the composition of these communities is grossly oversimplified by microscopic and cultivation methods. For example, none of 31 unique 16S rRNA sequences detected in the Octopus Spring mat, Yellowstone National Park, matches that of any prokaryote previously cultivated from geothermal systems; 11 are contributed by genetically diverse cyanobacteria, even though a single cyanobacterial species was suspected based on morphologic and culture analysis. By studying the basis for the incongruity between culture and molecular samplings of community composition, we are beginning to cultivate isolates whose 16S rRNA sequences are readily detected. By placing the genetic diversity detected in context with the well-defined natural environmental gradients typical of hot spring mat systems, the relationship between gene and species diversity is clarified and ecological patterns of species occurrence emerge. By combining these ecological patterns with the evolutionary patterns inherently revealed by phylogenetic analysis of gene sequence data, we find that it may be possible to understand microbial biodiversity within these systems by using principles similar to those developed by evolutionary ecologists to understand biodiversity of larger species. We hope that such an approach guides microbial ecologists to a more realistic and predictive understanding of microbial species occurrence and responsiveness in both natural and disturbed habitats. PMID:9841675

  6. A natural view of microbial biodiversity within hot spring cyanobacterial mat communities

    NASA Technical Reports Server (NTRS)

    Ward, D. M.; Ferris, M. J.; Nold, S. C.; Bateson, M. M.

    1998-01-01

    This review summarizes a decade of research in which we have used molecular methods, in conjunction with more traditional approaches, to study hot spring cyanobacterial mats as models for understanding principles of microbial community ecology. Molecular methods reveal that the composition of these communities is grossly oversimplified by microscopic and cultivation methods. For example, none of 31 unique 16S rRNA sequences detected in the Octopus Spring mat, Yellowstone National Park, matches that of any prokaryote previously cultivated from geothermal systems; 11 are contributed by genetically diverse cyanobacteria, even though a single cyanobacterial species was suspected based on morphologic and culture analysis. By studying the basis for the incongruity between culture and molecular samplings of community composition, we are beginning to cultivate isolates whose 16S rRNA sequences are readily detected. By placing the genetic diversity detected in context with the well-defined natural environmental gradients typical of hot spring mat systems, the relationship between gene and species diversity is clarified and ecological patterns of species occurrence emerge. By combining these ecological patterns with the evolutionary patterns inherently revealed by phylogenetic analysis of gene sequence data, we find that it may be possible to understand microbial biodiversity within these systems by using principles similar to those developed by evolutionary ecologists to understand biodiversity of larger species. We hope that such an approach guides microbial ecologists to a more realistic and predictive understanding of microbial species occurrence and responsiveness in both natural and disturbed habitats.

  7. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  8. Expression Differentiation Is Constrained to Low-Expression Proteins over Ecological Timescales

    PubMed Central

    Margres, Mark J.; Wray, Kenneth P.; Seavy, Margaret; McGivern, James J.; Herrera, Nathanael D.; Rokyta, Darin R.

    2016-01-01

    Protein expression level is one of the strongest predictors of protein sequence evolutionary rate, with high-expression protein sequences evolving at slower rates than low-expression protein sequences largely because of constraints on protein folding and function. Expression evolutionary rates also have been shown to be negatively correlated with expression level across human and mouse orthologs over relatively long divergence times (i.e., ∼100 million years). Long-term evolutionary patterns, however, often cannot be extrapolated to microevolutionary processes (and vice versa), and whether this relationship holds for traits evolving under directional selection within a single species over ecological timescales (i.e., <5000 years) is unknown and not necessarily expected. Expression is a metabolically costly process, and the expression level of a particular protein is predicted to be a tradeoff between the benefit of its function and the costs of its expression. Selection should drive the expression level of all proteins close to values that maximize fitness, particularly for high-expression proteins because of the increased energetic cost of production. Therefore, stabilizing selection may reduce the amount of standing expression variation for high-expression proteins, and in combination with physiological constraints that may place an upper bound on the range of beneficial expression variation, these constraints could severely limit the availability of beneficial expression variants. To determine whether rapid-expression evolution was restricted to low-expression proteins owing to these constraints on highly expressed proteins over ecological timescales, we compared venom protein expression levels across mainland and island populations for three species of pit vipers. We detected significant differentiation in protein expression levels in two of the three species and found that rapid-expression differentiation was restricted to low-expression proteins. Our results suggest that various constraints on high-expression proteins reduce the availability of beneficial expression variants relative to low-expression proteins, enabling low-expression proteins to evolve and potentially lead to more rapid adaptation. PMID:26546003

  9. Lineage-specific evolution of the vertebrate Otopetrin gene family revealed by comparative genomic analyses

    PubMed Central

    2011-01-01

    Background Mutations in the Otopetrin 1 gene (Otop1) in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP) family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH) subtype 1G (Ush1g), both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes. Results We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleostei fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ush1g-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF), a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) data in mouse and human embryonic stem (ES) cells combined with detection of CTCF-binding motifs. Conclusions The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s) of Ush1g and Otop in developmental pathways. PMID:21261979

  10. Assessing fluctuating evolutionary pressure in yeast and mammal evolutionary rate covariation using bioinformatics of meiotic protein genetic sequences

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Holden, T.; Lieberman, D.; Cheung, T.

    2013-09-01

    The evolutionary rate co-variation in meiotic proteins has been reported for yeast and mammal using phylogenic branch lengths which assess retention, duplication and mutation. The bioinformatics of the corresponding DNA sequences could be classified as a diagram of fractal dimension and Shannon entropy. Results from biomedical gene research provide examples on the diagram methodology. The identification of adaptive selection using entropy marker and functional-structural diversity using fractal dimension would support a regression analysis where the coefficient of determination would serve as evolutionary pathway marker for DNA sequences and be an important component in the astrobiology community. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, clinical trial targeted cancer gene CD47, SIRT6 in spermatogenesis, and HLA-C in mosquito bite immunology demonstrate the diagram classification methodology. Comparisons to the SEPT4-XIAP pair in stem cell apoptosis, testes-expressed taste genes TAS1R3-GNAT3 pair, and amyloid beta APLP1-APLP2 pair with the yeast-mammal DNA sequences for meiotic proteins RAD50-MRE11 pair and NCAPD2-ICK pair have accounted for the observed fluctuating evolutionary pressure systematically. Regression with high R-sq values or a triangular-like cluster pattern for concordant pairs in co-variation among the studied species could serve as evidences for the possible location of common ancestors in the entropy-fractal dimension diagram, consistent with an example of the human-chimp common ancestor study using the FOXP2 regulated genes reported in human fetal brain study. The Deinococcus radiodurans R1 Rad-A could be viewed as an outlier in the RAD50 diagram and also in the free energy versus fractal dimension regression Cook's distance, consistent with a non-Earth source for this radiation resistant bacterium. Convergent and divergent fluctuating evolutionary pressure could be studied with extension to genetic sequences in organisms in possible astrobiology conditions, with the assumption that the continuation of a book of life would require meiotic proteins everywhere in the universe.

  11. Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

    PubMed

    Galián, José A; Rosato, Marcela; Rosselló, Josep A

    2014-03-01

    Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.

  12. Microsatellites for Carpotroche brasiliensis (Flacourtiaceae), a useful species for agroforestry and ecosystem conservation.

    PubMed

    Bittencourt, Flora; Alves, Jackeline S; Gaiotto, Fernanda A

    2015-12-01

    We developed microsatellite markers for Carpotroche brasiliensis (Flacourtiaceae), a dioecious tree that is used as a food resource by midsize animals of the Brazilian fauna. We designed 30 primer pairs using next-generation sequencing and classified 25 pairs as polymorphic. Observed heterozygosity ranged from 0.5 to 1.0, and expected heterozygosity ranged from 0.418 to 0.907. The combined probability of exclusion was greater than 0.999 and the combined probability of identity was less than 0.001, indicating that these microsatellites are appropriate for investigations of genetic structure, individual identification, and paternity testing. The developed molecular tools may contribute to future studies of population genetics, answering ecological and evolutionary questions regarding efficient conservation strategies for C. brasiliensis.

  13. Evolutionary genomics and HIV restriction factors.

    PubMed

    Pyndiah, Nitisha; Telenti, Amalio; Rausell, Antonio

    2015-03-01

    To provide updated insights into innate antiviral immunity and highlight prototypical evolutionary features of well characterized HIV restriction factors. Recently, a new HIV restriction factor, Myxovirus resistance 2, has been discovered and the region/residue responsible for its activity identified using an evolutionary approach. Furthermore, IFI16, an innate immunity protein known to sense several viruses, has been shown to contribute to the defense to HIV-1 by causing cell death upon sensing HIV-1 DNA. Restriction factors against HIV show characteristic signatures of positive selection. Different patterns of accelerated sequence evolution can distinguish antiviral strategies--offense or defence--as well as the level of specificity of the antiviral properties. Sequence analysis of primate orthologs of restriction factors serves to localize functional domains and sites responsible for antiviral action. We use recent discoveries to illustrate how evolutionary genomic analyses help identify new antiviral genes and their mechanisms of action.

  14. Toxin structures as evolutionary tools: Using conserved 3D folds to study the evolution of rapidly evolving peptides.

    PubMed

    Undheim, Eivind A B; Mobli, Mehdi; King, Glenn F

    2016-06-01

    Three-dimensional (3D) structures have been used to explore the evolution of proteins for decades, yet they have rarely been utilized to study the molecular evolution of peptides. Here, we highlight areas in which 3D structures can be particularly useful for studying the molecular evolution of peptide toxins. Although we focus our discussion on animal toxins, including one of the most widespread disulfide-rich peptide folds known, the inhibitor cystine knot, our conclusions should be widely applicable to studies of the evolution of disulfide-constrained peptides. We show that conserved 3D folds can be used to identify evolutionary links and test hypotheses regarding the evolutionary origin of peptides with extremely low sequence identity; construct accurate multiple sequence alignments; and better understand the evolutionary forces that drive the molecular evolution of peptides. © 2016 WILEY Periodicals, Inc.

  15. Secuencias evolutivas e isocronas para estrellas de baja masa e intermedia [Evolutionary sequences and isochrones for low- and intermediate-mass stars]

    NASA Astrophysics Data System (ADS)

    Panei, J.; Baume, G.

    2016-08-01

    We present theoretical evolutionary sequences for low- and intermediate-mass stars. The masses calculated range from 1.7 to 10 M. The initial chemical composition is . In addition, we have taken into account a nuclear network with 17 isotopes and 34 nuclear reactions. With respect to the mix, we considered overshooting with a parameter . The evolutionary calculations were initialized from the region of instability of Hayashi, in order to calculate isochrones of pre-sequence, too.

  16. Characterization of irritans mariner-like elements in the olive fruit fly Bactrocera oleae (Diptera: Tephritidae): evolutionary implications.

    PubMed

    Ben Lazhar-Ajroud, Wafa; Caruso, Aurore; Mezghani, Maha; Bouallegue, Maryem; Tastard, Emmanuelle; Denis, Françoise; Rouault, Jacques-Deric; Makni, Hanem; Capy, Pierre; Chénais, Benoît; Makni, Mohamed; Casse, Nathalie

    2016-08-01

    Genomic variation among species is commonly driven by transposable element (TE) invasion; thus, the pattern of TEs in a genome allows drawing an evolutionary history of the studied species. This paper reports in vitro and in silico detection and characterization of irritans mariner-like elements (MLEs) in the genome and transcriptome of Bactrocera oleae (Rossi) (Diptera: Tephritidae). Eleven irritans MLE sequences have been isolated in vitro using terminal inverted repeats (TIRs) as primers, and 215 have been extracted in silico from the sequenced genome of B. oleae. Additionally, the sequenced genomes of Bactrocera tryoni (Froggatt) and Bactrocera cucurbitae (Diptera: Tephritidae) have been explored to identify irritans MLEs. A total of 129 sequences from B. tryoni have been extracted, while the genome of B. cucurbitae appears probably devoid of irritans MLEs. All detected irritans MLEs are defective due to several mutations and are clustered together in a monophyletic group suggesting a common ancestor. The evolutionary history and dynamics of these TEs are discussed in relation with the phylogenetic distribution of their hosts. The knowledge on the structure, distribution, dynamic, and evolution of irritans MLEs in Bactrocera species contributes to the understanding of both their evolutionary history and the invasion history of their hosts. This could also be the basis for genetic control strategies using transposable elements.

  17. Genomic V exons from whole genome shotgun data in reptiles.

    PubMed

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  18. Grand challenges in evolutionary and population genetics: The importance of integrating epigenetics, genomics, modeling, and experimentation

    Treesearch

    Samuel A. Cushman

    2014-01-01

    This is a time of explosive growth in the fields of evolutionary and population genetics, with whole genome sequencing and bioinformatics driving a transformative paradigm shift (Morozova and Marra, 2008). At the same time, advances in epigenetics are thoroughly transforming our understanding of evolutionary processes and their implications for populations, species and...

  19. Phylogenomics and taxonomy of Lecomtelleae (Poaceae), an isolated panicoid lineage from Madagascar

    PubMed Central

    Besnard, Guillaume; Christin, Pascal-Antoine; Malé, Pierre-Jean G.; Coissac, Eric; Ralimanana, Hélène; Vorontsova, Maria S.

    2013-01-01

    Background and Aims An accurate characterization of biodiversity requires analyses of DNA sequences in addition to classical morphological descriptions. New methods based on high-throughput sequencing may allow investigation of specimens with a large set of genetic markers to infer their evolutionary history. In the grass family, the phylogenetic position of the monotypic genus Lecomtella, a rare bamboo-like endemic from Madagascar, has never been appropriately evaluated. Until now its taxonomic treatment has remained controversial, indicating the need for re-evaluation based on a combination of molecular and morphological data. Methods The phylogenetic position of Lecomtella in Poaceae was evaluated based on sequences from the nuclear and plastid genomes generated by next-generation sequencing (NGS). In addition, a detailed morphological description of L. madagascariensis was produced, and its distribution and habit were investigated in order to assess its conservation status. Key Results The complete plastid sequence, a ribosomal DNA unit and fragments of low-copy nuclear genes (phyB and ppc) were obtained. All phylogenetic analyses place Lecomtella as an isolated member of the core panicoids, which last shared a common ancestor with other species >20 million years ago. Although Lecomtella exhibits morphological characters typical of Panicoideae, an unusual combination of traits supports its treatment as a separate group. Conclusions The study showed that NGS can be used to generate abundant phylogenetic information rapidly, opening new avenues for grass phylogenetics. These data clearly showed that Lecomtella forms an isolated lineage, which, in combination with its morphological peculiarities, justifies its treatment as a separate tribe: Lecomtelleae. New descriptions of the tribe, genus and species are presented with a typification, a distribution map and an IUCN conservation assessment. PMID:23985988

  20. Muroid rodent phylogenetics: 900-species tree reveals increasing diversification rates

    PubMed Central

    Schenk, John J.

    2017-01-01

    We combined new sequence data for more than 300 muroid rodent species with our previously published sequences for up to five nuclear and one mitochondrial genes to generate the most widely and densely sampled hypothesis of evolutionary relationships across Muroidea. An exhaustive screening procedure for publicly available sequences was implemented to avoid the propagation of taxonomic errors that are common to supermatrix studies. The combined data set of carefully screened sequences derived from all available sequences on GenBank with our new data resulted in a robust maximum likelihood phylogeny for 900 of the approximately 1,620 muroids. Several regions that were equivocally resolved in previous studies are now more decisively resolved, and we estimated a chronogram using 28 fossil calibrations for the most integrated age and topological estimates to date. The results were used to update muroid classification and highlight questions needing additional data. We also compared the results of multigene supermatrix studies like this one with the principal published supertrees and concluded that the latter are unreliable for any comparative study in muroids. In addition, we explored diversification patterns as an explanation for why muroid rodents represent one of the most species-rich groups of mammals by detecting evidence for increasing net diversification rates through time across the muroid tree. We suggest the observation of increasing rates may be due to a combination of parallel increases in rate across clades and high average extinction rates. Five increased diversification-rate-shifts were inferred, suggesting that multiple, but perhaps not independent, events have led to the remarkable species diversity in the superfamily. Our results provide a phylogenetic framework for comparative studies that is not highly dependent upon the signal from any one gene. PMID:28813483

  1. Phylogenomics and taxonomy of Lecomtelleae (Poaceae), an isolated panicoid lineage from Madagascar.

    PubMed

    Besnard, Guillaume; Christin, Pascal-Antoine; Malé, Pierre-Jean G; Coissac, Eric; Ralimanana, Hélène; Vorontsova, Maria S

    2013-10-01

    An accurate characterization of biodiversity requires analyses of DNA sequences in addition to classical morphological descriptions. New methods based on high-throughput sequencing may allow investigation of specimens with a large set of genetic markers to infer their evolutionary history. In the grass family, the phylogenetic position of the monotypic genus Lecomtella, a rare bamboo-like endemic from Madagascar, has never been appropriately evaluated. Until now its taxonomic treatment has remained controversial, indicating the need for re-evaluation based on a combination of molecular and morphological data. The phylogenetic position of Lecomtella in Poaceae was evaluated based on sequences from the nuclear and plastid genomes generated by next-generation sequencing (NGS). In addition, a detailed morphological description of L. madagascariensis was produced, and its distribution and habit were investigated in order to assess its conservation status. The complete plastid sequence, a ribosomal DNA unit and fragments of low-copy nuclear genes (phyB and ppc) were obtained. All phylogenetic analyses place Lecomtella as an isolated member of the core panicoids, which last shared a common ancestor with other species >20 million years ago. Although Lecomtella exhibits morphological characters typical of Panicoideae, an unusual combination of traits supports its treatment as a separate group. The study showed that NGS can be used to generate abundant phylogenetic information rapidly, opening new avenues for grass phylogenetics. These data clearly showed that Lecomtella forms an isolated lineage, which, in combination with its morphological peculiarities, justifies its treatment as a separate tribe: Lecomtelleae. New descriptions of the tribe, genus and species are presented with a typification, a distribution map and an IUCN conservation assessment.

  2. Prevalence of cryptic species in morphologically uniform taxa - fast speciation and evolutionary radiation in Asian toads.

    PubMed

    Liu, Zuyao; Chen, Guoling; Zhu, Tianqi; Zeng, Zhaochi; Lyu, Zhitong; Wang, Jian; Messenger, Kevin; Greenberg, Anthony J; Guo, Zixiao; Yang, Ziheng; Shi, Suhua; Wang, Yingyong

    2018-06-16

    Diversity and distributions of cryptic species have long been a vexing issue. Identification of species boundaries is made difficult by the lack of obvious morphological differences. Here, we investigate the cryptic diversity and evolutionary history of an underappreciated group of Asian frog species (Megophrys) to explore the pattern and dynamic of amphibian cryptic species. We sequenced four mitochondrial genes and five nuclear genes and delineated species using multiple approaches, combining DNA and mating-call data. A Bayesian species tree was generated to estimate divergence times and to reconstruct ancestral ranges. Macroevolutionary analyses and hybridization tests were conducted to explore the evolutionary dynamics of this cryptic group. Our phylogenies support the current subgenera. We revealed 43 cryptic species, 158% higher than previously thought. The species-delimitation results were further confirmed by mating-call data and morphological divergence. We found that these Asian frogs entered China from the Sunda Shelf 48 Mya, followed by an ancient radiation event during middle Miocene. We confirmed the efficiency of the multispecies coalescent model for delimitation of species with low morphological diversity. Species diversity of Megophrys is severely underappreciated, and species distributions have been misestimated as a result. Copyright © 2018. Published by Elsevier Inc.

  3. Phylogenetic Factor Analysis.

    PubMed

    Tolkoff, Max R; Alfaro, Michael E; Baele, Guy; Lemey, Philippe; Suchard, Marc A

    2018-05-01

    Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over 3-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.

  4. Chemistry in Infrared Dark Cloud Clumps: A Molecular Line Survey at 3 mm

    NASA Astrophysics Data System (ADS)

    Sanhueza, Patricio; Jackson, James M.; Foster, Jonathan B.; Garay, Guido; Silva, Andrea; Finn, Susanna C.

    2012-09-01

    We have observed 37 Infrared Dark Clouds (IRDCs), containing a total of 159 clumps, in high-density molecular tracers at 3 mm using the 22 m ATNF Mopra Telescope located in Australia. After determining kinematic distances, we eliminated clumps that are not located in IRDCs and clumps with a separation between them of less than one Mopra beam. Our final sample consists of 92 IRDC clumps. The most commonly detected molecular lines are (detection rates higher than 8%) N2H+, HNC, HN13C, HCO+, H13CO+, HCN, C2H, HC3N, HNCO, and SiO. We investigate the behavior of the different molecular tracers and look for chemical variations as a function of an evolutionary sequence based on Spitzer IRAC and MIPS emission. We find that the molecular tracers behave differently through the evolutionary sequence and some of them can be used to yield useful relative age information. The presence of HNC and N2H+ lines does not depend on the star formation activity. On the other hand, HC3N, HNCO, and SiO are predominantly detected in later stages of evolution. Optical depth calculations show that in IRDC clumps the N2H+ line is optically thin, the C2H line is moderately optically thick, and HNC and HCO+ are optically thick. The HCN hyperfine transitions are blended, and, in addition, show self-absorbed line profiles and extended wing emission. These factors combined prevent the use of HCN hyperfine transitions for the calculation of physical parameters. Total column densities of the different molecules, except C2H, increase with the evolutionary stage of the clumps. Molecular abundances increase with the evolutionary stage for N2H+ and HCO+. The N2H+/HCO+ and N2H+/HNC abundance ratios act as chemical clocks, increasing with the evolution of the clumps.

  5. Protein-Protein Interactions in a Crowded Environment: An Analysis via Cross-Docking Simulations and Evolutionary Information

    PubMed Central

    Lopes, Anne; Sacquin-Mora, Sophie; Dimitrova, Viktoriya; Laine, Elodie; Ponty, Yann; Carbone, Alessandra

    2013-01-01

    Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis, are possible and realizable on hundreds of proteins with variate structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we realized a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that the knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence in the quality of the results, demonstrating the possibility to apply coarse-grain docking to datasets made of thousands of proteins. Comparison with all available large-scale analyses aimed to partner predictions is realized. 
    We release the complete decoys set issued by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions. Download site: http://www.lgm.upmc.fr/CCDMintseris/ PMID:24339765

  6. Biophysics of protein evolution and evolutionary protein biophysics

    PubMed Central

    Sikosek, Tobias; Chan, Hue Sun

    2014-01-01

    The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599

  7. dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.

    PubMed

    Wise, Michael J

    2016-01-01

    Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. 
    Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
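
    As a rough illustration of an entropy-based score computed directly from an alignment, without building a tree, the sketch below sums per-column Shannon entropy over a toy alignment. It is not the dCITE formula, which the abstract does not spell out; the function names `column_entropy` and `total_entropy`, the gap handling, and the toy sequences are assumptions made for this example.

```python
import math
from collections import Counter

def column_entropy(column, gap_chars="-?"):
    """Shannon entropy (bits) of one alignment column, ignoring gap symbols."""
    symbols = [c for c in column if c not in gap_chars]
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    total = len(symbols)
    # Sum of -p * log2(p) over the observed symbols in this column.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def total_entropy(alignment):
    """Sum of per-column entropies for equal-length aligned sequences."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*alignment) walks the alignment column by column.
    return sum(column_entropy(col) for col in zip(*alignment))

# Toy alignment (hypothetical data): two invariant and two variable columns.
toy = ["ACGT", "ACGA", "ACTT", "AC-T"]
print(round(total_entropy(toy), 3))
```

    Invariant columns contribute nothing to the total, so a score of this kind grows only with columns that actually vary among the taxa.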

  8. dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees

    PubMed Central

    2016-01-01

    Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. 
    Finally, sequences with high cladistic information produce more consistent trees for the same taxa. PMID:27898695

  9. Ignoring heterozygous sites biases phylogenomic estimates of divergence times: implications for the evolutionary history of Microtus voles.

    PubMed

    Lischer, Heidi E L; Excoffier, Laurent; Heckel, Gerald

    2014-04-01

    Phylogenetic reconstruction of the evolutionary history of closely related organisms may be difficult because of the presence of unsorted lineages and of a relatively high proportion of heterozygous sites that are usually not handled well by phylogenetic programs. Genomic data may provide enough fixed polymorphisms to resolve phylogenetic trees, but the diploid nature of sequence data remains analytically challenging. Here, we performed a phylogenomic reconstruction of the evolutionary history of the common vole (Microtus arvalis) with a focus on the influence of heterozygosity on the estimation of intraspecific divergence times. We used genome-wide sequence information from 15 voles distributed across the European range. We provide a novel approach to integrate heterozygous information in existing phylogenetic programs by repeated random haplotype sampling from sequences with multiple unphased heterozygous sites. We evaluated the impact of the use of full, partial, or no heterozygous information for tree reconstructions on divergence time estimates. All results consistently showed four deep and strongly supported evolutionary lineages in the vole data. These lineages undergoing divergence processes split only at the end or after the last glacial maximum based on calibration with radiocarbon-dated paleontological material. However, the incorporation of information from heterozygous sites had a significant impact on absolute and relative branch length estimations. Ignoring heterozygous information led to an overestimation of divergence times between the evolutionary lineages of M. arvalis. We conclude that the exclusion of heterozygous sites from evolutionary analyses may cause biased and misleading divergence time estimates in closely related taxa.
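
    The repeated random haplotype sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes heterozygous sites are encoded as two-base IUPAC ambiguity codes, and the names `sample_haplotype` and `sample_replicates` and the toy genotypes are hypothetical.

```python
import random

# IUPAC codes for two-base ambiguities, used here to mark unphased heterozygous sites.
IUPAC_HET = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}

def sample_haplotype(genotype, rng):
    """Return one haplotype by picking a random allele at each heterozygous site."""
    return "".join(
        rng.choice(IUPAC_HET[base]) if base in IUPAC_HET else base
        for base in genotype
    )

def sample_replicates(genotypes, n_replicates, seed=0):
    """Build n_replicates data sets, each holding one sampled haplotype per individual."""
    rng = random.Random(seed)
    return [
        {name: sample_haplotype(seq, rng) for name, seq in genotypes.items()}
        for _ in range(n_replicates)
    ]

# Toy genotypes (hypothetical): vole_1 has two heterozygous sites, vole_2 has none.
genotypes = {"vole_1": "ACRTYG", "vole_2": "ACATCG"}
for replicate in sample_replicates(genotypes, n_replicates=3):
    print(replicate)
```

    Each replicate could then be passed to an existing phylogenetic program, and the spread of branch-length estimates across replicates would reflect the uncertainty introduced by the unphased sites.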

  10. Conservation of Endo16 expression in sea urchins despite evolutionary divergence in both cis and trans-acting components of transcriptional regulation

    NASA Technical Reports Server (NTRS)

    Romano, Laura A.; Wray, Gregory A.

    2003-01-01

    Evolutionary changes in transcriptional regulation undoubtedly play an important role in creating morphological diversity. However, there is little information about the evolutionary dynamics of cis-regulatory sequences. This study examines the functional consequence of evolutionary changes in the Endo16 promoter of sea urchins. The Endo16 gene encodes a large extracellular protein that is expressed in the endoderm and may play a role in cell adhesion. Its promoter has been characterized in exceptional detail in the purple sea urchin, Strongylocentrotus purpuratus. We have characterized the structure and function of the Endo16 promoter from a second sea urchin species, Lytechinus variegatus. The Endo16 promoter sequences have evolved in a strongly mosaic manner since these species diverged approximately 35 million years ago: the most proximal region (module A) is conserved, but the remaining modules (B-G) are unalignable. Despite extensive divergence in promoter sequences, the pattern of Endo16 transcription is largely conserved during embryonic and larval development. Transient expression assays demonstrate that 2.2 kb of upstream sequence in either species is sufficient to drive GFP reporter expression that correctly mimics this pattern of Endo16 transcription. Reciprocal cross-species transient expression assays imply that changes have also evolved in the set of transcription factors that interact with the Endo16 promoter. Taken together, these results suggest that stabilizing selection on the transcriptional output may have operated to maintain a similar pattern of Endo16 expression in S. purpuratus and L. variegatus, despite dramatic divergence in promoter sequence and mechanisms of transcriptional regulation.

  11. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population.

    PubMed

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-06-02

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. RNA Editome in Rhesus Macaque Shaped by Purifying Selection

    PubMed Central

    Yang, Xin-Zhuang; Tan, Bertrand Chin-Ming; Fang, Huaying; Liu, Chu-Jun; Shi, Mingming; Ye, Zhi-Qiang; Zhang, Yong E.; Deng, Minghua; Zhang, Xiuqin; Li, Chuan-Yun

    2014-01-01

    Understanding of the RNA editing process has been broadened considerably by the next generation sequencing technology; however, several issues regarding this regulatory step remain unresolved – the strategies to accurately delineate the editome, the mechanism by which its profile is maintained, and its evolutionary and functional relevance. Here we report an accurate and quantitative profile of the RNA editome for rhesus macaque, a close relative of human. By combining genome and transcriptome sequencing of multiple tissues from the same animal, we identified 31,250 editing sites, of which 99.8% are A-to-G transitions. We verified 96.6% of editing sites in coding regions and 97.5% of randomly selected sites in non-coding regions, as well as the corresponding levels of editing by multiple independent means, demonstrating the feasibility of our experimental paradigm. Several lines of evidence supported the notion that the adenosine deamination is associated with the macaque editome – A-to-G editing sites were flanked by sequences with the attributes of ADAR substrates, and both the sequence context and the expression profile of ADARs are relevant factors in determining the quantitative variance of RNA editing across different sites and tissue types. In support of the functional relevance of some of these editing sites, substitution valley of decreased divergence was detected around the editing site, suggesting the evolutionary constraint in maintaining some of these editing substrates with their double-stranded structure. These findings thus complement the “continuous probing” model that postulates tinkering-based origination of a small proportion of functional editing sites. In conclusion, the macaque editome reported here highlights RNA editing as a widespread functional regulation in primate evolution, and provides an informative framework for further understanding RNA editing in human. PMID:24722121

  13. Deciphering evolutionary strata on plant sex chromosomes and fungal mating-type chromosomes through compositional segmentation.

    PubMed

    Pandey, Ravi S; Azad, Rajeev K

    2016-03-01

    Sex chromosomes have evolved from a pair of homologous autosomes which differentiated into sex determination systems, such as the XY or ZW system, as a consequence of successive recombination suppression between the gametologous chromosomes. Identifying the regions of recombination suppression, namely, the "evolutionary strata", is central to understanding the history and dynamics of sex chromosome evolution. Evolution of sex chromosomes as a consequence of serial recombination suppressions is well-studied for mammals and birds, but not for plants, although 48 dioecious plants have already been reported. Only two plants, Silene latifolia and papaya, have been studied until now for the presence of evolutionary strata on their X chromosomes, made possible by the sequencing of sex-linked genes on both the X and Y chromosomes, which is a requirement of all current methods that determine stratum structure based on the comparison of gametologous sex chromosomes. To circumvent this limitation and detect strata even if only the sequence of the sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available, we have developed an integrated segmentation and clustering method. In application to gene sequences on the papaya X chromosome and protein-coding sequences on the S. latifolia X chromosome, our method could decipher all known evolutionary strata, as reported by previous studies. Our method, after validation on known strata on the papaya and S. latifolia X chromosomes, was applied to chromosome 19 of Populus trichocarpa, an incipient sex chromosome, deciphering two previously unknown evolutionary strata. In addition, we applied this approach to the recently sequenced sex chromosome V of the brown alga Ectocarpus sp., which has a haploid sex determination system (UV system), recovering the sex-determining and pseudoautosomal regions, and then to the mating-type chromosomes of the anther-smut fungus Microbotryum lychnidis-dioicae, predicting five strata in the non-recombining region of both chromosomes.

  14. JCoDA: a tool for detecting evolutionary selection.

    PubMed

    Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

    2010-05-27

    The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
    JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
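
    The sliding-window view of dN/dS values mentioned above can be illustrated with a short sketch. It is not JCoDA's code: the function name `sliding_window_mean`, the per-codon values, and the choice to average the ratios within each window are assumptions made for this example, and how JCoDA itself computes its windows is not stated in the abstract.

```python
def sliding_window_mean(values, window, step=1):
    """Return (start_index, mean) pairs for each full window over values."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive integers")
    return [
        (start, sum(values[start:start + window]) / window)
        for start in range(0, len(values) - window + 1, step)
    ]

# Hypothetical per-codon dN/dS estimates; values above 1 hint at positive selection.
omega = [0.1, 0.2, 0.1, 1.8, 2.2, 1.5, 0.3, 0.2]
for start, mean in sliding_window_mean(omega, window=3):
    print(f"codons {start}-{start + 2}: mean omega = {mean:.2f}")
```

    Windows whose mean stays above 1 would flag a candidate region for closer inspection; a wider window smooths out single-site noise at the cost of resolution.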

  15. JCoDA: a tool for detecting evolutionary selection

    PubMed Central

    2010-01-01

    Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
    Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581

  16. Insights into the evolution of enzyme substrate promiscuity after the discovery of (βα)₈ isomerase evolutionary intermediates from a diverse metagenome.

    PubMed

    Noda-García, Lianet; Juárez-Vázquez, Ana L; Ávila-Arcos, María C; Verduzco-Castro, Ernesto A; Montero-Morán, Gabriela; Gaytán, Paul; Carrillo-Tripp, Mauricio; Barona-Gómez, Francisco

    2015-06-10

    Current sequence-based approaches to identify enzyme functional shifts, such as enzyme promiscuity, have proven to be highly dependent on a priori functional knowledge, hampering our ability to reconstruct evolutionary history behind these mechanisms. Hidden Markov Model (HMM) profiles, broadly used to classify enzyme families, can be useful to distinguish between closely related enzyme families with different specificities. The (βα)8-isomerase HisA/PriA enzyme family, involved in L-histidine (HisA, mono-substrate) biosynthesis in most bacteria and plants, but also in L-tryptophan (HisA/TrpF or PriA, dual-substrate) biosynthesis in most Actinobacteria, has been used as model system to explore evolutionary hypotheses and therefore has a considerable amount of evolutionary, functional and structural knowledge available. We searched for functional evolutionary intermediates between the HisA and PriA enzyme families in order to understand the functional divergence between these families. We constructed a HMM profile that correctly classifies sequences of unknown function into the HisA and PriA enzyme sub-families. Using this HMM profile, we mined a large metagenome to identify plausible evolutionary intermediate sequences between HisA and PriA. These sequences were used to perform phylogenetic reconstructions and to identify functionally conserved amino acids. Biochemical characterization of one selected enzyme (CAM1) with a mutation within the functionally essential N-terminus phosphate-binding site, namely, an alanine instead of a glycine in HisA or a serine in PriA, showed that this evolutionary intermediate has dual-substrate specificity. Moreover, site-directed mutagenesis of this alanine residue, either backwards into a glycine or forward into a serine, revealed the robustness of this enzyme. None of these mutations, presumably upon functionally essential amino acids, significantly abolished its enzyme activities. 
    A truncated version of this enzyme (CAM2) predicted to adopt a (βα)6-fold, and thus entirely lacking a C-terminus phosphate-binding site, was identified and shown to have HisA activity. As expected, reconstruction of the evolution of PriA from HisA with HMM profiles suggests that functional shifts involve mutations in evolutionarily intermediate enzymes of otherwise functionally essential residues or motifs. These results are in agreement with a link between promiscuous enzymes and intragenic epistasis. HMM provides a convenient approach for gaining insights into these evolutionary processes.

  17. The Evolutionary History of Protein Domains Viewed by Species Phylogeny

    PubMed Central

    Yang, Song; Bourne, Philip E.

    2009-01-01

    Background Protein structural domains are evolutionary units whose relationships can be detected over long evolutionary distances. The evolutionary history of protein domains, including the origin of protein domains, the identification of domain loss, transfer, duplication and combination with other domains to form new proteins, and the formation of the entire protein domain repertoire, are of great interest. Methodology/Principal Findings A methodology is presented for providing a parsimonious domain history based on gain, loss, vertical and horizontal transfer derived from the complete genomic domain assignments of 1015 organisms across the tree of life. When mapped to species trees the evolutionary history of domains and domain combinations is revealed, and the general evolutionary trend of domain and combination is analyzed. Conclusions/Significance We show that this approach provides a powerful tool to study how new proteins and functions emerged and to study such processes as horizontal gene transfer among more distant species. PMID:20041107

  18. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  19. Gene-Culture Coevolution in a Social Cetacean: Integrating Acoustic and Genetic Data to Understand Population Structure in the Short-Finned Pilot Whale (Globicephala macrorhynchus)

    NASA Astrophysics Data System (ADS)

    Van Cise, Amy

    The evolutionary ecology of a species is driven by a combination of random events, ecological and environmental mechanisms, and social behavior. Gene-culture coevolutionary theory attempts to understand the evolutionary trajectory of a species by examining the interactions between these potential drivers. Further, our choice of data type affects the patterns we observe; by integrating several types of data, we can therefore achieve a more holistic understanding of the various aspects of evolutionary ecology within a species. In order to understand population structure in short-finned pilot whales, I use a combination of genetic and acoustic data to examine structure on evolutionary (genetic) and cultural (acoustic) timescales. I first examine structure among geographic populations in the Pacific Ocean. Using genetic sequences from the mitochondrial control region, I show that two genetically and morphologically distinct types of short-finned pilot whale, described off the coast of Japan, have non-overlapping distributions throughout their range in the Pacific Ocean. Analysis of the acoustic features of their social calls indicates that they are acoustically differentiated, possibly due to limited communication between the two types. This evidence supports the hypothesis that the two types may be separate species or subspecies. Next, I examine structure among island communities and social groups within the Hawaiian Island population of short-finned pilot whales. Using a combination of mitochondrial and nuclear DNA, I show that the hierarchical social structure in Hawaiian pilot whales is driven by genetic relatedness; individuals remain in groups with their immediate family members, and preferentially associate with relatives. Similarly, social structure affects genetic differentiation, likely by restricting access to mates. Acoustic differentiation among social groups indicates that social structure may also restrict the flow of cultural information, such as vocal repertoire or dialect. The qualitative correlation between social structure, cultural information transfer, and genetic structure suggests that gene-culture coevolution may be an important mechanism in the evolutionary ecology of short-finned pilot whales. Further research may reveal a similar structure in the transmission of ecological behaviors, such as diet preference, habitat use, or movements. The results of this research underscore the applicability of gene-culture coevolutionary theory to non-human taxa.

  20. Resolving the Origin of Rabbit Hemorrhagic Disease Virus: Insights from an Investigation of the Viral Stocks Released in Australia

    PubMed Central

    Eden, John-Sebastian; Read, Andrew J.; Duckworth, Janine A.; Strive, Tanja

    2015-01-01

    To resolve the evolutionary history of rabbit hemorrhagic disease virus (RHDV), we performed a genomic analysis of the viral stocks imported and released as a biocontrol measure in Australia, as well as a global phylogenetic analysis. Importantly, conflicts were identified between the sequences determined here and those previously published that may have affected evolutionary rate estimates. By removing likely erroneous sequences, we show that RHDV emerged only shortly before its initial description in China. PMID:26378178

  1. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140
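
    As a rough illustration of why sequencing error biases estimates of θ, the sketch below uses Watterson's classical estimator, θ_W = S / a_n with a_n = Σ_{i=1}^{n-1} 1/i, rather than the composite-likelihood estimator described in the abstract. It assumes, crudely, that every erroneous base call creates one spurious segregating site; the function names and example values are hypothetical.

```python
def harmonic_number(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the denominator of Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites: float, n: int) -> float:
    """Watterson's estimator of theta = 4*Ne*mu for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return segregating_sites / harmonic_number(n)


def biased_theta(true_theta: float, n: int, length: int, error_rate: float) -> float:
    """Estimate obtained when errors inflate the number of segregating sites.

    Assumption (not from the abstract): the expected true S is
    true_theta * a_n, and each of the n * length * error_rate expected
    erroneous calls adds one spurious segregating site.
    """
    true_s = true_theta * harmonic_number(n)
    false_s = n * length * error_rate
    return watterson_theta(true_s + false_s, n)


if __name__ == "__main__":
    # Hypothetical values: theta = 10 per locus, 10 kb locus, error rate 1e-5.
    for n in (20, 200, 2000):
        print(n, round(biased_theta(10.0, n, 10_000, 1e-5), 2))
```

    Under these assumptions the upward bias is n·L·ɛ / a_n, which grows almost linearly with n, consistent with the abstract's remark that the problem is more prominent in deep resequencing studies.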

  2. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
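
    The two quantities named in this abstract, entropy and mutual information, can be sketched in a few lines. This is a minimal illustration on toy strings, not any specific method from the review; the helper names are hypothetical, and mutual information is computed from the identity I(X;Y) = H(X) + H(Y) − H(X,Y).

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kmers(seq: str, k: int) -> list:
    """All overlapping k-mers of `seq`, the basis of alignment-free profiles."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mutual_information(xs, ys) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long symbol columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


if __name__ == "__main__":
    seq = "ACGTACGTTTGACA"  # toy sequence
    print("1-mer entropy:", shannon_entropy(seq))
    print("2-mer entropy:", shannon_entropy(kmers(seq, 2)))
    # Two toy alignment columns that co-vary perfectly.
    print("MI:", mutual_information("AACC", "GGTT"))
```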

  3. Exploiting three kinds of interface propensities to identify protein binding sites.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2009-08-01

    Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparison, we revisit the use of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is used as input to an SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, prediction of domain boundaries, design of knowledge-based potentials, and protein remote homology detection.

  4. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    PubMed

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches, which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules, and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  5. Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism

    PubMed Central

    Yin, Ling; An, Yunhe; Qu, Junjie; Li, Xinlong; Zhang, Yali; Dry, Ian; Wu, Huijuan; Lu, Jiang

    2017-01-01

    Plasmopara viticola causes downy mildew disease of grapevine, which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3 Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors, were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome not only provides a valuable resource for comparative genomic analysis and evolutionary studies among oomycetes, but also enhances our knowledge of the mechanism of interactions between this biotrophic pathogen and its host. PMID:28417959

  6. Analysis of SINE and LINE repeat content of Y chromosomes in the platypus, Ornithorhynchus anatinus.

    PubMed

    Kortschak, R Daniel; Tsend-Ayush, Enkhjargal; Grützner, Frank

    2009-01-01

    Monotremes feature an extraordinary sex-chromosome system that consists of five X and five Y chromosomes in males. These sex chromosomes share homology with bird sex chromosomes but no homology with the therian X. The genome of a female platypus was recently completed, providing unique insights into sequence and gene content of autosomes and X chromosomes, but no Y-specific sequence has so far been analysed. Here we report the isolation, sequencing and analysis of approximately 700 kb of sequence of the non-recombining regions of Y2, Y3 and Y5, which revealed differences in base composition and repeat content between autosomes and sex chromosomes, and within the sex chromosomes themselves. This provides the first insights into repeat content of Y chromosomes in platypus, which overall show similar patterns of repeat composition to Y chromosomes in other species. Interestingly, we also observed differences between the various Y chromosomes, and in combination with timing and activity patterns we provide an approach that can be used to examine the evolutionary history of the platypus sex-chromosome chain.

  7. Diversity and evolutionary patterns of immune genes in free-ranging Namibian leopards (Panthera pardus pardus).

    PubMed

    Castro-Prieto, Aines; Wachter, Bettina; Melzheimer, Joerg; Thalwitzer, Susanne; Sommer, Simone

    2011-01-01

    The genes of the major histocompatibility complex (MHC) are a key component of the mammalian immune system and have become important molecular markers for fitness-related genetic variation in wildlife populations. Currently, no information about the MHC sequence variation and constitution in African leopards exists. In this study, we isolated and characterized genetic variation at the adaptively most important region of MHC class I and MHC class II-DRB genes in 25 free-ranging African leopards from Namibia and investigated the mechanisms that generate and maintain MHC polymorphism in the species. Using single-stranded conformation polymorphism analysis and direct sequencing, we detected 6 MHC class I and 6 MHC class II-DRB sequences, which likely correspond to at least 3 MHC class I and 3 MHC class II-DRB loci. Amino acid sequence variation in both MHC classes was higher or similar in comparison to other reported felids. We found signatures of positive selection shaping the diversity of MHC class I and MHC class II-DRB loci during the evolutionary history of the species. A comparison of MHC class I and MHC class II-DRB sequences of the leopard to those of other felids revealed a trans-species mode of evolution. In addition, the evolutionary relationships of MHC class II-DRB sequences between African and Asian leopard subspecies are discussed.

  8. MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks

    USDA-ARS's Scientific Manuscript database

    Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary h...

  9. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    PubMed

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  10. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

    PubMed Central

    Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540

  11. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

    PubMed

    Grunert, Steffen; Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.

  12. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

    PubMed

    Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

    1998-06-01

    An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analyses using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.

  13. Simple versus complex models of trait evolution and stasis as a response to environmental change

    NASA Astrophysics Data System (ADS)

    Hunt, Gene; Hopkins, Melanie J.; Lidgard, Scott

    2015-04-01

    Previous analyses of evolutionary patterns, or modes, in fossil lineages have focused overwhelmingly on three simple models: stasis, random walks, and directional evolution. Here we use likelihood methods to fit an expanded set of evolutionary models to a large compilation of ancestor-descendant series of populations from the fossil record. In addition to the standard three models, we assess more complex models with punctuations and shifts from one evolutionary mode to another. As in previous studies, we find that stasis is common in the fossil record, as is a strict version of stasis that entails no real evolutionary changes. Incidence of directional evolution is relatively low (13%), but higher than in previous studies because our analytical approach can more sensitively detect noisy trends. Complex evolutionary models are often favored, overwhelmingly so for sequences comprising many samples. This finding is consistent with evolutionary dynamics that are, in reality, more complex than any of the models we consider. Furthermore, the timing of shifts in evolutionary dynamics varies among traits measured from the same series. Finally, we use our empirical collection of evolutionary sequences and a long and highly resolved proxy for global climate to inform simulations in which traits adaptively track temperature changes over time. When realistically calibrated, we find that this simple model can reproduce important aspects of our paleontological results. We conclude that observed paleontological patterns, including the prevalence of stasis, need not be inconsistent with adaptive evolution, even in the face of unstable physical environments.

  14. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    PubMed

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds.

  15. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). The high prediction accuracy suggests that our method can be a useful approach to identifying RNA-binding proteins from sequence information.
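
    The two performance measures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below shows the standard formulas; the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical confusion matrix for a binder / non-binder classifier.
    tp, tn, fp, fn = 430, 440, 60, 70
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

    Unlike accuracy, MCC ranges from −1 to 1 and stays informative when the two classes are unbalanced, which is one reason both numbers are often reported together.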

  16. Microsatellite markers identify three lineages of Phytophthora ramorum in US nurseries, yet single lineages in US forest and European nursery populations.

    PubMed

    Ivors, K; Garbelotto, M; Vries, I D E; Ruyter-Spira, C; Te Hekkert, B; Rosenzweig, N; Bonants, P

    2006-05-01

    Analysis of 12 polymorphic simple sequence repeats identified in the genome sequence of Phytophthora ramorum, causal agent of 'sudden oak death', revealed genotypic diversity to be significantly higher in nurseries (91% of total) than in forests (18% of total). Our analysis identified only two closely related genotypes in US forests, while the genetic structure of populations from European nurseries was of intermediate complexity, including multiple, closely related genotypes. Multilocus analysis determined populations in US forests reproduce clonally and are likely descendants of a single introduced individual. The 151 isolates analysed clustered in three clades. US forest and European nursery isolates clustered into two distinct clades, while one isolate from a US nursery belonged to a third novel clade. The combined microsatellite, sequencing and morphological analyses suggest the three clades represent distinct evolutionary lineages. All three clades were identified in some US nurseries, emphasizing the role of commercial plant trade in the movement of this pathogen.

  17. Principles of regulatory information conservation between mouse and human

    DOE PAGES

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; ...

    2014-11-19

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human–mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Lastly, single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

  18. Tree-Structured Digital Organisms Model

    NASA Astrophysics Data System (ADS)

    Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo

    Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model may not be the only way to describe a digital organism, though it is very simple for a computer-based model. Thus we propose a new digital organism model based on a tree structure, which is rather similar to genetic programming. With our model, a life process is a combination of various functions, as life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified through simulations that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.

  19. Genomic Epidemiology of Salmonella enterica Serotype Enteritidis based on Population Structure of Prevalent Lineages

    PubMed Central

    Desai, Prerak T.; den Bakker, Henk C.; Mikoleit, Matthew; Tolar, Beth; Trees, Eija; Hendriksen, Rene S.; Frye, Jonathan G.; Porwollik, Steffen; Weimer, Bart C.; Wiedmann, Martin; Weinstock, George M.; Fields, Patricia I.; McClelland, Michael

    2014-01-01

    Salmonella enterica serotype Enteritidis is one of the most commonly reported causes of human salmonellosis. Its low genetic diversity, measured by fingerprinting methods, has made subtyping a challenge. We used whole-genome sequencing to characterize 125 S. enterica Enteritidis and 3 S. enterica serotype Nitra strains. Single-nucleotide polymorphisms were filtered to identify 4,887 reliable loci that distinguished all isolates from each other. Our whole-genome single-nucleotide polymorphism typing approach was robust for S. enterica Enteritidis subtyping with combined data for different strains from 2 different sequencing platforms. Five major genetic lineages were recognized, which revealed possible patterns of geographic and epidemiologic distribution. Analyses on the population dynamics and evolutionary history estimated that major lineages emerged during the 17th–18th centuries and diversified during the 1920s and 1950s. PMID:25147968

  20. Novel variable number of tandem repeats of gibbon MAOA gene and its evolutionary significance.

    PubMed

    Choi, Yuri; Jung, Yi-Deun; Ayarpadikannan, Selvam; Koga, Akihiko; Imai, Hiroo; Hirai, Hirohisa; Roos, Christian; Kim, Heui-Soo

    2014-08-01

    Variable number of tandem repeats (VNTRs) are scattered throughout the primate genome, and genetic variation in these VNTRs has accumulated during primate radiation. Here, we analyzed VNTRs upstream of the monoamine oxidase A (MAOA) gene in 11 different gibbon species. An abundance of truncated VNTR sequences and copy number differences were observed compared to those of human VNTR sequences. To better understand the biological role of these VNTRs, a luciferase activity assay was conducted, and the results indicated that selected VNTR sequences of the MAOA gene from human and three different gibbon species (Hylobates klossii, Hylobates lar, and Nomascus concolor) showed silencing ability. Together, these data could be useful for understanding the evolutionary history and functional significance of MAOA VNTR sequences in gibbon species.

  1. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle.

    PubMed

    Zhan, Xiangjiang; Pan, Shengkai; Wang, Junyi; Dixon, Andrew; He, Jing; Muller, Margit G; Ni, Peixiang; Hu, Li; Liu, Yuan; Hou, Haolong; Chen, Yuanping; Xia, Jinquan; Luo, Qiong; Xu, Pengwei; Chen, Ying; Liao, Shengguang; Cao, Changchang; Gao, Shukun; Wang, Zhaobao; Yue, Zhen; Li, Guoqing; Yin, Ye; Fox, Nick C; Wang, Jun; Bruford, Michael W

    2013-05-01

    As top predators, falcons possess unique morphological, physiological and behavioral adaptations that allow them to be successful hunters: for example, the peregrine is renowned as the world's fastest animal. To examine the evolutionary basis of predatory adaptations, we sequenced the genomes of both the peregrine (Falco peregrinus) and saker falcon (Falco cherrug), and we present parallel, genome-wide evidence for evolutionary innovation and selection for a predatory lifestyle. The genomes, assembled using Illumina deep sequencing with greater than 100-fold coverage, are both approximately 1.2 Gb in length, with transcriptome-assisted prediction of approximately 16,200 genes for both species. Analysis of 8,424 orthologs in both falcons, chicken, zebra finch and turkey identified consistent evidence for genome-wide rapid evolution in these raptors. SNP-based inference showed contrasting recent demographic trajectories for the two falcons, and gene-based analysis highlighted falcon-specific evolutionary novelties for beak development and olfaction and specifically for homeostasis-related genes in the arid environment-adapted saker.

  2. DNA tetrominoes: the construction of DNA nanostructures using self-organised heterogeneous deoxyribonucleic acids shapes.

    PubMed

    Ong, Hui San; Rahim, Mohd Syafiq; Firdaus-Raih, Mohd; Ramlan, Effirul Ikhwan

    2015-01-01

    The unique programmability of nucleic acids offers an alternative for constructing excitable and functional nanostructures. This work introduces an autonomous protocol to construct DNA Tetris shapes (L-Shape, B-Shape, T-Shape and I-Shape) using modular DNA blocks. The protocol exploits the rich number of sequence combinations available from the nucleic acid alphabets, thus allowing for diversity to be applied in designing various DNA nanostructures. Instead of a deterministic set of sequences corresponding to a particular design, the protocol promotes a large pool of DNA shapes that can assemble to conform to any desired structure. By utilising evolutionary programming in the design stage, DNA blocks are subjected to processes such as sequence insertion, deletion and base shifting in order to enrich the diversity of the resulting shapes based on a set of cascading filters. The optimisation algorithm allows mutation to be exerted indefinitely on the candidate sequences until these sequences comply with all four fitness criteria. Generated candidates from the protocol are in agreement with the filter cascades and thermodynamic simulation. Further validation using gel electrophoresis indicated the formation of the designed shapes, thus supporting the plausibility of constructing DNA nanostructures in a more hierarchical, modular, and interchangeable manner.
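
    The mutate-until-all-filters-pass loop described in this abstract can be sketched as follows. This is only an illustration of the general idea: the abstract does not specify the four fitness criteria, so the two filters here (GC content and homopolymer length) are hypothetical stand-ins, "base shifting" is assumed to mean rotating the sequence by one base, and a step limit replaces the indefinite mutation of the original protocol.

```python
import random

BASES = "ACGT"


def gc_content_ok(seq: str, low: float = 0.4, high: float = 0.6) -> bool:
    """Hypothetical filter: GC fraction must lie within [low, high]."""
    if not seq:
        return False
    gc = sum(base in "GC" for base in seq) / len(seq)
    return low <= gc <= high


def no_long_homopolymer(seq: str, max_run: int = 3) -> bool:
    """Hypothetical filter: no run of identical bases longer than max_run."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return False
    return True


# Cascade of filters, checked in order; stand-ins for the unspecified criteria.
FILTERS = (gc_content_ok, no_long_homopolymer)


def mutate(seq: str, rng: random.Random) -> str:
    """Apply one random insertion, deletion, or shift (assumed: rotation)."""
    op = rng.choice(("insert", "delete", "shift"))
    if op == "insert":
        pos = rng.randrange(len(seq) + 1)
        return seq[:pos] + rng.choice(BASES) + seq[pos:]
    if op == "delete" and len(seq) > 1:
        pos = rng.randrange(len(seq))
        return seq[:pos] + seq[pos + 1:]
    # Shift, also used as the fallback when a deletion would empty the sequence.
    return seq[1:] + seq[:1]


def evolve(seq: str, rng: random.Random, max_steps: int = 10_000):
    """Mutate until every filter passes; return None if the step limit is hit."""
    for _ in range(max_steps):
        if all(check(seq) for check in FILTERS):
            return seq
        seq = mutate(seq, rng)
    return None


if __name__ == "__main__":
    print(evolve("AAAAAAAAAAAA", random.Random(0)))
```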

  3. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.

  4. Isolation, characterization, and primary structure of rubredoxin from the photosynthetic bacterium, Heliobacillus mobilis

    NASA Technical Reports Server (NTRS)

    Lee, W. Y.; Brune, D. C.; LoBrutto, R.; Blankenship, R. E.

    1995-01-01

    Rubredoxin is a small nonheme iron protein that serves as an electron carrier in bacterial systems. Rubredoxin has now been isolated and characterized from the strictly anaerobic phototroph, Heliobacillus mobilis. The molecular mass (5671.3 Da from the amino acid sequence) was confirmed and partial formylation of the N-terminal methionyl residue was established by matrix-assisted laser desorption mass spectroscopy. The complete 52-amino-acid sequence was determined by a combination of N-terminal sequencing by Edman degradation and C-terminal sequencing by a novel method using carboxypeptidase treatment in conjunction with amino acid analysis and laser desorption time of flight mass spectrometry. The molar absorption coefficient of Hc. mobilis rubredoxin at 490 nm is 6.9 mM⁻¹ cm⁻¹ and the midpoint redox potential at pH 8.0 is -46 mV. The EPR spectrum of the oxidized form shows resonances at g = 9.66 and 4.30 due to a high-spin ferric iron. The amino acid sequence is homologous to those of rubredoxins from other species, in particular, the gram-positive bacteria, and the phototrophic green sulfur bacteria, and the evolutionary implications of this are discussed.

  5. Genome-wide comparative analysis of four Indian Drosophila species.

    PubMed

    Mohanty, Sujata; Khanna, Radhika

    2017-12-01

    Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.

  6. The Evolution of Ion Pumps.

    ERIC Educational Resources Information Center

    Maloney, Peter C.; Wilson, T. Hastings

    1985-01-01

    Constructs an evolutionary sequence to account for the diversity of ion pumps found today. Explanations include primary ion pumps in bacteria, features and distribution of ATP-driven pumps, preference for cation transport, and proton pump reversal. The integrated evolutionary hypothesis should encourage new experimental approaches. (DH)

  7. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. Results Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationships of the Caryophyllales and asterids now gained high support from nuclear gene sequences. Conclusions 454 transcriptome sequencing and de novo assembly was performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  8. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    Treesearch

    Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to...

  9. The Rise and Fall of an Evolutionary Innovation: Contrasting Strategies of Venom Evolution in Ancient and Young Animals

    PubMed Central

    Sunagar, Kartik; Moran, Yehu

    2015-01-01

    Animal venoms are theorized to evolve under the significant influence of positive Darwinian selection in a chemical arms race scenario, where the evolution of venom resistance in prey and the invention of potent venom in the secreting animal exert reciprocal selection pressures. Venom research to date has mainly focused on evolutionarily younger lineages, such as snakes and cone snails, while mostly neglecting ancient clades (e.g., cnidarians, coleoids, spiders and centipedes). By examining genome, venom-gland transcriptome and sequences from the public repositories, we report the molecular evolutionary regimes of several centipede and spider toxin families, which surprisingly accumulated low-levels of sequence variations, despite their long evolutionary histories. Molecular evolutionary assessment of over 3500 nucleotide sequences from 85 toxin families spanning the breadth of the animal kingdom has unraveled a contrasting evolutionary strategy employed by ancient and evolutionarily young clades. We show that the venoms of ancient lineages remarkably evolve under the heavy constraints of negative selection, while toxin families in lineages that originated relatively recently rapidly diversify under the influence of positive selection. We propose that animal venoms mostly employ a ‘two-speed’ mode of evolution, where the major influence of diversifying selection accompanies the earlier stages of ecological specialization (e.g., diet and range expansion) in the evolutionary history of the species–the period of expansion, resulting in the rapid diversification of the venom arsenal, followed by longer periods of purifying selection that preserve the potent toxin pharmacopeia–the period of purification and fixation. However, species in the period of purification may re-enter the period of expansion upon experiencing a major shift in ecology or environment. 
    Thus, we highlight for the first time the significant roles of purifying and episodic selections in shaping animal venoms. PMID:26492532

  10. Molecular Divergence and Species Delimitation of the Cultivated Oyster Mushrooms: Integration of IGS1 and ITS

    PubMed Central

    Bhassu, Subha; Tan, Yee Shin; Vikineswary, Sabaratnam

    2014-01-01

    Identification of edible mushrooms particularly Pleurotus genus has been restricted due to various obstacles. The present study attempted to use the combination of two variable regions of IGS1 and ITS for classifying the economically cultivated Pleurotus species. Integration of the two regions proved a high ability that not only could clearly distinguish the species but also served sufficient intraspecies variation. Phylogenetic tree (IGS1 + ITS) showed seven distinct clades, each clade belonging to a separate species group. Moreover, the species differentiation was tested by AMOVA and the results were reconfirmed by presenting appropriate amounts of divergence (91.82% among and 8.18% within the species). In spite of achieving a proper classification of species by combination of IGS1 and ITS sequences, the phylogenetic tree showed the misclassification of the species of P. nebrodensis and P. eryngii var. ferulae with other strains of P. eryngii. However, the constructed median joining (MJ) network could not only differentiate between these species but also offer a profound perception of the species' evolutionary process. Eventually, due to the sufficient variation among and within species, distinct sequences, simple amplification, and location between ideal conserved ribosomal genes, the integration of IGS1 and ITS sequences is recommended as a desirable DNA barcode. PMID:24587752

  11. Carboniferous and Permian evolutionary records for the Paleo-Tethys Ocean constrained by newly discovered Xiangtaohu ophiolites from central Qiangtang, central Tibet

    NASA Astrophysics Data System (ADS)

    Zhang, Xiu-Zheng; Dong, Yong-Sheng; Wang, Qiang; Dan, Wei; Zhang, Chunfu; Deng, Ming-Rong; Xu, Wang; Xia, Xiao-Ping; Zeng, Ji-Peng; Liang, He

    2016-07-01

    Reconstructing the evolutionary history of the Paleo-Tethys Ocean remains at the center of debates over the linkage between Gondwana dispersion and Asian accretion. Identifying the remnants of oceanic lithosphere (ophiolites) has very important implications for identifying suture zones, unveiling the evolutionary history of fossil oceans, and reconstructing the amalgamation history between different blocks. Here we report newly documented ophiolite suites from the Longmu Co-Shuanghu Suture zone (LSSZ) in the Xiangtaohu area, central Qiangtang block, Tibet. Detailed geological investigations and zircon U-Pb dating reveal that the Xiangtaohu ophiolites are composed of a suite of Permian (281-275 Ma) ophiolites with a nearly complete Penrose sequence and a suite of Early Carboniferous (circa 350 Ma) ophiolite remnants containing only part of the lower oceanic crust. Geochemical and Sr-Nd-O isotopic data show that the Permian and Carboniferous ophiolites in this study were derived from an N-mid-ocean ridge basalts-like mantle source with varied suprasubduction-zone (SSZ) signatures and were characterized by crystallization sequences from wet magmas, suggesting typical SSZ-affinity ophiolites. Permian and Carboniferous SSZ ophiolites in the central Qiangtang provide robust evidence for the existence and evolution of an ancient ocean basin. Combining with previous studies on high-pressure metamorphic rocks and pelagic radiolarian cherts, and with tectonostratigraphic and paleontological data, we support the LSSZ as representing the main suture of the Paleo-Tethys Ocean which probably existed and evolved from Devonian to Triassic. The opening and demise of the Paleo-Tethys Ocean dominated the formation of the major framework for the East and/or Southeast Asia.

  12. Disentangling the complex evolutionary history of the Western Palearctic blue tits (Cyanistes spp.) - phylogenomic analyses suggest radiation by multiple colonization events and subsequent isolation.

    PubMed

    Stervander, Martin; Illera, Juan Carlos; Kvist, Laura; Barbosa, Pedro; Keehnen, Naomi P; Pruisscher, Peter; Bensch, Staffan; Hansson, Bengt

    2015-05-01

    Isolated islands and their often unique biota continue to play key roles for understanding the importance of drift, genetic variation and adaptation in the process of population differentiation and speciation. One island system that has inspired and intrigued evolutionary biologists is the blue tit complex (Cyanistes spp.) in Europe and Africa, in particular the complex evolutionary history of the multiple genetically distinct taxa of the Canary Islands. Understanding Afrocanarian colonization events is of particular importance because of recent unconventional suggestions that these island populations acted as source of the widespread population in mainland Africa. We investigated the relationship between mainland and island blue tits using a combination of Sanger sequencing at a population level (20 loci; 12 500 nucleotides) and next-generation sequencing of single population representatives (>3 200 000 nucleotides), analysed in coalescence and phylogenetic frameworks. We found (i) that Afrocanarian blue tits are monophyletic and represent four major clades, (ii) that the blue tit complex has a continental origin and that the Canary Islands were colonized three times, (iii) that all island populations have low genetic variation, indicating low long-term effective population sizes and (iv) that populations on La Palma and in Libya represent relicts of an ancestral North African population. Further, demographic reconstructions revealed (v) that the Canary Islands, conforming to traditional views, hold sink populations, which have not served as source for back colonization of the African mainland. Our study demonstrates the importance of complete taxon sampling and an extensive multimarker study design to obtain robust phylogeographical inferences. © 2015 John Wiley & Sons Ltd.

  13. Phylogenomic evidence for a recent and rapid radiation of lizards in the Patagonian Liolaemus fitzingerii species group.

    PubMed

    Grummer, Jared A; Morando, Mariana M; Avila, Luciano J; Sites, Jack W; Leaché, Adam D

    2018-08-01

    Rapid evolutionary radiations are difficult to resolve because divergence events are nearly synchronous and gene flow among nascent species can be high, resulting in a phylogenetic "bush". Large datasets composed of sequence loci from across the genome can potentially help resolve some of these difficult phylogenetic problems. A suitable test case is the Liolaemus fitzingerii species group of lizards, which includes twelve species that are broadly distributed in Argentinean Patagonia. The species in the group have had a complex evolutionary history that has led to high morphological variation and unstable taxonomy. We generated a sequence capture dataset for 28 ingroup individuals of 580 nuclear loci, alongside a mitogenomic dataset, to infer phylogenetic relationships among species in this group. Relationships among species were generally weakly supported with the nuclear data, and along with an inferred age of ∼2.6 million years old, indicate either rapid evolution, hybridization, incomplete lineage sorting, non-informative data, or a combination thereof. We inferred a signal of mito-nuclear discordance, indicating potential hybridization between L. melanops and L. martorii, and phylogenetic network analyses provided support for 5 reticulation events among species. Phasing the nuclear loci did not provide additional insight into relationships or suspected patterns of hybridization. Only one clade, composed of L. camarones, L. fitzingerii, and L. xanthoviridis was recovered across all analyses. Genomic datasets provide molecular systematists with new opportunities to resolve difficult phylogenetic problems, yet the lack of phylogenetic resolution in Patagonian Liolaemus is biologically meaningful and indicative of a recent and rapid evolutionary radiation. The phylogenetic relationships of the Liolaemus fitzingerii group may be best modeled as a reticulated network instead of a bifurcating phylogeny. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Evolutionary Dynamics of Pathoadaptation Revealed by Three Independent Acquisitions of the VirB/D4 Type IV Secretion System in Bartonella

    PubMed Central

    Harms, Alexander; Segers, Francisca H.I.D.; Quebatte, Maxime; Mistl, Claudia; Manfredi, Pablo; Körner, Jonas; Chomel, Bruno B.; Kosoy, Michael; Maruyama, Soichi; Engel, Philipp

    2017-01-01

    The α-proteobacterial genus Bartonella comprises a group of ubiquitous mammalian pathogens that are studied as a model for the evolution of bacterial pathogenesis. Vast abundance of two particular phylogenetic lineages of Bartonella had been linked to enhanced host adaptability enabled by lineage-specific acquisition of a VirB/D4 type IV secretion system (T4SS) and parallel evolution of complex effector repertoires. However, the limited availability of genome sequences from one of those lineages as well as other, remote branches of Bartonella has so far hampered comprehensive understanding of how the VirB/D4 T4SS and its effectors called Beps have shaped Bartonella evolution. Here, we report the discovery of a third repertoire of Beps associated with the VirB/D4 T4SS of B. ancashensis, a novel human pathogen that lacks any signs of host adaptability and is only distantly related to the two species-rich lineages encoding a VirB/D4 T4SS. Furthermore, sequencing of ten new Bartonella isolates from under-sampled lineages enabled combined in silico analyses and wet lab experiments that suggest several parallel layers of functional diversification during evolution of the three Bep repertoires from a single ancestral effector. Our analyses show that the Beps of B. ancashensis share many features with the two other repertoires, but may represent a more ancestral state that has not yet unleashed the adaptive potential of such an effector set. We anticipate that the effectors of B. ancashensis will enable future studies to dissect the evolutionary history of Bartonella effectors and help unraveling the evolutionary forces underlying bacterial host adaptation. PMID:28338931

  15. The Liverwort Contains a Lectin That Is Structurally and Evolutionary Related to the Monocot Mannose-Binding Lectins

    PubMed Central

    Peumans, Willy J.; Barre, Annick; Bras, Julien; Rougé, Pierre; Proost, Paul; Van Damme, Els J.M.

    2002-01-01

    A mannose (Man)-binding lectin has been isolated and characterized from the thallus of the liverwort Marchantia polymorpha. N-terminal sequencing indicated that the M. polymorpha agglutinin (Marpola) shares sequence similarity with the superfamily of monocot Man-binding lectins. Searches in the databases yielded expressed sequence tags encoding Marpola. Sequence analysis, molecular modeling, and docking experiments revealed striking structural similarities between Marpola and the monocot Man-binding lectins. Activity and specificity studies further indicated that Marpola is a much stronger agglutinin than the Galanthus nivalis agglutinin and exhibits a preference for methylated Man and glucose, which is unprecedented within the family of monocot Man-binding lectins. The discovery of Marpola allows us, for the first time, to corroborate the evolutionary relationship between a lectin from a lower plant and a well-established lectin family from flowering plants. In addition, the identification of Marpola sheds a new light on the molecular evolution of the superfamily of monocot Man-binding lectins. Beside evolutionary considerations, the occurrence of a G. nivalis agglutinin homolog in a lower plant necessitates the rethinking of the physiological role of the whole family of monocot Man-binding lectins. PMID:12114560

  16. Studying the evolutionary relationships and phylogenetic trees of 21 groups of tRNA sequences based on complex networks.

    PubMed

    Wei, Fangping; Chen, Bowen

    2012-03-01

    To find out the evolutionary relationships among different tRNA sequences of 21 amino acids, 22 networks are constructed. One is constructed from whole tRNAs, and the other 21 networks are constructed from the tRNAs which carry the same amino acids. A new method is proposed such that the alignment scores of any two amino acid groups are determined by the average degree and the average clustering coefficient of their networks. The anticodon feature of isolated tRNA and the phylogenetic trees of 21 group networks are discussed. We find that some isolated tRNA sequences in 21 networks still connect with other tRNAs outside their group, which reflects the fact that those tRNAs might evolve by intercrossing among these 21 groups. We also find that most anticodons within the same cluster differ by only one base at the same sites when S ≥ 70, and they stay in the same rank in the ladder of evolutionary relationships. Those observations seem to agree that some tRNAs might have mutated from the same ancestor sequences through point mutation mechanisms.
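
    The two network statistics named in this abstract, average degree and average clustering coefficient, can be computed as in the following minimal sketch. The adjacency-dictionary representation and the toy graph in the usage note are assumptions made for illustration; the abstract does not describe how the tRNA networks were built or how the two statistics were combined into a score.

```python
from itertools import combinations

def average_degree(adj):
    """Mean number of neighbours per node in an undirected graph.

    adj maps each node to the set of its neighbours (an assumed format).
    """
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

    For a toy graph made of a triangle a-b-c with a fourth node d attached to c, that is adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}, the average degree is 2.0 and the average clustering coefficient is 7/12.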

  17. Phylogenetic relationships and divergence dates of softshell turtles (Testudines: Trionychidae) inferred from complete mitochondrial genomes.

    PubMed

    Li, H; Liu, J; Xiong, L; Zhang, H; Zhou, H; Yin, H; Jing, W; Li, J; Shi, Q; Wang, Y; Liu, J; Nie, L

    2017-05-01

    The softshell turtles (Trionychidae) are one of the most widely distributed reptile groups in the world, and fossils have been found on all continents except Antarctica. The phylogenetic relationships among members of this group have been previously studied; however, disagreements regarding its taxonomy remain, and its phylogeography and divergence times are still poorly understood. Here, we present a comprehensive mitogenomic study of softshell turtles. We sequenced the complete mitochondrial genomes of 10 softshell turtles, in addition to the GenBank sequences of Dogania subplana, Lissemys punctata and Trionyx triunguis, which cover all extant genera within Trionychidae except for Cyclanorbis and Cycloderma. These data were combined with other mitogenomes of turtles for phylogenetic analyses. Divergence time calibration and ancestral reconstruction were calculated using BEAST and RASP software, respectively. Our phylogenetic analyses indicate that Trionychidae is the sister taxon of Carettochelyidae, and support the monophyly of Trionychinae and Cyclanorbinae, which is consistent with morphological data and molecular analysis. Our phylogenetic analyses have established a sister taxon relationship between the Asian Rafetus and the Asian Palea + Pelodiscus + Dogania + Nilssonia + Amyda, whereas a previous study grouped the Asian Rafetus with the American Apalone. The results of divergence time estimates and area ancestral reconstruction show that extant Trionychidae originated in Asia around 108 million years ago (Ma), and radiations mainly occurred during two warm periods, namely Late Cretaceous-Early Eocene and Oligocene. By combining the estimated divergence time and the reconstructed ancestral area of softshell turtles, we determined that the dispersal of softshell turtles out of Asia may have taken three routes. Furthermore, the times of dispersal seem to be in agreement with the time of the India-Asia collision and opening of the Bering Strait, which provide evidence for the accuracy of our estimation of divergence time. Overall, the mitogenomes of this group were used to explore the origin and dispersal route of Trionychidae and have provided new insights on the evolution of this group. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  18. Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty

    PubMed Central

    Longo, Mark S; Carone, Dawn M; Green, Eric D; O'Neill, Michael J; O'Neill, Rachel J

    2009-01-01

    Background Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined. Results To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs. We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements. 
    Conclusion Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability. PMID:19630942

  19. Evolutionary Dynamics of the Gametologous CTNNB1 Gene on the Z and W Chromosomes of Snakes.

    PubMed

    Laopichienpong, Nararat; Muangmai, Narongrit; Chanhome, Lawan; Suntrarachun, Sunutcha; Twilprawat, Panupon; Peyachoknagul, Surin; Srikulnath, Kornsorn

    2017-03-01

    Snakes exhibit genotypic sex determination with female heterogamety (ZZ males and ZW females), and the state of sex chromosome differentiation also varies among lineages. To investigate the evolutionary history of homologous genes located in the nonrecombining region of differentiated sex chromosomes in snakes, partial sequences of the gametologous CTNNB1 gene were analyzed for 12 species belonging to henophid (Cylindrophiidae, Xenopeltidae, and Pythonidae) and caenophid snakes (Viperidae, Elapidae, and Colubridae). Nonsynonymous/synonymous substitution ratios (Ka/Ks) in coding sequences were low (Ka/Ks < 1) between CTNNB1Z and CTNNB1W, suggesting that these 2 genes may have similar functional properties. However, frequencies of intron sequence substitutions and insertion–deletions were higher in CTNNB1Z than CTNNB1W, suggesting that Z-linked sequences evolved faster than W-linked sequences. Molecular phylogeny based on both intron and exon sequences showed the presence of 2 major clades: 1) Z-linked sequences of Caenophidia and 2) W-linked sequences of Caenophidia clustered with Z-linked sequences of Henophidia, which suggests that the sequence divergence between CTNNB1Z and CTNNB1W in Caenophidia may have occurred by the cessation of recombination after the split from Henophidia.

  20. Genomic sequencing of Pleistocene cave bears

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  1. Reconsideration of systematic relationships within the order Euplotida (Protista, Ciliophora) using new sequences of the gene coding for small-subunit rRNA and testing the use of combined data sets to construct phylogenies of the Diophrys-complex.

    PubMed

    Yi, Zhenzhen; Song, Weibo; Clamp, John C; Chen, Zigui; Gao, Shan; Zhang, Qianqian

    2009-03-01

    Comprehensive molecular analyses of phylogenetic relationships within euplotid ciliates are relatively rare, and the relationships among some families remain questionable. We performed phylogenetic analyses of the order Euplotida based on new sequences of the gene coding for small-subunit RNA (SSrRNA) from a variety of taxa across the entire order as well as sequences from some of these taxa of other genes (ITS1-5.8S-ITS2 region and histone H4) that have not been included in previous analyses. Phylogenetic trees based on SSrRNA gene sequences constructed with four different methods had a consistent branching pattern that included the following features: (1) the "typical" euplotids comprised a paraphyletic assemblage composed of two divergent clades (family Uronychiidae and families Euplotidae-Certesiidae-Aspidiscidae-Gastrocirrhidae), (2) in the family Uronychiidae, the genera Uronychia and Paradiophrys formed a clearly outlined, well-supported clade that seemed to be rather divergent from Diophrys and Diophryopsis, suggesting that the Diophrys-complex may have had a longer and more separate evolutionary history than previously supposed, (3) inclusion of 12 new SSrRNA sequences in analyses of Euplotidae revealed two new clades of species within the family and cast additional doubt on the present classification of genera within the family, and (4) the intraspecific divergence among five species of Aspidisca was far greater than those of closely related genera. The ITS1-5.8S-ITS2 coding regions and partial histone H4 genes of six morphospecies in the Diophrys-complex were sequenced along with their SSrRNA genes and used to compare phylogenies constructed from single data sets to those constructed from combined sets. Results indicated that combined analyses could be used to construct more reliable, less ambiguous phylogenies of complex groups like the order Euplotida, because they provide a greater amount and diversity of information.

  2. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  3. Novel antigenic shift in HA sequences of H1N1 viruses detected by big data analysis.

    PubMed

    Zhang, Ruiying; Xu, Chongfeng; Duan, Ziyuan

    2017-07-01

    The influenza virus H1N1 has been prevalent all over the world for nearly a century. Many studies on its evolutionary history, substitution rate and antigenicity-associated sites have been done with small datasets. To have a complete view, we analysed 3171 full-length HA sequences from human H1N1 viruses sampled from 1918 to 2016, and discovered a new clade has formed with sequences isolated in Iran. Based on genetic distance calculations, we revealed an uneven evolutionary rate among sequences isolated in different years. We also found that the HA1 fragment of the new clade is like that of viruses that existed in the 1930s, while the HA2 fragment is closely associated with strains isolated after the 2009 pandemic. This new, "mixed" HA sequence indicates a cryptic antigenic shift event occurred, and it should draw more attention to the new clade identified from sequences from Iran. Copyright © 2017. Published by Elsevier B.V.
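
    The abstract does not say which genetic distance measure was used. As a minimal sketch of the general idea, the uncorrected p-distance (the proportion of differing sites between two aligned sequences) can be computed as follows. The function name, the gap handling, and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped (pairwise deletion). This is an assumed convention.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Toy aligned fragments, made up for illustration (not real HA data).
toy = {
    "strain_a": "ATGAAGGCAA",
    "strain_b": "ATGAAGGCTA",
    "strain_c": "ATGA-GGCTA",
}
names = sorted(toy)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(p_distance(toy[first], toy[second]), 3))
```

    Comparing such distances between sequences grouped by isolation year is one simple way to look for uneven rates, although corrected distances or tree-based methods would normally be preferred for divergent sequences.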

  4. Comparative modeling without implicit sequence alignments.

    PubMed

    Kolinski, Andrzej; Gront, Dominik

    2007-10-01

    The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine CABS to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.

  5. Evolutionary relationships of flying foxes (genus Pteropus) in the Philippines inferred from DNA sequences of cytochrome b gene.

    PubMed

    Bastian, S T; Tanaka, K; Anunciado, R V P; Natural, N G; Sumalde, A C; Namikawa, T

    2002-04-01

    Six flying fox species, genus Pteropus (four from the Philippines) were investigated using complete cytochrome b gene sequences (1140 bp) to infer their evolutionary relationships. The DNA sequences generated via polymerase chain reaction were analyzed using the neighbor-joining, parsimony, and maximum likelihood methods. We estimated that the first evolutionary event among these Pteropus species occurred approximately 13.90 +/- 1.49 MYA. Within this short period of evolutionary time we further hypothesized that the ancestors of the flying foxes found in the Philippines experienced a subsequent diversification forming two clusters in the topology. The first cluster is composed of P. pumilus (Philippine endemic), P. speciosus (restricted in western Mindanao) with P. scapulatus, while the second one comprised P. vampyrus and P. dasymallus species based on the analysis from first and second codon positions. Consistently, all phylogenetic analyses divulged close association of P. dasymallus with P. vampyrus contradicting the previous report categorizing P. dasymallus under subniger species group with P. pumilus, P. speciosus, and P. hypomelanus. The Philippine endemic species (P. pumilus) is closely linked with P. speciosus. The representative samples of P. vampyrus showed a large genetic distance of 1.87%. The large genetic distance between P. dasymallus and P. hypomelanus, P. pumilus and P. speciosus denotes a distinct species group.

  6. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    PubMed Central

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discussed possible promoters’ evolutionary scenarios and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could be evolved gradually by nucleotides’ gain and loss and point mutations. PMID:28117683

  7. Evolution of Nucleotide Punctuation Marks: From Structural to Linear Signals.

    PubMed

    El Houmami, Nawal; Seligmann, Hervé

    2017-01-01

    We present an evolutionary hypothesis assuming that signals marking nucleotide synthesis (DNA replication and RNA transcription) evolved from multi- to unidimensional structures, and were carried over from transcription to translation. This evolutionary scenario presumes that signals combining secondary and primary nucleotide structures are evolutionary transitions. Mitochondrial replication initiation fits this scenario. Some observations reported in the literature corroborate that several signals for nucleotide synthesis function in translation, and vice versa. (a) Polymerase-induced frameshift mutations occur preferentially at translational termination signals (nucleotide deletion is interpreted as termination of nucleotide polymerization, paralleling the role of stop codons in translation). (b) Stem-loop hairpin presence/absence modulates codon-amino acid assignments, showing that translational signals sometimes combine primary and secondary nucleotide structures (here codon and stem-loop). (c) Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause transcriptional and ribosomal frameshifts. Here we find in recently described human mitochondrial RNAs that systematically lack mono-, dinucleotides after each trinucleotide (delRNAs) that delRNA triplets include 2x more homopolymers than mitogenome regions not covered by delRNA. Further analyses of delRNAs show that the natural circular code X (a little-known group of 20 translational signals enabling ribosomal frame retrieval consisting of 20 codons {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} universally overrepresented in coding versus other frames of gene sequences), regulates frameshift in transcription and translation. This dual transcription and translation role confirms for X the hypothesis that translational signals were carried over from transcriptional signals.
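
    The 20 codons of the circular code X are listed explicitly above, so their frame-dependent enrichment is easy to illustrate. The sketch below counts, for each of the three reading frames of a sequence, the fraction of codons that belong to X. The function name and the toy sequence are assumptions made for illustration; this is not the analysis performed in the paper.

```python
# The 20 codons of the circular code X, as listed in the abstract.
X_CODONS = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_fraction_by_frame(seq: str) -> list[float]:
    """Return the fraction of codons that belong to X in frames 0, 1 and 2."""
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        # Complete codons only: stop two bases before the end.
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        hits = sum(codon in X_CODONS for codon in codons)
        fractions.append(hits / len(codons) if codons else 0.0)
    return fractions


# Toy sequence built from three X codons (GAA, GTC, AAC) in frame 0.
print(x_fraction_by_frame("GAAGTCAAC"))  # → [1.0, 0.0, 0.0]
```

    On real coding sequences the frame-0 fraction would be expected to exceed the other two, which is the overrepresentation the abstract refers to.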

  8. Clustering Genes of Common Evolutionary History

    PubMed Central

    Gori, Kevin; Suchan, Tomasz; Alvarez, Nadir; Goldman, Nick; Dessimoz, Christophe

    2016-01-01

    Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent—due to events such as incomplete lineage sorting or horizontal gene transfer—it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such “process-agnostic” approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward’s method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl). PMID:26893301

  9. Tectonic collision and uplift of Wallacea triggered the global songbird radiation

    NASA Astrophysics Data System (ADS)

    Moyle, Robert G.; Oliveros, Carl H.; Andersen, Michael J.; Hosner, Peter A.; Benz, Brett W.; Manthey, Joseph D.; Travers, Scott L.; Brown, Rafe M.; Faircloth, Brant C.

    2016-08-01

    Songbirds (oscine passerines) are the most species-rich and cosmopolitan bird group, comprising almost half of global avian diversity. Songbirds originated in Australia, but the evolutionary trajectory from a single species in an isolated continent to worldwide proliferation is poorly understood. Here, we combine the first comprehensive genome-scale DNA sequence data set for songbirds, fossil-based time calibrations, and geologically informed biogeographic reconstructions to provide a well-supported evolutionary hypothesis for the group. We show that songbird diversification began in the Oligocene, but accelerated in the early Miocene, at approximately half the age of most previous estimates. This burst of diversification occurred coincident with extensive island formation in Wallacea, which provided the first dispersal corridor out of Australia, and resulted in independent waves of songbird expansion through Asia to the rest of the globe. Our results reconcile songbird evolution with Earth history and link a major radiation of terrestrial biodiversity to early diversification within an isolated Australian continent.

  10. Decontaminate feature for tracking: adaptive tracking via evolutionary feature subset

    NASA Astrophysics Data System (ADS)

    Liu, Qiaoyuan; Wang, Yuru; Yin, Minghao; Ren, Jinchang; Li, Ruizhi

    2017-11-01

    Although various visual tracking algorithms have been proposed in the last 2-3 decades, it remains a challenging problem for effective tracking with fast motion, deformation, occlusion, etc. Under complex tracking conditions, most tracking models are not discriminative and adaptive enough. When the combined feature vectors are inputted to the visual models, this may lead to redundancy causing low efficiency and ambiguity causing poor performance. An effective tracking algorithm is proposed to decontaminate features for each video sequence adaptively, where the visual modeling is treated as an optimization problem from the perspective of evolution. Every feature vector is compared to a biological individual and then decontaminated via classical evolutionary algorithms. With the optimized subsets of features, the "curse of dimensionality" has been avoided while the accuracy of the visual model has been improved. The proposed algorithm has been tested on several publicly available datasets with various tracking challenges and benchmarked with a number of state-of-the-art approaches. The comprehensive experiments have demonstrated the efficacy of the proposed methodology.

  11. Tectonic collision and uplift of Wallacea triggered the global songbird radiation.

    PubMed

    Moyle, Robert G; Oliveros, Carl H; Andersen, Michael J; Hosner, Peter A; Benz, Brett W; Manthey, Joseph D; Travers, Scott L; Brown, Rafe M; Faircloth, Brant C

    2016-08-30

    Songbirds (oscine passerines) are the most species-rich and cosmopolitan bird group, comprising almost half of global avian diversity. Songbirds originated in Australia, but the evolutionary trajectory from a single species in an isolated continent to worldwide proliferation is poorly understood. Here, we combine the first comprehensive genome-scale DNA sequence data set for songbirds, fossil-based time calibrations, and geologically informed biogeographic reconstructions to provide a well-supported evolutionary hypothesis for the group. We show that songbird diversification began in the Oligocene, but accelerated in the early Miocene, at approximately half the age of most previous estimates. This burst of diversification occurred coincident with extensive island formation in Wallacea, which provided the first dispersal corridor out of Australia, and resulted in independent waves of songbird expansion through Asia to the rest of the globe. Our results reconcile songbird evolution with Earth history and link a major radiation of terrestrial biodiversity to early diversification within an isolated Australian continent.

  12. Evolution and Vaccination of Influenza Virus.

    PubMed

    Lam, Ham Ching; Bi, Xuan; Sreevatsan, Srinand; Boley, Daniel

    2017-08-01

    In this study, we present an application paradigm in which an unsupervised machine learning approach is applied to the high-dimensional influenza genetic sequences to investigate whether vaccine is a driving force to the evolution of influenza virus. We first used a visualization approach to visualize the evolutionary paths of vaccine-controlled and non-vaccine-controlled influenza viruses in a low-dimensional space. We then quantified the evolutionary differences between their evolutionary trajectories through the use of within- and between-scatter matrices computation to provide the statistical confidence to support the visualization results. We used the influenza surface Hemagglutinin (HA) gene for this study as the HA gene is the major target of the immune system. The visualization is achieved without using any clustering methods or prior information about the influenza sequences. Our results clearly showed that the evolutionary trajectories between vaccine-controlled and non-vaccine-controlled influenza viruses are different and vaccine as an evolution driving force cannot be completely eliminated.

  13. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures.

    PubMed

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/
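
    Rate4Site itself is a phylogeny-aware empirical Bayesian method and is not reproduced here. As a much simpler stand-in that only conveys what "a conservation level for each position" means, the sketch below scores each alignment column by its Shannon entropy (lower entropy means higher conservation). The function names and the toy alignment are illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (in bits) of the residues in one column, gaps ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(residues).values())


def conservation_profile(alignment) -> list[float]:
    """Entropy per column for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment, made up for illustration.
toy_alignment = ["ACD", "ACE", "AGE", "A-E"]
for position, entropy in enumerate(conservation_profile(toy_alignment)):
    print(position, round(entropy, 3))
```

    Unlike Rate4Site, this score ignores the tree relating the sequences, so closely related homologues are effectively counted several times.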

  14. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures

    PubMed Central

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/ PMID:18971256

  15. Evolutionary Dynamics and Diversity in Microbial Populations

    NASA Astrophysics Data System (ADS)

    Thompson, Joel; Fisher, Daniel

    2013-03-01

    Diseases such as flu and cancer adapt at an astonishing rate. In large part, viruses and cancers are so difficult to prevent because they are continually evolving. Controlling such "evolutionary diseases" requires a better understanding of the underlying evolutionary dynamics. It is conventionally assumed that adaptive mutations are rare and therefore will occur and sweep through the population in succession. Recent experiments using modern sequencing technologies have illuminated the many ways in which real population sequence data does not conform to the predictions of conventional theory. We consider a very simple model of asexual evolution and perform simulations in a range of parameters thought to be relevant for microbes and cancer. Simulation results reveal complex evolutionary dynamics typified by competition between lineages with different sets of adaptive mutations. This dynamical process leads to a distribution of mutant gene frequencies different than expected under the conventional assumption that adaptive mutations are rare. Simulated gene frequencies share several conspicuous features with data collected from laboratory-evolved yeast and the worldwide population of influenza.

  16. Microsatellite loci discovery from next-generation sequencing data and loci characterization in the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758)

    PubMed Central

    Zardus, John D.; Wares, John P.

    2016-01-01

    Microsatellite markers remain an important tool for ecological and evolutionary research, but are unavailable for many non-model organisms. One such organism with rare ecological and evolutionary features is the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758). Chelonibia testudinaria appears to be a host generalist, and has an unusual sexual system, androdioecy. Genetic studies on host specificity and mating behavior are impeded by the lack of fine-scale, highly variable markers, such as microsatellite markers. In the present study, we discovered thousands of new microsatellite loci from next-generation sequencing data, and characterized 12 loci thoroughly. We conclude that 11 of these loci will be useful markers in future ecological and evolutionary studies on C. testudinaria. PMID:27231653
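
    The abstract does not name the software used for locus discovery. A minimal sketch of how perfect microsatellites can be located in a read or contig is a regular expression with a backreference, as below. The motif length range (2 to 6 bp) and the minimum of five repeats are assumed thresholds chosen for illustration only.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Yield (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    "ACACACACAC" is reported as (AC)5 rather than as a longer motif.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as "AAAAAAAAAA" matched as (AA)5.
        if len(set(motif)) == 1:
            continue
        yield match.start(), motif, len(match.group(0)) // len(motif)


# Toy sequences, made up for illustration.
print(list(find_microsatellites("TTGACACACACACGG")))
print(list(find_microsatellites("TT" + "CAG" * 6 + "TT")))
```

    Real pipelines usually also tolerate imperfect repeats and check flanking sequence for primer design, which this sketch does not attempt.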

  17. Reconstructing the colonisation and diversification history of the endemic freshwater crab (Seychellum alluaudi) in the granitic and volcanic Seychelles Archipelago.

    PubMed

    Daniels, Savel R

    2011-11-01

    The endemic, monotypic freshwater crab species Seychellum alluaudi was used as a template to examine the initial colonisation and evolutionary history among the major islands in the Seychelles Archipelago. Five of the "inner" islands in the Seychelles Archipelago including Mahé, Praslin, Silhouette, La Digue and Frégate were sampled. Two partial mtDNA fragments, 16S rRNA and cytochrome oxidase subunit I (COI), were sequenced for 83 specimens of S. alluaudi. Evolutionary relationships between populations were inferred from the combined mtDNA dataset using maximum parsimony, maximum likelihood and Bayesian inferences. Analyses of molecular variance (AMOVA) were used to examine genetic variation among and within clades. A haplotype network was constructed using TCS while BEAST was employed to date the colonisation and divergence of lineages on the islands. Phylogenetic analyses of the combined mtDNA data set of 1103 base pairs retrieved a monophyletic S. alluaudi group comprising three statistically well-supported monophyletic clades. Clade one was exclusive to Silhouette; clade two included samples from Praslin sister to La Digue, while clade three comprised samples from Mahé sister to Frégate. The haplotype network corresponded to the three clades. Within Mahé, substantial phylogeographic substructure was evident. AMOVA results revealed limited genetic variation within localities with most variation occurring among localities. Divergence time estimations predated the Holocene sea level regressions and indicated a Pliocene/Pleistocene divergence between the three clades evident within S. alluaudi. The monophyly of each clade suggests that transoceanic dispersal is rare. The absence of shared haplotypes between the three clades, coupled with marked sequence divergence values suggests the presence of three allospecies within S. alluaudi. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies.

    PubMed

    Galan, Maxime; Guivier, Emmanuel; Caraux, Gilles; Charbonnel, Nathalie; Cosson, Jean-François

    2010-05-11

    High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives for the study of evolutionary and functional genetics of highly polymorphic genes like major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach, through the study of genes implicated in productivity or disease susceptibility traits.
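
    The first challenge named above, attributing each read to its original sample from a combination of forward and reverse tags, can be sketched as a dictionary lookup keyed on the tag pair. The tag length, the tag sequences, the sample names, and the assumption that a read ends with the reverse complement of the reverse tag are all hypothetical; the abstract does not give these details, and the authors' probabilistic validation step is not reproduced.

```python
TAG_LEN = 6  # hypothetical tag length

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(read: str, tag_pairs: dict):
    """Return the sample for a read, or None if its tag pair is unknown.

    Assumes the read starts with the forward tag and ends with the reverse
    complement of the reverse tag; only exact tag matches are accepted.
    """
    read = read.upper()
    if len(read) < 2 * TAG_LEN:
        return None
    forward_tag = read[:TAG_LEN]
    reverse_tag = reverse_complement(read[-TAG_LEN:])
    return tag_pairs.get((forward_tag, reverse_tag))


# Hypothetical (forward tag, reverse tag) -> sample table.
tag_pairs = {
    ("ACGTAC", "TCAGTC"): "sample_A",
    ("ACGTAC", "CATGGA"): "sample_B",
}
read = "ACGTAC" + "GGGTTTAAACCC" + "GACTGA"
print(assign_sample(read, tag_pairs))  # → sample_A
```

    Because the sample is identified by the pair rather than by a single tag, a modest number of distinct tags can label many samples, which is what makes a design with 1,710 samples in one run practical.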

  19. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules. PMID:23555856

  20. The protein-protein interface evolution acts in a similar way to antibody affinity maturation.

    PubMed

    Li, Bohua; Zhao, Lei; Wang, Chong; Guo, Huaizu; Wu, Lan; Zhang, Xunming; Qian, Weizhu; Wang, Hao; Guo, Yajun

    2010-02-05

    Understanding the evolutionary mechanism that acts at the interfaces of protein-protein complexes is a fundamental issue with high interest for delineating the macromolecular complexes and networks responsible for regulation and complexity in biological systems. To investigate whether the evolution of protein-protein interface acts in a similar way as antibody affinity maturation, we incorporated evolutionary information derived from antibody affinity maturation with common simulation techniques to evaluate prediction success rates of the computational method in affinity improvement in four different systems: antibody-receptor, antibody-peptide, receptor-membrane ligand, and receptor-soluble ligand. It was interesting to find that the same evolutionary information could improve the prediction success rates in all four protein-protein complexes with an exceptionally high accuracy (>57%). One of the most striking findings in our present study is that not only in the antibody-combining site but in other protein-protein interfaces almost all of the affinity-enhancing mutations are located at the germline hotspot sequences (RGYW or WA), indicating that DNA hot spot mechanisms may be widely used in the evolution of protein-protein interfaces. Our data suggest that the evolution of distinct protein-protein interfaces may use the same basic strategy under selection pressure to maintain interactions. Additionally, our data indicate that classical simulation techniques incorporating the evolutionary information derived from in vivo antibody affinity maturation can be utilized as a powerful tool to improve the binding affinity of protein-protein complex with a high accuracy.
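
    The hotspot motifs RGYW and WA are written in IUPAC ambiguity codes (R = A or G, Y = C or T, W = A or T). A minimal sketch of locating them in a coding sequence is shown below. The function name and the toy sequence are assumptions, only the given strand is scanned, and this is not the simulation method used in the study.

```python
import re

# IUPAC symbols needed for the two motifs, mapped to regex character classes.
IUPAC = {"R": "[AG]", "Y": "[CT]", "W": "[AT]",
         "A": "A", "C": "C", "G": "G", "T": "T"}


def motif_positions(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[symbol] for symbol in motif)
    # A zero-width lookahead lets overlapping occurrences all be reported.
    return [m.start() for m in re.finditer("(?=%s)" % pattern, seq.upper())]


# Toy sequence, made up for illustration.
toy = "AGCTAGCA"
print(motif_positions(toy, "RGYW"))  # → [0, 4]
print(motif_positions(toy, "WA"))    # → [3]
```

    A mutation could then be called hotspot-associated if its nucleotide position falls inside one of the reported windows; scanning the reverse complement as well would be needed to cover the opposite strand.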

  1. Plant polyadenylation factors: conservation and variety in the polyadenylation complex in plants.

    PubMed

    Hunt, Arthur G; Xing, Denghui; Li, Qingshun Q

    2012-11-20

    Polyadenylation, an essential step in eukaryotic gene expression, requires both cis-elements and a plethora of trans-acting polyadenylation factors. The polyadenylation factors are largely conserved across mammals and fungi. The conservation seems also extended to plants based on the analyses of Arabidopsis polyadenylation factors. To extend this observation, we systemically identified the orthologs of yeast and human polyadenylation factors from 10 plant species chosen based on both the availability of their genome sequences and their positions in the evolutionary tree, which render them representatives of different plant lineages. The evolutionary trajectories revealed several interesting features of plant polyadenylation factors. First, the number of genes encoding plant polyadenylation factors was clearly increased from "lower" to "higher" plants. Second, the gene expansion in higher plants was biased to some polyadenylation factors, particularly those involved in RNA binding. Finally, while there are clear commonalities, the differences in the polyadenylation apparatus were obvious across different species, suggesting an ongoing process of evolutionary change. These features lead to a model in which the plant polyadenylation complex consists of a conserved core, which is rather rigid in terms of evolutionary conservation, and a panoply of peripheral subunits, which are less conserved and associated with the core in various combinations, forming a collection of somewhat distinct complex assemblies. The multiple forms of plant polyadenylation complex, together with the diversified polyA signals may explain the intensive alternative polyadenylation (APA) and its regulatory role in biological functions of higher plants.

  2. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674
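    The core idea behind the "reciprocal smallest distance" criterion can be illustrated with a toy example: a pair of sequences is kept as a putative ortholog pair only if each is the other's closest match, by evolutionary distance, in the opposite genome. The sketch below assumes the pairwise distances have already been estimated and are supplied as a dictionary; the function name, identifiers and distance values are hypothetical, and the candidate search, alignment and distance estimation steps of the full algorithm are not shown.

```python
def reciprocal_smallest_distance(dist):
    """Return sorted (a_id, b_id, distance) triples that are mutual closest matches.

    `dist` maps (a_id, b_id) -> evolutionary distance between a sequence from
    genome A and a sequence from genome B. Simplified illustration only.
    """
    best_for_a = {}  # a_id -> (closest b_id, distance)
    best_for_b = {}  # b_id -> (closest a_id, distance)
    for (a, b), d in dist.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only when the choice is reciprocal.
    return sorted(
        (a, b, d) for a, (b, d) in best_for_a.items() if best_for_b[b][0] == a
    )


if __name__ == "__main__":
    # Made-up distances for illustration.
    toy = {
        ("A1", "B1"): 0.05,
        ("A1", "B2"): 0.40,
        ("A2", "B1"): 0.10,
        ("A2", "B2"): 0.30,
        ("A3", "B2"): 0.20,
    }
    for pair in reciprocal_smallest_distance(toy):
        print(pair)
```

    In this toy input, A2 is closest to B1, but B1 is closer to A1, so A2 is left without an ortholog call. The retained distances are what a divergence ranking such as the one described above would then be built on.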

  3. Atomic diffusion and mixing in old stars. V. A deeper look into the globular cluster NGC 6752

    NASA Astrophysics Data System (ADS)

    Gruyters, Pieter; Nordlander, Thomas; Korn, Andreas J.

    2014-07-01

    Context. Abundance trends in heavier elements with evolutionary phase have been shown to exist in the globular cluster NGC 6752 ([Fe / H] = -1.6). These trends are a result of atomic diffusion and additional (non-convective) mixing. Studying such trends can provide us with important constraints on the extent to which diffusion modifies the internal structure and surface abundances of solar-type, metal-poor stars. Aims: Taking advantage of a larger data sample, we investigate the reality and the size of these abundance trends and address questions and potential biases associated with the various stellar populations that make up NGC 6752. Methods: We perform an abundance analysis by combining photometric and spectroscopic data of 194 stars located between the turnoff point and the base of the red giant branch. Stellar parameters are derived from uvby Strömgren photometry. Using the quantitative-spectroscopy package SME, stellar surface abundances for light elements such as Li, Na, Mg, Al, and Si as well as heavier elements such as Ca, Ti, and Fe are derived in an automated way by fitting synthetic spectra to individual lines in the stellar spectra, obtained with the VLT/FLAMES-GIRAFFE spectrograph. Results: Based on uvby Strömgren photometry, we are able to separate three stellar populations in NGC 6752 along the evolutionary sequence from the base of the red giant branch down to the turnoff point. We find weak systematic abundance trends with evolutionary phase for Ca, Ti, and Fe which are best explained by stellar-structure models including atomic diffusion with efficient additional mixing. We derive a new value for the initial lithium abundance of NGC 6752 after correcting for the effect of atomic diffusion and additional mixing which falls slightly below the predicted standard BBN value. Conclusions: We find three stellar populations by combining photometric and spectroscopic data of 194 stars in the globular cluster NGC 6752. 
    Abundance trends for groups of elements, differently affected by atomic diffusion and additional mixing, are identified. Although the statistical significance of the individual trends is weak, they all support the notion that atomic diffusion is operational along the evolutionary sequence of NGC 6752. Based on data collected at the ESO telescopes under programs 079.D-0645(A) and 081.D-0253(A). Full Tables 2 and 8 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/567/A72

  4. Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

    PubMed

    Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

    2018-04-24

    mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.

  5. Neutral forces acting on intragenomic variability shape the Escherichia coli regulatory network topology.

    PubMed

    Ruths, Troy; Nakhleh, Luay

    2013-05-07

    Cis-regulatory networks (CRNs) play a central role in cellular decision making. Like every other biological system, CRNs undergo evolution, which shapes their properties by a combination of adaptive and nonadaptive evolutionary forces. Teasing apart these forces is an important step toward functional analyses of the different components of CRNs, designing regulatory perturbation experiments, and constructing synthetic networks. Although tests of neutrality and selection based on molecular sequence data exist, no such tests are currently available based on CRNs. In this work, we present a unique genotype model of CRNs that is grounded in a genomic context and demonstrate its use in identifying portions of the CRN with properties explainable by neutral evolutionary forces at the system, subsystem, and operon levels. We leverage our model against experimentally derived data from Escherichia coli. The results of this analysis show statistically significant and substantial neutral trends in properties previously identified as adaptive in origin--degree distribution, clustering coefficient, and motifs--within the E. coli CRN. Our model captures the tightly coupled genome-interactome of an organism and enables analyses of how evolutionary events acting at the genome level, such as mutation, and at the population level, such as genetic drift, give rise to neutral patterns that we can quantify in CRNs.

  6. Evolutionary engineering of industrial microorganisms-strategies and applications.

    PubMed

    Zhu, Zhengming; Zhang, Juan; Ji, Xiaomei; Fang, Zhen; Wu, Zhimeng; Chen, Jian; Du, Guocheng

    2018-06-01

    Microbial cells are widely used in industry to obtain various biochemical products, and evolutionary engineering is a common method in biological research for improving their traits, such as environmental tolerance and product yield. To obtain better-integrated functions in microbial cells, the combination of evolutionary engineering with other biotechnologies has attracted increasing attention in recent years. Classical laboratory evolution has proven effective at allowing beneficial mutations to arise in different genes, but it also has some inherent limitations, such as a long evolutionary period and uncontrolled mutation frequencies. However, recent studies show that some new strategies may gradually overcome these limitations. In this review, we summarize the evolutionary strategies commonly used in industrial microorganisms and discuss the combination of evolutionary engineering with other biotechnologies such as systems biology and inverse metabolic engineering. Finally, we consider the importance and application prospects of evolutionary engineering as a powerful tool, especially for the optimization of industrial microbial cell factories.

  7. Origins of Protein Functions in Cells

    NASA Technical Reports Server (NTRS)

    Seelig, Burchard; Pohorille, Andrzej

    2011-01-01

    In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident and other collections of proteins could start a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP through introducing several mutations that introduce targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. 
    In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known. Recently it was found that, as in the previous case, the proteins have a structure unknown among modern enzymes. In this case, in vitro evolution started from a small, non-enzymatic protein. A similar selection process initiated from a library of random polypeptides is in progress. These results not only allow for estimating the occurrence of function in random protein assemblies but also provide evidence for the possibility of alternative protein worlds. Extant proteins might simply represent a frozen accident in the world of possible proteins. Alternative collections of proteins, even with similar functions, could originate alternative evolutionary paths.

  8. The genomic landscape of rapid, repeated evolutionary rescue from toxic pollution in wild fish

    USDA-ARS's Scientific Manuscript database

    Here we describe evolutionary rescue from intense pollution via multiple modes of selection in killifish populations from 4 urban estuaries of the US eastern seaboard. Comparative transcriptomics and analysis of 384 whole genome sequences show that the functioning of a receptor-based signaling pathw...

  9. Analysis of evolutionary patterns of genes in campylobacter jejuni and C. coli

    USDA-ARS's Scientific Manuscript database

    Background: In order to investigate the population genetics structure of thermophilic Campylobacter spp., we extracted a set of 1029 core gene families (CGF) from 25 sequenced genomes of C. jejuni, C. coli and C. lari. Based on these CGFs we employed different approaches to reveal the evolutionary ...

  10. Evaluating hypotheses of basal animal phylogeny using complete sequences of large and small subunit rRNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Medina, Monica; Collins, Allen G.; Silberman, Jeffrey

    2001-06-21

    We studied the evolutionary relationships among basal metazoan lineages by using complete large subunit (LSU) and small subunit (SSU) ribosomal RNA sequences for 23 taxa. After identifying competing hypotheses, we performed maximum likelihood searches for trees conforming to each hypothesis. Kishino-Hasegawa tests were used to determine whether the data (LSU, SSU, and combined) reject any of the competing hypotheses. We also conducted unconstrained tree searches, compared the resulting topologies, and calculated bootstrap indices. Shimodaira-Hasegawa tests were applied to determine whether the data reject any of the topologies resulting from the constrained and unconstrained tree searches. LSU, SSU, and the combined data strongly contradict two assertions pertaining to sponge phylogeny. Hexactinellid sponges are not likely to be the basal lineage of a monophyletic Porifera or the sister group to all other animals. Instead, Hexactinellida and Demospongia form a well-supported clade of siliceous sponges, Silicea. It remains unclear, on the basis of these data alone, whether the calcarean sponges are more closely related to Silicea or to nonsponge animals. The SSU and combined data reject the hypothesis that Bilateria is more closely related to Ctenophora than it is to Cnidaria, whereas LSU data alone do not refute either hypothesis. LSU and SSU data agree in supporting the monophyly of Bilateria, Cnidaria, Ctenophora, and Metazoa. LSU sequence data reveal phylogenetic structure in a data set with limited taxon sampling. Continued accumulation of LSU sequences should increase our understanding of animal phylogeny.

  11. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitation, application and future prospects of rice resequencing. PMID:25006357

  12. Phylogeny of the Vitamin K 2,3-Epoxide Reductase (VKOR) Family and Evolutionary Relationship to the Disulfide Bond Formation Protein B (DsbB) Family

    PubMed Central

    Bevans, Carville G.; Krettler, Christoph; Reinhart, Christoph; Watzka, Matthias; Oldenburg, Johannes

    2015-01-01

    In humans and other vertebrate animals, vitamin K 2,3-epoxide reductase (VKOR) family enzymes are the gatekeepers between nutritionally acquired K vitamins and the vitamin K cycle responsible for posttranslational modifications that confer biological activity upon vitamin K-dependent proteins with crucial roles in hemostasis, bone development and homeostasis, hormonal carbohydrate regulation and fertility. We report a phylogenetic analysis of the VKOR family that identifies five major clades. Combined phylogenetic and site-specific conservation analyses point to clade-specific similarities and differences in structure and function. We discovered a single-site determinant uniquely identifying VKOR homologs belonging to human pathogenic, obligate intracellular prokaryotes and protists. Building on previous work by Sevier et al. (Protein Science 14:1630), we analyzed structural data from both VKOR and prokaryotic disulfide bond formation protein B (DsbB) families and hypothesize an ancient evolutionary relationship between the two families where one family arose from the other through a gene duplication/deletion event. This has resulted in circular permutation of primary sequence threading through the four-helical bundle protein folds of both families. This is the first report of circular permutation relating distant α-helical membrane protein sequences and folds. In conclusion, we suggest a chronology for the evolution of the five extant VKOR clades. PMID:26230708

  13. Rodent phylogeny revised: analysis of six nuclear genes from all major rodent clades

    PubMed Central

    Blanga-Kanfi, Shani; Miranda, Hector; Penn, Osnat; Pupko, Tal; DeBry, Ronald W; Huchon, Dorothée

    2009-01-01

    Background Rodentia is the most diverse order of placental mammals, with extant rodent species representing about half of all placental diversity. In spite of many morphological and molecular studies, the family-level relationships among rodents and the location of the rodent root are still debated. Although various datasets have already been analyzed to solve rodent phylogeny at the family level, these are difficult to combine because they involve different taxa and genes. Results We present here the largest protein-coding dataset used to study rodent relationships. It comprises six nuclear genes, 41 rodent species, and eight outgroups. Our phylogenetic reconstructions strongly support the division of Rodentia into three clades: (1) a "squirrel-related clade", (2) a "mouse-related clade", and (3) Ctenohystrica. Almost all evolutionary relationships within these clades are also highly supported. The primary remaining uncertainty is the position of the root. The application of various models and techniques aimed to remove non-phylogenetic signal was unable to solve the basal rodent trifurcation. Conclusion Sequencing and analyzing a large sequence dataset enabled us to resolve most of the evolutionary relationships among Rodentia. Our findings suggest that the uncertainty regarding the position of the rodent root reflects the rapid rodent radiation that occurred in the Paleocene rather than the presence of conflicting phylogenetic and non-phylogenetic signals in the dataset. PMID:19341461

  14. Toward a Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa

    PubMed Central

    Hettling, Hannes; Condamine, Fabien L.; Vos, Karin; Nilsson, R. Henrik; Sanderson, Michael J.; Sauquet, Hervé; Scharn, Ruud; Silvestro, Daniele; Töpel, Mats; Bacon, Christine D.; Oxelman, Bengt; Vos, Rutger A.

    2017-01-01

    Abstract Rapidly growing biological data—including molecular sequences and fossils—hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of various quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a “Dated Tree of Life” where all node ages are directly comparable. PMID:27616324

  15. The effect of orthology and coregulation on detecting regulatory motifs.

    PubMed

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-02-03

    Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.

  16. The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

    PubMed Central

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-01-01

    Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085

  17. Phylogeny of the Vitamin K 2,3-Epoxide Reductase (VKOR) Family and Evolutionary Relationship to the Disulfide Bond Formation Protein B (DsbB) Family.

    PubMed

    Bevans, Carville G; Krettler, Christoph; Reinhart, Christoph; Watzka, Matthias; Oldenburg, Johannes

    2015-07-29

    In humans and other vertebrate animals, vitamin K 2,3-epoxide reductase (VKOR) family enzymes are the gatekeepers between nutritionally acquired K vitamins and the vitamin K cycle responsible for posttranslational modifications that confer biological activity upon vitamin K-dependent proteins with crucial roles in hemostasis, bone development and homeostasis, hormonal carbohydrate regulation and fertility. We report a phylogenetic analysis of the VKOR family that identifies five major clades. Combined phylogenetic and site-specific conservation analyses point to clade-specific similarities and differences in structure and function. We discovered a single-site determinant uniquely identifying VKOR homologs belonging to human pathogenic, obligate intracellular prokaryotes and protists. Building on previous work by Sevier et al. (Protein Science 14:1630), we analyzed structural data from both VKOR and prokaryotic disulfide bond formation protein B (DsbB) families and hypothesize an ancient evolutionary relationship between the two families where one family arose from the other through a gene duplication/deletion event. This has resulted in circular permutation of primary sequence threading through the four-helical bundle protein folds of both families. This is the first report of circular permutation relating distant α-helical membrane protein sequences and folds. In conclusion, we suggest a chronology for the evolution of the five extant VKOR clades.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ohshima, Kazusato, E-mail: ohshimak@cc.saga-u.ac.jp; The United Graduate School of Agricultural Sciences, Kagoshima University, Kagoshima; Matsumoto, Kosuke

    Cucumber mosaic virus (CMV) is a damaging pathogen of over 200 mono- and dicotyledonous crop species worldwide. It has the broadest known host range of any virus, but the timescale of its evolution is unknown. To investigate the evolutionary history of this virus, we obtained the genomic sequences of 40 CMV isolates from brassicas sampled in Iran, Turkey and Japan, and combined them with published sequences. Our synonymous ('silent') site analyses revealed that the present CMV population is the progeny of a single ancestor existing 1550–2600 years ago, but that the population mostly radiated 295–545 years ago. We found that the major CMV lineages are not phylogeographically confined, but that recombination and reassortment is restricted to local populations and that no reassortant lineage is more than 251 years old. Our results highlight the different evolutionary patterns seen among viral pathogens of brassica crops across the world. Highlights: • Present-day CMV lineages had a most recent common ancestor 1550–2600 years ago. • The CMV population mostly radiated less than 295–545 years ago. • No reassortant found in the present populations is more than 251 years old. • The open-reading frames evolve at around 2.3–4.7×10⁻⁴ substitutions/site/year. • Synonymous codons of CMV seem to have a more precise temporal signal than all codons.

  19. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes.

    PubMed

    Weaver, Steven; Shank, Stephen D; Spielman, Stephanie J; Li, Michael; Muse, Spencer V; Kosakovsky Pond, Sergei L

    2018-01-02

    Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally-executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js. © The Author 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540

  1. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a substitution rate lower by a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  2. Evolution of X-ray activity of 1-3 Msun late-type stars in early post-main-sequence phases

    NASA Astrophysics Data System (ADS)

    Pizzolato, N.; Maggio, A.; Sciortino, S.

    2000-09-01

    We have investigated the variation of coronal X-ray emission during early post-main-sequence phases for a sample of 120 late-type stars within 100 pc, and with estimated masses in the range 1-3 Msun, based on Hipparcos parallaxes and recent evolutionary models. These stars were observed with the ROSAT/PSPC, and the data processed with the Palermo-CfA pipeline, including detection and evaluation of X-ray fluxes (or upper limits) by means of a wavelet transform algorithm. We have studied the evolutionary history of X-ray luminosity and surface flux for stars in selected mass ranges, including stars with inactive A-type progenitors on the main sequence and lower mass solar-type stars. Our stellar sample suggests a trend of increasing X-ray emission level with age for stars with masses M > 1.5 Msun, and a decline for lower-mass stars. A similar behavior holds for the average coronal temperature, which follows a power-law correlation with the X-ray luminosity, independently of their mass and evolutionary state. We have also studied the relationship between X-ray luminosity and surface rotation rate for stars in the same mass ranges, and how this relationship departs from the Lx ~ vrot^2 law followed by main-sequence stars. Our results are interpreted in terms of a magnetic dynamo whose efficiency depends on the stellar evolutionary state through the mass-dependent changes of the stellar internal structure, including the properties of envelope convection and the internal rotation profile.

  3. Localization, structure and polymorphism of two paralogous Xenopus laevis mitochondrial malate dehydrogenase genes.

    PubMed

    Tlapakova, Tereza; Krylov, Vladimir; Macha, Jaroslav

    2005-01-01

    Two paralogous mitochondrial malate dehydrogenase 2 (Mdh2) genes of Xenopus laevis have been cloned and sequenced, revealing 95% identity. Fluorescence in-situ hybridization (FISH) combined with tyramide amplification discriminates both genes; Mdh2a was localized into chromosome q3 and Mdh2b into chromosome q8. One kb cDNA probes detect both genes with 85% accuracy. The remaining signals were on the paralogous counterpart. Introns interrupt coding sequences at the same nucleotide as defined for mouse. Restriction polymorphism has been detected in the first intron of Mdh2a, while the individual variability in intron 6 of Mdh2b gene is represented by an insertion of incomplete retrotransposon L1Xl. Rates of nucleotide substitutions indicate that both genes are under similar evolutionary constraints. X. laevis Mdh2 genes can be used as markers for physical mapping and linkage analysis.

  4. A probable prehistoric case of meningococcal disease from San Francisco Bay: Next generation sequencing of Neisseria meningitidis from dental calculus and osteological evidence.

    PubMed

    Eerkens, Jelmer W; Nichols, Ruth V; Murray, Gemma G R; Perez, Katherine; Murga, Engel; Kaijankoski, Phil; Rosenthal, Jeffrey S; Engbring, Laurel; Shapiro, Beth

    2018-05-25

    Next Generation Sequencing (NGS) of ancient dental calculus samples from a prehistoric site in San Francisco Bay, CA-SCL-919, reveals a wide range of potentially pathogenic bacteria. One older adult woman, in particular, had high levels of Neisseria meningitidis and low levels of Haemophilus influenzae, species that were not observed in the calculus from three other individuals. Combined with the presence of incipient endocranial lesions and pronounced meningeal grooves, we interpret this as an ancient case of meningococcal disease. This disease afflicts millions around the globe today, but little is known about its (pre)history. With additional sampling, we suggest NGS of calculus offers an exciting new window into the evolutionary history of these bacterial species and their interactions with humans. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. A combined chloroplast atpB-rbcL and trnL-F phylogeny unveils the ancestry of balsams (Impatiens spp.) in the Western Ghats of India.

    PubMed

    Shajitha, P P; Dhanesh, N R; Ebin, P J; Laly, Joseph; Aneesha, Devassy; Reshma, John; Augustine, Jomy; Linu, Mathew

    2016-12-01

    Only a few Impatiens spp. from South India (one of the five centers of diversity for Impatiens species) were included in the published datum of molecular phylogeny of the family Balsaminaceae. The present investigation is a novel attempt to reveal the phylogenetic association of Impatiens species of South India, by placing them in the global phylogeny of Impatiens based on a combined analysis of two chloroplast genes. Thirty species of genus Impatiens were collected from different locations of South India. Total genomic DNA was extracted from fresh plant leaf, and polymerase chain reaction was carried out using atpB-rbcL and trnL-F intergenic spacer-specific forward and reverse primers. Thirteen sequences of Impatiens species from three centers of diversity were obtained from GenBank for reconstructing the evolutionary relationships within the genus Impatiens. Bayesian inference analysis was carried out in MrBayes v.3.2.2. This analysis supported Southeast Asia as the ancestral place of origin of extant Impatiens species. Molecular phylogeny of South Indian Impatiens spp. based on combined chloroplast sequences showed the same association as that of morphological taxonomy. Sections Scapigerae, Tomentosae, Sub-Umbellatae, and Racemosae showed Southeast Asian relationship, while sections Annuae and Microsepalae showed African affinity.

  6. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Molecular phylogeny of the higher and lower taxonomy of the Fusarium genus and differences in the evolutionary histories of multiple genes

    PubMed Central

    2011-01-01

    Background Species of the Fusarium genus are important fungi that are associated with health hazards in humans and animals. The taxonomy of this genus has been a subject of controversy for many years. Although many researchers have applied molecular phylogenetic analysis to examine the taxonomy of Fusarium species, their phylogenetic relationships remain unclear, owing to the few comprehensive phylogenetic analyses of the Fusarium genus and a lack of suitable nucleotide and amino acid substitution rates. A previous study using whole-genome comparison among Fusarium species revealed the possibility that each gene in Fusarium genomes has a unique evolutionary history, and such genes may complicate reconstruction of the Fusarium phylogenetic tree. There is a need not only to check the substitution rates of genes but also to evaluate the evolution of each gene precisely. Results We performed phylogenetic analyses based on the nucleotide sequences of the rDNA cluster region (rDNA cluster), and the β-tubulin gene (β-tub), the elongation factor 1α gene (EF-1α), and the aminoadipate reductase gene (lys2). Although incongruence of the tree topologies between lys2 and the other genes was detected, all genes supported the classification of Fusarium species into 7 major clades, I to VII. To obtain a reliable phylogeny for Fusarium species, we excluded the lys2 sequences from our dataset, and re-constructed a maximum likelihood (ML) tree based on the combined data of the rDNA cluster, β-tub, and EF-1α. Our ML tree indicated some interesting relationships in the higher and lower taxa of Fusarium species and related genera. Moreover, we observed a novel evolutionary history of lys2. We suggest that the unique tree topologies of lys2 are not due to an analytical artefact, but due to differences in the evolutionary history of genomes caused by positive selection of particular lineages.
Conclusion This study showed the reliable species tree of the higher and lower taxonomy in the lineage of the Fusarium genus. Our ML tree clearly indicated 7 major clades within the Fusarium genus. Furthermore, this study reported differences in the evolutionary histories among multiple genes within this genus for the first time. PMID:22047111

  8. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  9. rbcL gene sequences provide evidence for the evolutionary lineages of leptosporangiate ferns.

    PubMed

    Hasebe, M; Omori, T; Nakazawa, M; Sano, T; Kato, M; Iwatsuki, K

    1994-06-07

    Pteridophytes have a longer evolutionary history than any other vascular land plant and, therefore, have endured greater loss of phylogenetically informative information. This factor has resulted in substantial disagreements in evaluating characters and, thus, controversy in establishing a stable classification. To compare competing classifications, we obtained DNA sequences of a chloroplast gene. The sequence of 1206 nt of the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL) was determined from 58 species, representing almost all families of leptosporangiate ferns. Phylogenetic trees were inferred by the neighbor-joining and the parsimony methods. The two methods produced almost identical phylogenetic trees that provided insights concerning major general evolutionary trends in the leptosporangiate ferns. Interesting findings were as follows: (i) two morphologically distinct heterosporous water ferns, Marsilea and Salvinia, are sister genera; (ii) the tree ferns (Cyatheaceae, Dicksoniaceae, and Metaxyaceae) are monophyletic; and (iii) polypodioids are distantly related to the gleichenioids in spite of the similarity of their exindusiate soral morphology and are close to the higher indusiate ferns. In addition, the affinities of several "problematic genera" were assessed.

  10. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective.

    PubMed

    Inoue, Jun G; Miya, Masaki; Lam, Kevin; Tay, Boon-Hui; Danks, Janine A; Bell, Justin; Walker, Terrence I; Venkatesh, Byrappa

    2010-11-01

    With our increasing ability for generating whole-genome sequences, comparative analysis of whole genomes has become a powerful tool for understanding the structure, function, and evolutionary history of human and other vertebrate genomes. By virtue of their position basal to bony vertebrates, cartilaginous fishes (class Chondrichthyes) are a valuable outgroup in comparative studies of vertebrates. Recently, a holocephalan cartilaginous fish, the elephant shark, Callorhinchus milii (Subclass Holocephali: Order Chimaeriformes), has been proposed as a model genome, and low-coverage sequence of its genome has been generated. Despite such an increasing interest, the evolutionary history of the modern holocephalans (a previously successful and diverse group but represented by only 39 extant species) and their relationship with elasmobranchs and other jawed vertebrates has been poorly documented largely owing to a lack of well-preserved fossil materials after the end-Permian about 250 Ma. In this study, we assembled the whole mitogenome sequences for eight representatives from all the three families of the modern holocephalans and investigated their phylogenetic relationships and evolutionary history. Unambiguously aligned sequences from these holocephalans together with 17 other vertebrates (9,409 nt positions excluding entire third codon positions) were subjected to partitioned maximum likelihood analysis. The resulting tree strongly supported a single origin of the modern holocephalans and their sister-group relationship with elasmobranchs. The mitogenomic tree recovered the most basal callorhinchids within the chimaeriforms, which is sister to a clade comprising the remaining two families (rhinochimaerids and chimaerids).
The timetree derived from a relaxed molecular clock Bayesian method suggests that the holocephalans originated in the Silurian about 420 Ma, having survived from the end-Permian (250 Ma) mass extinction and undergoing familial diversifications during the late Jurassic to early Cretaceous (170-120 Ma). This postulated evolutionary scenario agrees well with that based on the paleontological observations.

  11. Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.

    PubMed

    Carretero-Paulet, Lorenzo; Albert, Victor A

    2016-01-01

    The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe (http://www.genomevolution.org/coge/) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.

  12. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

    PubMed

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R

    2009-01-01

    With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.

  13. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

    PubMed Central

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

    2010-01-01

    With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534

  14. Origin and Reticulate Evolutionary Process of Wheatgrass Elymus trachycaulus (Triticeae: Poaceae)

    PubMed Central

    Zuo, Hongwei; Wu, Panpan; Wu, Dexiang; Sun, Genlou

    2015-01-01

    To study the origin and evolutionary dynamics of tetraploid Elymus trachycaulus, which has been cytologically defined as containing StH genomes, thirteen accessions of E. trachycaulus were analyzed using two low-copy nuclear genes, Pepc (phosphoenolpyruvate carboxylase) and Rpb2 (the second largest subunit of RNA polymerase II), and one chloroplast region, trnL–trnF (spacer between the tRNA-Leu (UAA) gene and the tRNA-Phe (GAA) gene). Our chloroplast data indicated that Pseudoroegneria (St genome) was the maternal donor of E. trachycaulus. Rpb2 data indicated that the St genome in E. trachycaulus originated from either P. strigosa, P. stipifolia, P. spicata or P. geniculata. The Hordeum (H genome)-like sequences of E. trachycaulus are polyphyletic in the Pepc tree, suggesting that the H genome in E. trachycaulus was contributed by multiple sources, whether due to multiple origins or introgression resulting from subsequent hybridization. Failure to recover the St copy of the Pepc sequence in most accessions of E. trachycaulus might be caused by genome convergent evolution in allopolyploids. Multiple copies of the H-like Pepc sequence from each accession, with relatively large deletions and insertions, might be caused by either instability of the Pepc sequence in the H genome or incomplete concerted evolution. Our results highlight the complex evolutionary history of E. trachycaulus. PMID:25946188

  15. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
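    As a rough illustration of what an alignment-free comparison looks like, the sketch below computes a k-mer count profile for each sequence and a cosine distance between profiles. This is a generic example, not one of the nine AF methods assessed in the record above (the abstract does not name them); the function names, the default k = 3, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two k-mer count profiles."""
    dot = sum(p[m] * q[m] for m in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with any other
    return 1.0 - dot / (norm_p * norm_q)


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict {name: sequence}; no alignment is computed."""
    names = sorted(seqs)
    profiles = {n: kmer_profile(seqs[n], k) for n in names}
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = cosine_distance(profiles[a], profiles[b])
    return dist


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {"taxonA": "ACGTACGTACGT", "taxonB": "ACGTACGAACGT", "taxonC": "TTTTGGGGCCCC"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 3))
```

    A matrix of this kind could be passed to a distance-based tree builder such as neighbor-joining; resampling k-mers or sequence blocks and rebuilding the tree is one way a jackknife-style node support could be approximated, though the record does not describe the exact procedure used.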

  16. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  17. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning

    PubMed Central

    Sutphin, George L.; Mahoney, J. Matthew; Sheppard, Keith; Walton, David O.; Korstanje, Ron

    2016-01-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species—humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/. PMID:27812085

  18. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning.

    PubMed

    Sutphin, George L; Mahoney, J Matthew; Sheppard, Keith; Walton, David O; Korstanje, Ron

    2016-11-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

  19. Evolutionary history of tall fescue morphotypes inferred from molecular phylogenetics of the Lolium-Festuca species complex

    PubMed Central

    2010-01-01

    Background The agriculturally important pasture grass tall fescue (Festuca arundinacea Schreb. syn. Lolium arundinaceum (Schreb.) Darbysh.) is an outbreeding allohexaploid, that may be more accurately described as a species complex consisting of three major (Continental, Mediterranean and rhizomatous) morphotypes. Observation of hybrid infertility in some crossing combinations between morphotypes suggests the possibility of independent origins from different diploid progenitors. This study aims to clarify the evolutionary relationships between each tall fescue morphotype through phylogenetic analysis using two low-copy nuclear genes (encoding plastid acetyl-CoA carboxylase [Acc1] and centroradialis [CEN]), the nuclear ribosomal DNA internal transcribed spacer (rDNA ITS) and the chloroplast DNA (cpDNA) genome-located matK gene. Other taxa within the closely related Lolium-Festuca species complex were also included in the study, to increase understanding of evolutionary processes in a taxonomic group characterised by multiple inter-specific hybridisation events. Results Putative homoeologous sequences from both nuclear genes were obtained from each polyploid species and compared to counterparts from 15 diploid taxa. Phylogenetic reconstruction confirmed F. pratensis and F. arundinacea var. glaucescens as probable progenitors to Continental tall fescue, and these species are also likely to be ancestral to the rhizomatous morphotype. However, these two morphotypes are sufficiently distinct to be located in separate clades based on the ITS-derived data set. All four of the generated data sets suggest independent evolution of the Mediterranean and Continental morphotypes, with minimal affinity between cognate sequence haplotypes. No obvious candidate progenitor species for Mediterranean tall fescues were identified, and only two putative sub-genome-specific haplotypes were identified for this morphotype. 
Conclusions This study describes the first phylogenetic analysis of the Festuca genus to include representatives of each tall fescue morphotype, and to use low copy nuclear gene-derived sequences to identify putative progenitors of the polyploid species. The demonstration of distinct tall fescue lineages has implications for both taxonomy and molecular breeding strategies, and may facilitate the generation of morphotype and/or sub-genome-specific molecular markers. PMID:20937141

  20. The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data.

    PubMed

    Peters, Ralph S; Meusemann, Karen; Petersen, Malte; Mayer, Christoph; Wilbrandt, Jeanne; Ziesmann, Tanja; Donath, Alexander; Kjer, Karl M; Aspöck, Ulrike; Aspöck, Horst; Aberer, Andre; Stamatakis, Alexandros; Friedrich, Frank; Hünefeld, Frank; Niehuis, Oliver; Beutel, Rolf G; Misof, Bernhard

    2014-03-20

    Despite considerable progress in systematics, a comprehensive scenario of the evolution of phenotypic characters in the mega-diverse Holometabola based on a solid phylogenetic hypothesis was still missing. We addressed this issue by de novo sequencing transcriptome libraries of representatives of all orders of holometabolan insects (13 species in total) and by using a previously published extensive morphological dataset. We tested competing phylogenetic hypotheses by analyzing various specifically designed sets of amino acid sequence data, using maximum likelihood (ML) based tree inference and Four-cluster Likelihood Mapping (FcLM). By maximum parsimony-based mapping of the morphological data on the phylogenetic relationships we traced evolutionary transformations at the phenotypic level and reconstructed the groundplan of Holometabola and of selected subgroups. In our analysis of the amino acid sequence data of 1,343 single-copy orthologous genes, Hymenoptera are placed as sister group to all remaining holometabolan orders, i.e., to a clade Aparaglossata, comprising two monophyletic subunits Mecopterida (Amphiesmenoptera + Antliophora) and Neuropteroidea (Neuropterida + Coleopterida). The monophyly of Coleopterida (Coleoptera and Strepsiptera) remains ambiguous in the analyses of the transcriptome data, but appears likely based on the morphological data. Highly supported relationships within Neuropterida and Antliophora are Raphidioptera + (Neuroptera + monophyletic Megaloptera), and Diptera + (Siphonaptera + Mecoptera). ML tree inference and FcLM yielded largely congruent results. However, FcLM, which was applied here for the first time to large phylogenomic supermatrices, displayed additional signal in the datasets that was not identified in the ML trees. Our phylogenetic results imply that an orthognathous larva belongs to the groundplan of Holometabola, with compound eyes and well-developed thoracic legs, externally feeding on plants or fungi. 
Ancestral larvae of Aparaglossata were prognathous, equipped with single larval eyes (stemmata), and possibly agile and predacious. Ancestral holometabolan adults likely resembled in their morphology the groundplan of adult neopteran insects. Within Aparaglossata, the adult's flight apparatus and ovipositor underwent strong modifications. We show that the combination of well-resolved phylogenies obtained by phylogenomic analyses and well-documented extensive morphological datasets is an appropriate basis for reconstructing complex morphological transformations and for the inference of evolutionary histories.

  1. Reproduction, symbiosis, and the eukaryotic cell

    PubMed Central

    Godfrey-Smith, Peter

    2015-01-01

    This paper develops a conceptual framework for addressing questions about reproduction, individuality, and the units of selection in symbiotic associations, with special attention to the origin of the eukaryotic cell. Three kinds of reproduction are distinguished, and a possible evolutionary sequence giving rise to a mitochondrion-containing eukaryotic cell from an endosymbiotic partnership is analyzed as a series of transitions between each of the three forms of reproduction. The sequence of changes seen in this “egalitarian” evolutionary transition is compared with those that apply in “fraternal” transitions, such as the evolution of multicellularity in animals. PMID:26286983

  2. Bioinformatics analysis and genetic diversity of the poliovirus.

    PubMed

    Liu, Yanhan; Ma, Tengfei; Liu, Jianzhu; Zhao, Xiaona; Cheng, Ziqiang; Guo, Huijun; Wang, Shujing; Xu, Ruixue

    2014-12-01

    Poliomyelitis, a disease which can manifest as muscle paralysis, is caused by the poliovirus, which is a human enterovirus and member of the family Picornaviridae that usually transmits by the faecal-oral route. The viruses of the OPV (oral poliovirus attenuated-live vaccine) strains can mutate in the human intestine during replication and some of these mutations can lead to the recovery of serious neurovirulence. Informatics research of the poliovirus genome can be used to explain further the characteristics of this virus. In this study, sequences from 100 poliovirus isolates were acquired from GenBank. To determine the evolutionary relationship between the strains, we compared and analysed the sequences of the complete poliovirus genome and the VP1 region. The reconstructed phylogenetic trees for the complete sequences and the VP1 sequences were both divided into two branches, indicating that the genetic relationships of the whole poliovirus genome and the VP1 sequences are very similar. This branching indicates that the virulence and pathogenicity of poliomyelitis may be associated with the VP1 region. Sequence alignment of the VP1 region revealed numerous mutation sites in which mutation rates of >30 % were detected. In a group of strains recorded in the USA, mutation sites and mutation types were the same and this may be associated with their distribution in the evolutionary tree and their genetic relationship. In conclusion, the genetic evolutionary relationships of poliovirus isolate sequences are determined to a great extent by the VP1 protein, and poliovirus strains located on the same branch of the phylogenetic tree contain the same mutation spots and mutation types. Hence, the genetic characteristics of the VP1 region in the poliovirus genome should be analysed to identify the transmission route of poliovirus and provide the basis of viral immunity development. © 2014 The Authors.

  3. Genomic Diversity and Evolution of the Lyssaviruses

    PubMed Central

    Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé

    2008-01-01

    Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239

  4. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  5. VCFtoTree: a user-friendly tool to construct locus-specific alignments and phylogenies from thousands of anthropologically relevant genome sequences.

    PubMed

    Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer

    2017-09-26

    Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .

  6. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  7. Agency, Values, and Well-Being: A Human Development Model

    ERIC Educational Resources Information Center

    Welzel, Christian; Inglehart, Ronald

    2010-01-01

    This paper argues that feelings of agency are linked to human well-being through a sequence of adaptive mechanisms that promote human development, once existential conditions become permissive. In the first part, we elaborate on the evolutionary logic of this model and outline why an evolutionary perspective is helpful to understand changes in…

  8. Kakusan4 and Aminosan: two programs for comparing nonpartitioned, proportional and separate models for combined molecular phylogenetic analyses of multilocus sequence data.

    PubMed

    Tanabe, Akifumi S

    2011-09-01

    Proportional and separate models, which can apply a different combination of substitution rate matrix (SRM) and among-site rate variation model (ASRVM) to each locus, are frequently used in phylogenetic studies of multilocus data. A proportional model assumes that branch lengths are proportional among partitions and a separate model assumes that each partition has an independent set of branch lengths. However, the selection from among nonpartitioned (i.e., a common combination of models is applied to all-loci concatenated sequences), proportional and separate models is usually based on the researcher's preference rather than on any information criteria. This study describes two programs, 'Kakusan4' (for DNA sequences) and 'Aminosan' (for amino-acid sequences), which allow the selection of evolutionary models based on several types of information criteria. The programs can handle both multilocus and single-locus data, in addition to providing an easy-to-use wizard interface and a noninteractive command line interface. In the case of multilocus data, SRMs and ASRVMs are compared at each locus and at all-loci concatenated sequences, after which nonpartitioned, proportional and separate models are compared based on information criteria. The programs also provide model configuration files for MrBayes, PAUP*, PhyML, RAxML and Treefinder to support further phylogenetic analysis using a selected model. When likelihoods were optimized by Treefinder, the best-fit models were found to differ depending on the data set. Furthermore, differences in the information criteria among nonpartitioned, proportional and separate models were much larger than those among the nonpartitioned models. These findings suggest that selecting from nonpartitioned, proportional and separate models results in a better phylogenetic tree. Kakusan4 and Aminosan are available at http://www.fifthdimension.jp/. They are licensed under the GNU GPL Ver. 2, and are able to run on Windows, MacOS X and Linux. 
© 2011 Blackwell Publishing Ltd.
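
    The comparison of nonpartitioned, proportional and separate models by information criteria can be sketched as follows. This is a minimal illustration using the standard AIC and BIC formulas, not the code of Kakusan4 or Aminosan. The log-likelihoods, parameter counts and alignment length are hypothetical values chosen only for the example, and `select_model` is an illustrative helper name.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def select_model(candidates, n_sites, criterion="aic"):
    """Score each candidate (name -> (lnL, k)) and return (best_name, scores)."""
    scores = {}
    for name, (lnl, k) in candidates.items():
        if criterion == "aic":
            scores[name] = aic(lnl, k)
        else:
            scores[name] = bic(lnl, k, n_sites)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods and free-parameter counts, for illustration only.
candidates = {
    "nonpartitioned": (-10500.0, 10),
    "proportional": (-10300.0, 32),
    "separate": (-10280.0, 70),
}

best_aic, aic_scores = select_model(candidates, n_sites=3000, criterion="aic")
best_bic, bic_scores = select_model(candidates, n_sites=3000, criterion="bic")
print(best_aic, aic_scores)
print(best_bic, bic_scores)
```

    With these made-up inputs the separate model has the highest likelihood, but its extra branch-length parameters are penalized, so the proportional model scores best under both criteria. Which model wins on real data depends on the data set, as the abstract notes.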

  9. Cocoa/Cotton Comparative Genomics

    USDA-ARS's Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  10. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

    PubMed

    Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

    2016-08-30

    Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.

  11. An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure.

    PubMed

    Waldispühl, Jérôme; Ponty, Yann

    2011-11-01

    The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and samples mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend on the free energy of the structures, RNAmutants has difficulty sampling mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence-structure maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionarily accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the appearance of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionarily accessible mutational landscapes.
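
    The sampling bias described above follows from Boltzmann weighting, which can be sketched as follows. This is only a toy illustration of the weighting, not the RNAmutants algorithm or its adaptive correction. The three mutant labels and their free energies are hypothetical, and the temperature is assumed to be 37 °C.

```python
import math
import random

# Gas constant in kcal/(mol*K) times an assumed temperature of 310.15 K (37 C).
RT = 0.0019872 * 310.15


def boltzmann_probabilities(energies, rt=RT):
    """Map each state to exp(-E/RT) / Z, given free energies in kcal/mol."""
    e_min = min(energies.values())  # shift by the minimum for numerical stability
    weights = {k: math.exp(-(e - e_min) / rt) for k, e in energies.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def sample(probabilities, n, seed=0):
    """Draw n states with probability proportional to their Boltzmann weight."""
    rng = random.Random(seed)
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[x] for x in names], k=n)


# Hypothetical free energies (kcal/mol) for three mutants, for illustration only.
energies = {"GC_rich": -12.0, "mixed": -9.0, "AU_rich": -5.0}

probs = boltzmann_probabilities(energies)
draws = sample(probs, n=1000)
print(probs)
print({name: draws.count(name) for name in energies})
```

    Because the weight falls off exponentially with free energy, the least stable state in this toy example is drawn only very rarely, if at all. That is the kind of under-sampling an adaptive scheme has to correct for.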

  12. A tale of swinger insects: Signatures of past sexuality between divergent lineages of a parthenogenetic weevil revealed by ribosomal intraindividual variation.

    PubMed

    Rodriguero, Marcela S; Wirth, Sonia A; Alberghina, Josefina S; Lanteri, Analía A; Confalonieri, Viviana A

    2018-01-01

    Naupactus cervinus (Boheman) (Curculionidae, Naupactini) is a parthenogenetic weevil native to the Paranaense Forest which displays high levels of genetic variation. Two divergent clades were identified, one ranging in forest areas (Forest clade), and the other in open vegetation areas (Grassland clade). Both of them have individuals with high levels of heterozygosity in ribosomal sequences. Investigation of intraindividual variation in ITS1 sequences through cloning and posterior sequencing suggested that mating between both groups most likely occurred in the Paranaense Forest after a secondary contact, which led to fixed heterozygotes as a consequence of parthenogenesis. Otherwise, sexual segregation would have disrupted multilocus genotypes. Only a small number of heterozygous genotypes of all the possible combinations are found in nature. We propose the occurrence of a hybrid zone in the Paranaense Forest. The fact that it is one of the most important biodiversity hotspots of the world, together with its key role for investigating evolutionary processes, makes it worthy of conservation. This is the first genetic evidence of bisexuality in N. cervinus.

  13. Molecular phylogeny of choanoflagellates, the sister group to Metazoa

    PubMed Central

    Carr, M.; Leadbeater, B. S. C.; Hassan, R.; Nelson, M.; Baldauf, S. L.

    2008-01-01

    Choanoflagellates are single-celled aquatic flagellates with a unique morphology consisting of a cell with a single flagellum surrounded by a “collar” of microvilli. They have long interested evolutionary biologists because of their striking resemblance to the collared cells (choanocytes) of sponges. Molecular phylogeny has confirmed a close relationship between choanoflagellates and Metazoa, and the first choanoflagellate genome sequence has recently been published. However, molecular phylogenetic studies within choanoflagellates are still extremely limited. Thus, little is known about choanoflagellate evolution or the exact nature of the relationship between choanoflagellates and Metazoa. We have sequenced four genes from a broad sampling of the morphological diversity of choanoflagellates including most species currently available in culture. Phylogenetic analyses of these sequences, alone and in combination, reject much of the traditional taxonomy of the group. The molecular data also strongly support choanoflagellate monophyly rejecting proposals that Metazoa were derived from a true choanoflagellate ancestor. Mapping of a complementary matrix of morphological and ecological traits onto the phylogeny allows a reinterpretation of choanoflagellate character evolution and predicts the nature of their last common ancestor. PMID:18922774

  14. Simultaneously estimating evolutionary history and repeated traits phylogenetic signal: applications to viral and host phenotypic evolution

    PubMed Central

    Vrancken, Bram; Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Longdon, Ben; Günthard, Huldrych F.; Suchard, Marc A.

    2014-01-01

    Phylogenetic signal quantifies the degree to which resemblance in continuously-valued traits reflects phylogenetic relatedness. Measures of phylogenetic signal are widely used in ecological and evolutionary research, and are recently gaining traction in viral evolutionary studies. Standard estimators of phylogenetic signal frequently condition on data summary statistics of the repeated trait observations and fixed phylogenetic trees, resulting in information loss and potential bias. To incorporate the observation process and phylogenetic uncertainty in a model-based approach, we develop a novel Bayesian inference method to simultaneously estimate the evolutionary history and phylogenetic signal from molecular sequence data and repeated multivariate traits. Our approach builds upon a phylogenetic diffusion framework that models continuous trait evolution as a Brownian motion process and incorporates Pagel’s λ transformation parameter to estimate dependence among traits. We provide a computationally efficient inference implementation in the BEAST software package. We evaluate the synthetic performance of the Bayesian estimator of phylogenetic signal against standard estimators, and demonstrate the use of our coherent framework to address several virus-host evolutionary questions, including virulence heritability for HIV, antigenic evolution in influenza and HIV, and Drosophila sensitivity to sigma virus infection. Finally, we discuss model extensions that will make useful contributions to our flexible framework for simultaneously studying sequence and trait evolution. PMID:25780554
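
    The λ transformation mentioned above can be sketched in its common textbook form, where λ scales the off-diagonal entries of the Brownian-motion covariance matrix implied by the tree. This is an illustration under that assumption, not the BEAST implementation described in the record. The three-taxon covariance matrix and the function name `pagel_lambda_transform` are hypothetical.

```python
def pagel_lambda_transform(cov, lam):
    """Scale shared-history (off-diagonal) covariances by lam, keep variances."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    n = len(cov)
    return [
        [cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical Brownian-motion covariance for three taxa on a tree of height 1:
# taxa 0 and 1 share 0.6 of their history, taxon 2 is the outgroup.
cov = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(pagel_lambda_transform(cov, 1.0))  # unchanged: traits follow the tree
print(pagel_lambda_transform(cov, 0.5))  # shared covariance is halved
print(pagel_lambda_transform(cov, 0.0))  # diagonal only: no phylogenetic signal
```

    A value of λ near 1 means trait resemblance tracks the tree as Brownian motion predicts, while a value near 0 means the traits behave as if the taxa were independent.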

  15. Genes with stable DNA methylation levels show higher evolutionary conservation than genes with fluctuant DNA methylation levels.

    PubMed

    Zhang, Ruijie; Lv, Wenhua; Luan, Meiwei; Zheng, Jiajia; Shi, Miao; Zhu, Hongjie; Li, Jin; Lv, Hongchao; Zhang, Mingming; Shang, Zhenwei; Duan, Lian; Jiang, Yongshuai

    2015-11-24

    Different human genes often exhibit different degrees of stability in their DNA methylation levels between tissues, samples or cell types. This may be related to the evolution of human genome. Thus, we compared the evolutionary conservation between two types of genes: genes with stable DNA methylation levels (SM genes) and genes with fluctuant DNA methylation levels (FM genes). For long-term evolutionary characteristics between species, we compared the percentage of the orthologous genes, evolutionary rate dn/ds and protein sequence identity. We found that the SM genes had greater percentages of the orthologous genes, lower dn/ds, and higher protein sequence identities in all the 21 species. These results indicated that the SM genes were more evolutionarily conserved than the FM genes. For short-term evolutionary characteristics among human populations, we compared the single nucleotide polymorphism (SNP) density, and the linkage disequilibrium (LD) degree in HapMap populations and 1000 genomes project populations. We observed that the SM genes had lower SNP densities, and higher degrees of LD in all the 11 HapMap populations and 13 1000 genomes project populations. These results mean that the SM genes had more stable chromosome genetic structures, and were more conserved than the FM genes.

  16. The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription

    NASA Technical Reports Server (NTRS)

    Kaine, B. P.; Mehr, I. J.; Woese, C. R.

    1994-01-01

    Through random search, a gene from Thermococcus celer has been identified and sequenced that appears to encode a transcription-associated protein (110 amino acid residues). The sequence has clear homology to approximately the last half of an open reading frame reported previously for Sulfolobus acidocaldarius [Langer, D. & Zillig, W. (1993) Nucleic Acids Res. 21, 2251]. The protein translations of these two archaeal genes in turn are homologs of a small subunit found in eukaryotic RNA polymerase I (A12.2) and the counterpart of this from RNA polymerase II (B12.6). Homology is also seen with the eukaryotic transcription factor TFIIS, but it involves only the terminal 45 amino acids of the archaeal proteins. Evolutionary implications of these homologies are discussed.

  17. Algorithm to find distant repeats in a single protein sequence

    PubMed Central

    Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj

    2008-01-01

    Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of the distant repeats would make it possible to establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from the Point Accepted Mutation (PAM) matrix have been used to identify amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663
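
    The general idea of scoring pairs of separated segments with a substitution score can be sketched as follows. This is not the authors' algorithm. The crude similarity groups stand in for a real PAM matrix, and the window length, threshold and example sequence are all hypothetical.

```python
# Crude, hypothetical similarity groups used in place of a real PAM matrix.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "CGP"]


def score(a, b):
    """Toy substitution score: identity 2, same group 1, otherwise -1."""
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def find_repeats(seq, window, min_score):
    """Return (i, j, score) for non-overlapping window pairs scoring >= min_score."""
    hits = []
    last_start = len(seq) - window
    for i in range(last_start + 1):
        for j in range(i + window, last_start + 1):
            s = sum(score(seq[i + k], seq[j + k]) for k in range(window))
            if s >= min_score:
                hits.append((i, j, s))
    return hits


# Hypothetical sequence: "AKVLD" recurs as the diverged copy "ARILE".
seq = "AKVLDWWWWWARILE"
print(find_repeats(seq, window=5, min_score=6))
```

    The two segments share only two identical residues out of five, yet the pair still passes the threshold because the substitutions are conservative. Exact matching would miss it, which is why substitution scores matter for distant repeats.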

  18. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Certain mysteries pointing toward their recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation process are considered the centre of attraction in cancer research. In modern times, with the established databases, ranges of computational platforms have provided a platform to examine almost all the physiological and biochemical evidences in disease-associated phenotypes. Using existing computational methods, we have utilized the amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study related to sequence similarity, protein-protein networking, co-expression analysis, and evolutionary trajectory of centromere proteins will speed up the understanding about centromere biology and will create a road map for upcoming researchers who are initiating their work of clinical sequencing using centromere proteins.

  19. Negative correlation between rates of molecular evolution and flowering cycles in temperate woody bamboos revealed by plastid phylogenomics.

    PubMed

    Ma, Peng-Fei; Vorontsova, Maria S; Nanjarisoa, Olinirina Prisca; Razanatsoa, Jacqueline; Guo, Zhen-Hua; Haevermans, Thomas; Li, De-Zhu

    2017-12-21

    Heterogeneous rates of molecular evolution are universal across the tree of life, posing challenges for phylogenetic inference. The temperate woody bamboos (tribe Arundinarieae, Poaceae) are noted for their extremely slow molecular evolutionary rates, supposedly caused by their mysterious monocarpic reproduction. However, the correlation between substitution rates and flowering cycles has not been formally tested. Here we present 15 newly sequenced plastid genomes of temperate woody bamboos, including the first genomes ever sequenced from Madagascar representatives. A data matrix of 46 plastid genomes representing all 12 lineages of Arundinarieae was assembled for phylogenetic and molecular evolutionary analyses. We conducted phylogenetic analyses using different sequences (e.g., coding and noncoding) combined with different data partitioning schemes, revealing conflicting relationships involving internodes among several lineages. A great difference in branch lengths was observed among the major lineages, and topological inconsistency could be attributed to long-branch attraction (LBA). Using clock model-fitting by maximum likelihood and Bayesian approaches, we furthermore demonstrated extensive rate variation among these major lineages. Rate accelerations mainly occurred for the isolated lineages with limited species diversification, totaling 11 rate shifts during the tribe's evolution. Using linear regression analysis, we found a negative correlation between rates of molecular evolution and flowering cycles for Arundinarieae, notwithstanding that the correlation may be insignificant when taking the phylogenetic structure into account. Using the temperate woody bamboos as an example, we found further evidence that rate heterogeneity is universal in plants, suggesting that this will pose a challenge for phylogenetic reconstruction of bamboos. The bamboos with longer flowering cycles tend to evolve more slowly than those with shorter flowering cycles, in accordance with a putative generation time effect.

  20. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication

    PubMed Central

    2014-01-01

    Background Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of “living fossils.” As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses. Results Here we apply a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab, Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers grouped into 1,876 distinct genetic intervals and 5,775 candidate conserved protein coding genes. Conclusions Comparison with other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications 300 million years ago, followed by extensive chromosome fusion. These results provide a counter-example to the often noted correlation between whole genome duplication and evolutionary radiations. The new, low-cost genetic mapping method for obtaining a chromosome-scale view of non-model organism genomes that we demonstrate here does not require laboratory culture, and is potentially applicable to a broad range of other species. PMID:24987520

  1. Two Functional Copies of the DGCR6 Gene Are Present on Human Chromosome 22q11 Due to a Duplication of an Ancestral Locus

    PubMed Central

    Edelmann, Lisa; Stankiewicz, Pavel; Spiteri, Elizabeth; Pandita, Raj K.; Shaffer, Lisa; Lupski, James; Morrow, Bernice E.

    2001-01-01

    The DGCR6 (DiGeorge critical region) gene encodes a putative protein with sequence similarity to gonadal (gdl), a Drosophila melanogaster gene of unknown function. We mapped the DGCR6 gene to chromosome 22q11 within a low copy repeat, termed sc11.1a, and identified a second copy of the gene, DGCR6L, within the duplicate locus, termed sc11.1b. Both sc11.1 repeats are deleted in most persons with velo-cardio-facial syndrome/DiGeorge syndrome (VCFS/DGS), and they map immediately adjacent and internal to the low copy repeats, termed LCR22, that mediate the deletions associated with VCFS/DGS. We sequenced genomic clones from both loci and determined that the putative initiator methionine is located further upstream than originally described, but in a position similar to the mouse and chicken orthologs. DGCR6L encodes a highly homologous, functional copy of DGCR6, with some base changes rendering amino acid differences. Expression studies of the two genes indicate that both genes are widely expressed in fetal and adult tissues. Evolutionary studies using FISH mapping in several different species of ape combined with sequence analysis of DGCR6 in a number of different primate species indicate that the duplication is at least 12 million years old and may date back to before the divergence of Catarrhines from Platyrrhines, 35 mya. These data suggest that there has been selective evolutionary pressure toward the functional maintenance of both paralogs. Interestingly, a full-length HERV-K provirus integrated into the sc11.1a locus after the divergence of chimpanzees and humans. PMID:11157784

  2. A dated molecular phylogeny of manta and devil rays (Mobulidae) based on mitogenome and nuclear sequences.

    PubMed

    Poortvliet, Marloes; Olsen, Jeanine L; Croll, Donald A; Bernardi, Giacomo; Newton, Kelly; Kollias, Spyros; O'Sullivan, John; Fernando, Daniel; Stevens, Guy; Galván Magaña, Felipe; Seret, Bernard; Wintner, Sabine; Hoarau, Galice

    2015-02-01

    Manta and devil rays are an iconic group of globally distributed pelagic filter feeders, yet their evolutionary history remains enigmatic. We employed next generation sequencing of mitogenomes for nine of the 11 recognized species and two outgroups; as well as additional Sanger sequencing of two mitochondrial and two nuclear genes in an extended taxon sampling set. Analysis of the mitogenome coding regions in a Maximum Likelihood and Bayesian framework provided a well-resolved phylogeny. The deepest divergences distinguished three clades with high support, one containing Manta birostris, Manta alfredi, Mobula tarapacana, Mobula japanica and Mobula mobular; one containing Mobula kuhlii, Mobula eregoodootenkee and Mobula thurstoni; and one containing Mobula munkiana, Mobula hypostoma and Mobula rochebrunei. Mobula remains paraphyletic with the inclusion of Manta, a result that is in agreement with previous studies based on molecular and morphological data. A fossil-calibrated Bayesian random local clock analysis suggests that mobulids diverged from Rhinoptera around 30 Mya. Subsequent divergences are characterized by long internodes followed by short bursts of speciation extending from an initial episode of divergence in the Early and Middle Miocene (19-17 Mya) to a second episode during the Pliocene and Pleistocene (3.6 Mya - recent). Estimates of divergence dates overlap significantly with periods of global warming, during which upwelling intensity - and related high primary productivity in upwelling regions - decreased markedly. These periods are hypothesized to have led to fragmentation and isolation of feeding regions leading to possible regional extinctions, as well as the promotion of allopatric speciation. 
The closely shared evolutionary history of mobulids, in combination with ongoing threats from fisheries and climate change effects on upwelling and food supply, reinforces the case for greater protection of this charismatic family of pelagic filter feeders. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System.

    PubMed

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics has facilitated the mining of biological information from a genome sequence, through the detection of similarities and differences with genomes of closely or more distantly related species. By using such comparative approaches, knowledge can be transferred from the model to non-model organisms and insights can be gained into the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study was aimed at understanding the structure, organisation and expression profiles of grass pollen allergens using the genomic data from Brachypodium distachyon as it is phylogenetically related to the allergenic grasses. Combining genomic data with the anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups mapping on the five chromosomes in B. distachyon. High levels of anther-specific expression profiles were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of introns. Promoter analysis of the identified Brachypodium genes revealed the presence of specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. With the identification of putative allergen-encoding genes in Brachypodium, this study has also described some important plant gene families (e.g. expansin superfamily, EF-Hand family, profilins, etc.) for the first time in the model plant Brachypodium. 
Altogether, the present study provides new insights into structural characterization and evolution of pollen allergens and will further serve as a base for their functional characterization in related grass species.

  4. Complete nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within Artiodactyla.

    PubMed

    Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C

    1999-08-05

    The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16,613 bp, is presented in this report. The genome does not have a fixed length because of the presence of a variable number of tandem repeats, 5'-CGTGCGTACA, in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than that between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.

  5. Determination of Fundamental Properties of an M31 Globular Cluster from Main-Sequence Photometry

    NASA Astrophysics Data System (ADS)

    Ma, Jun; Wu, Zhenyu; Wang, Song; Fan, Zhou; Zhou, Xu; Wu, Jianghua; Jiang, Zhaoji; Chen, Jiansheng

    2010-10-01

    M31 globular cluster B379 is the first extragalactic cluster whose age was determined by main-sequence photometry. In the main-sequence photometric method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models use different parameters of stellar evolution, such as the range of stellar masses, opacities, equations of state, and other recipes. So, it is interesting to check whether different stellar evolutionary models can give consistent results for the same cluster. Brown et al. constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using SSP models of Bruzual & Charlot and its multiphotometry, Ma et al. independently determined the age of B379, which is in good agreement with the determination of Brown et al. The models of Bruzual & Charlot are calculated based on the Padova evolutionary tracks. It is necessary to check whether the age of B379 as determined based on the Padova evolutionary tracks is in agreement with the determination of Brown et al. In this article, we redetermine the age of B379 using isochrones of the Padova stellar evolutionary models. In addition, the metal abundance, the distance modulus, and the reddening value for B379 are reported. The results obtained are consistent with the previous determinations, which include the age obtained by Brown et al. This article thus confirms the consistency of the age scale of B379 between the Padova isochrones and the 2006 VandenBerg isochrones; i.e., the comparison between the results of Brown et al. and Ma et al. is meaningful. The values found for B379 in this article are: metallicity [M/H] = log(Z/Z⊙) = -0.325, age τ = 11.0 ± 1.5 Gyr, reddening E(B - V) = 0.08, and distance modulus (m - M)_0 = 24.44 ± 0.10.
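
As a rough illustration of what the reported distance modulus of 24.44 ± 0.10 implies, the sketch below converts it to a distance using the standard relation d = 10^(μ/5 + 1) pc, where μ is the extinction-corrected distance modulus. The function name is hypothetical and the calculation is not taken from the article itself.

```python
def distance_pc(distance_modulus: float) -> float:
    """Distance in parsecs for an extinction-corrected distance modulus."""
    return 10 ** (distance_modulus / 5 + 1)


# Values quoted in the abstract for B379.
mu, sigma = 24.44, 0.10

central = distance_pc(mu)
near = distance_pc(mu - sigma)  # smaller modulus means a closer cluster
far = distance_pc(mu + sigma)

print(f"{central / 1e3:.0f} kpc (range {near / 1e3:.0f}-{far / 1e3:.0f} kpc)")
```

Under this relation the central value comes out at roughly 770 kpc, in line with commonly quoted distances to M31.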

  6. Exploring Connectivity in Sequence Space of Functional RNA

    NASA Technical Reports Server (NTRS)

    Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark

    2017-01-01

    Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing. 
Analysis of these sequences revealed several large clusters defined such that every sequence in a cluster can be reached from any other sequence in the same cluster through a series of single point mutations. Sequences in a single cluster appear to adopt more than one secondary structure. The mechanism of refolding within a single cluster was examined. To shed light on possible evolutionary paths in the space of ribozymes, the connectivity between clusters was investigated. The effect of length of RNA molecules on the structure of the fitness landscape and possible evolutionary paths was examined by way of comparing functional sequences of 20 and 80 nucleobases in length. It was found that sequences of different lengths shared secondary structure motifs that were presumed responsible for catalytic activity, with increasing complexity and global structural rearrangements emerging in longer molecules.
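
The cluster definition in this abstract (every sequence reachable from any other through a series of single point mutations) amounts to finding connected components in a graph whose edges join sequences that differ at exactly one position. A minimal sketch follows, assuming substitutions only; the function name and the toy sequences are hypothetical and are not data from the study.

```python
from collections import defaultdict


def single_mutation_clusters(sequences):
    """Group sequences into clusters linked by chains of single substitutions."""
    seqs = sorted(set(sequences))
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Two sequences that agree everywhere except position i share the
    # key (i, prefix, suffix), so each bucket holds mutual neighbours.
    buckets = defaultdict(list)
    for s in seqs:
        for i in range(len(s)):
            buckets[(i, s[:i], s[i + 1:])].append(s)

    for members in buckets.values():
        for other in members[1:]:
            root_a, root_b = find(members[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for s in seqs:
        clusters[find(s)].append(s)
    return sorted(clusters.values())


toy = ["GGAUC", "GGAUA", "GGCUA", "CCAUG", "CCAUU", "AAAAA"]
for cluster in single_mutation_clusters(toy):
    print(cluster)
```

Bucketing by masked position avoids comparing all pairs of sequences, which matters at the scale of hundreds of thousands of reads.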

  7. An Automated Pipeline for Engineering Many-Enzyme Pathways: Computational Sequence Design, Pathway Expression-Flux Mapping, and Scalable Pathway Optimization.

    PubMed

    Halper, Sean M; Cetnar, Daniel P; Salis, Howard M

    2018-01-01

    Engineering many-enzyme metabolic pathways suffers from the design curse of dimensionality. There are an astronomical number of synonymous DNA sequence choices, though relatively few will express an evolutionarily robust, maximally productive pathway without metabolic bottlenecks. To solve this challenge, we have developed an integrated, automated computational-experimental pipeline that identifies a pathway's optimal DNA sequence without high-throughput screening or many cycles of design-build-test. The first step applies our Operon Calculator algorithm to design a host-specific, evolutionarily robust bacterial operon sequence with maximally tunable enzyme expression levels. The second step applies our RBS Library Calculator algorithm to systematically vary enzyme expression levels with the smallest-sized library. After characterizing a small number of constructed pathway variants, measurements are supplied to our Pathway Map Calculator algorithm, which then parameterizes a kinetic metabolic model that ultimately predicts the pathway's optimal enzyme expression levels and DNA sequences. Altogether, our algorithms provide the ability to efficiently map the pathway's sequence-expression-activity space and predict DNA sequences with desired metabolic fluxes. Here, we provide a step-by-step guide to applying the Pathway Optimization Pipeline on a desired multi-enzyme pathway in a bacterial host.

  8. Rapidly rotating neutron stars in general relativity: Realistic equations of state

    NASA Technical Reports Server (NTRS)

    Cook, Gregory B.; Shapiro, Stuart L.; Teukolsky, Saul A.

    1994-01-01

    We construct equilibrium sequences of rotating neutron stars in general relativity. We compare results for 14 nuclear matter equations of state. We determine a number of important physical parameters for such stars, including the maximum mass and maximum spin rate. The stability of the configurations to quasi-radial perturbations is assessed. We employ a numerical scheme particularly well suited to handle rapid rotation and large departures from spherical symmetry. We provide an extensive tabulation of models for future reference. Two classes of evolutionary sequences of fixed baryon rest mass and entropy are explored: normal sequences, which behave very much like Newtonian sequences, and supramassive sequences, which exist for neutron stars solely because of general relativistic effects. Adiabatic dissipation of energy and angular momentum causes a star to evolve in quasi-stationary fashion along an evolutionary sequence. Supramassive sequences have masses exceeding the maximum mass of a nonrotating neutron star. A supramassive star evolves toward eventual catastrophic collapse to a black hole. Prior to collapse, the star actually spins up as it loses angular momentum, an effect that may provide an observable precursor to gravitational collapse to a black hole.

  9. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved.

  10. A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)

    PubMed Central

    Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto

    2017-01-01

    Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species, by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation for the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed defining 150 haplotypes which were linked in a common minimum spanning tree, where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to simply relictual variants. PMID:28855916

  11. An experimental and computational evolution-based method to study a mode of co-evolution of overlapping open reading frames in the AAV2 viral genome.

    PubMed

    Kawano, Yasuhiro; Neeley, Shane; Adachi, Kei; Nakai, Hiroyuki

    2013-01-01

    Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in shaping the VP and AAP heptapeptide sequences into their present form. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.

  12. Using hidden Markov models and observed evolution to annotate viral genomes.

    PubMed

    McCauley, Stephen; Hein, Jotun

    2006-06-01

    ssRNA (single stranded) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64 codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper, differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single sequence annotation and applies an HMM framework to ssRNA viral annotation. A description of how the HMM is parameterized, whilst annotating within a missing data framework, is given. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences, is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation. The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging; however, there is still room for improvement, and since there is overwhelming evidence to indicate that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. 
The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.
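
The abstract's reference to amino acids that are "6-fold degenerate with respect to the 64 codon alphabet" can be checked directly against the standard genetic code. The sketch below builds the standard codon table and lists the amino acids encoded by six codons; it only illustrates that one statement and is not part of the HMM described in the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order
# (first base varies slowest); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons encode each amino acid, ignoring stops.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sixfold = sorted(aa for aa, n in degeneracy.items() if n == 6)

print(sixfold)  # ['L', 'R', 'S'] (leucine, arginine, serine)
```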

  13. Integrating protein structural dynamics and evolutionary analysis with Bio3D.

    PubMed

    Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

    2014-12-10

    Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .

  14. Diffusion of influenza viruses among migratory birds with a focus on the Southwest United States.

    PubMed

    Scotch, Matthew; Lam, Tommy Tsan-Yuk; Pabilonia, Kristy L; Anderson, Theodore; Baroch, John; Kohler, Dennis; DeLiberto, Thomas J

    2014-08-01

    The Southwest United States, including Arizona and New Mexico, has a diverse climate and is home to many different avian species. We sequenced the hemagglutinin (HA) gene of twenty influenza specimens for the years 2007-2009. This included four from Arizona, and sixteen from New Mexico. We analyzed the sequences and determined the following HA subtypes: H3, H4, H6, H8, and H11. For each subtype, we combined our virus sequences with those from a public database, and inferred phylogeographic models of influenza diffusion. Statistical phylogeography indicated that overall evolutionary diffusion of avian influenza viruses is geographically structured (p<0.05). In addition, we found that diffusion to the Southwest was often from nearby states including California. For H3, H4 and H6, the intra-flyway gene flow rates were significantly (p<0.001) higher than the inter-flyway rates. Such a rate difference was also observed in H8 and H11, though without statistical significance (p=0.132 and p=0.190, respectively). Excluding any one flyway from the calculation generated similar results, suggesting that such a barrier effect on gene flow rates is not exclusively produced by any single flyway. We also calculated the Bayes factor test for the significant non-zero rates between states and identified significant routes both within and across flyways. Such inter-flyway spread of influenza was probably the result of birds from four flyways co-mingling on breeding grounds in northern regions or marshaling on staging areas post breeding in Canada or Alaska, before moving south each fall. This study provides an initial analysis of evolutionary diffusion of avian influenza virus to and from the Southwest United States. However, more sequences from this region need to be generated to determine the role of host migration and other factors on influenza diffusion. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. The organization and evolution of the Responder satellite in species of the Drosophila melanogaster group: dynamic evolution of a target of meiotic drive.

    PubMed

    Larracuente, Amanda M

    2014-11-25

    Satellite DNA can make up a substantial fraction of eukaryotic genomes and has roles in genome structure and chromosome segregation. The rapid evolution of satellite DNA can contribute to genomic instability and genetic incompatibilities between species. Despite its ubiquity and its contribution to genome evolution, we currently know little about the dynamics of satellite DNA evolution. The Responder (Rsp) satellite DNA family is found in the pericentric heterochromatin of chromosome 2 of Drosophila melanogaster. Rsp is well-known for being the target of Segregation Distorter (SD), an autosomal meiotic drive system in D. melanogaster. I present an evolutionary genetic analysis of the Rsp family of repeats in D. melanogaster and its closely-related species in the melanogaster group (D. simulans, D. sechellia, D. mauritiana, D. erecta, and D. yakuba) using a combination of available BAC sequences, whole genome shotgun Sanger reads, Illumina short read deep sequencing, and fluorescence in situ hybridization. I show that Rsp repeats have euchromatic locations throughout the D. melanogaster genome, that Rsp arrays show evidence for concerted evolution, and that Rsp repeats exist outside of D. melanogaster, in the melanogaster group. The repeats in these species are considerably diverged at the sequence level compared to D. melanogaster, and have a strikingly different genomic distribution, even between closely-related sister taxa. The genomic organization of the Rsp repeat in the D. melanogaster genome is complex: it consists of large blocks of tandem repeats in the heterochromatin and small blocks of tandem repeats in the euchromatin. My discovery of heterochromatic Rsp-like sequences outside of D. melanogaster suggests that SD evolved after its target satellite and that the evolution of the Rsp satellite family is highly dynamic over a short evolutionary time scale (<240,000 years).

  16. Effects of Mitochondrial DNA Rate Variation on Reconstruction of Pleistocene Demographic History in a Social Avian Species, Pomatostomus superciliosus

    PubMed Central

    Norman, Janette A.; Blackmore, Caroline J.; Rourke, Meaghan; Christidis, Les

    2014-01-01

    Mitochondrial sequence data is often used to reconstruct the demographic history of Pleistocene populations in an effort to understand how species have responded to past climate change events. However, departures from neutral equilibrium conditions can confound evolutionary inference in species with structured populations or those that have experienced periods of population expansion or decline. Selection can affect patterns of mitochondrial DNA variation and variable mutation rates among mitochondrial genes can compromise inferences drawn from single markers. We investigated the contribution of these factors to patterns of mitochondrial variation and estimates of time to most recent common ancestor (TMRCA) for two clades in a co-operatively breeding avian species, the white-browed babbler Pomatostomus superciliosus. Both the protein-coding ND3 gene and hypervariable domain I control region sequences showed departures from neutral expectations within the superciliosus clade, and a two-fold difference in TMRCA estimates. Bayesian phylogenetic analysis provided evidence of departure from a strict clock model of molecular evolution in domain I, leading to an over-estimation of TMRCA for the superciliosus clade at this marker. Our results suggest mitochondrial studies that attempt to reconstruct Pleistocene demographic histories should rigorously evaluate data for departures from neutral equilibrium expectations, including variation in evolutionary rates across multiple markers. Failure to do so can lead to serious errors in the estimation of evolutionary parameters and subsequent demographic inferences concerning the role of climate as a driver of evolutionary change. These effects may be especially pronounced in species with complex social structures occupying heterogeneous environments. 
We propose that environmentally driven differences in social structure may explain observed differences in evolutionary rate of domain I sequences, resulting from longer than expected retention times for matriarchal lineages in the superciliosus clade. PMID:25181547

  17. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  18. Evidence Combination From an Evolutionary Game Theory Perspective.

    PubMed

    Deng, Xinyang; Han, Deqiang; Dezert, Jean; Deng, Yong; Shyr, Yu

    2016-09-01

    Dempster-Shafer evidence theory is a primary methodology for multisource information fusion because it is good at dealing with uncertain information. This theory provides Dempster's rule of combination to synthesize multiple pieces of evidence from various information sources. However, in some cases, counter-intuitive results may be obtained based on that combination rule. Numerous new or improved methods have been proposed to suppress these counter-intuitive results based on perspectives such as minimizing the information loss or deviation. Inspired by evolutionary game theory, this paper considers a biological and evolutionary perspective to study the combination of evidence. An evolutionary combination rule (ECR) is proposed to help find the most biologically supported proposition in a multievidence system. Within the proposed ECR, we develop a Jaccard matrix game to formalize the interaction between propositions in the evidence, and utilize the replicator dynamics to mimic the evolution of propositions. Experimental results show that the proposed ECR can effectively suppress the counter-intuitive behaviors that appear in typical paradoxes of evidence theory, compared with many existing methods. Properties of the ECR, such as the stability and convergence of its solution, have been mathematically proved as well.
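
    The counter-intuitive behavior mentioned in this abstract can be reproduced with the classical Dempster's rule of combination. The following is a minimal sketch of that classical rule only, not of the ECR proposed by the authors; the function name, the dictionary-of-frozensets representation, and the toy masses (a Zadeh-style example) are assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster's rule.

    Each mass function is a dict mapping a frozenset (a proposition, i.e. a
    set of hypotheses) to its mass. This representation is an assumption of
    this sketch, not something taken from the abstract.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the shared hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint propositions contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalise by 1 - K so the combined masses sum to one.
    return {prop: mass / (1.0 - conflict) for prop, mass in combined.items()}


# Toy example: two sources strongly disagree (A vs. C) and both give
# proposition B only 1% support.
source_1 = {frozenset("A"): 0.99, frozenset("B"): 0.01}
source_2 = {frozenset("C"): 0.99, frozenset("B"): 0.01}
result = dempster_combine(source_1, source_2)
for proposition, mass in result.items():
    print(sorted(proposition), round(mass, 6))
```

    In this toy case all of the combined belief ends up on B, even though neither source considered B likely. This is the kind of paradox that alternative combination rules aim to suppress.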

  19. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

    PubMed

    Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

    2017-02-01

    Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together, our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family.

  20. Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets

    PubMed Central

    2012-01-01

    Background Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773

  1. Finding a (pine) needle in a haystack: chloroplast genome sequence divergence in rare and widespread pines

    Treesearch

    J.B. Whittall; J. Syring; M. Parks; J. Buenrostro; C. Dick; A. Liston; R. Cronn

    2010-01-01

    Critical to conservation efforts and other investigations at low taxonomic levels, DNA sequence data offer important insights into the distinctiveness, biogeographic partitioning, and evolutionary histories of species. The resolving power of DNA sequences is often limited by insufficient variability at the intraspecific level. This is particularly true of studies...

  2. Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

    PubMed Central

    Janecek, S.

    1996-01-01

    The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins. PMID:8762144

  3. Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

    PubMed

    Janecek, S

    1996-06-01

    The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins.

  4. Phylogenomics of MADS-Box Genes in Plants - Two Opposing Life Styles in One Gene Family.

    PubMed

    Gramzow, Lydia; Theißen, Günter

    2013-09-12

    The development of multicellular eukaryotes, according to their body plan, is often directed by members of multigene families that encode transcription factors. MADS (for MINICHROMOSOME MAINTENANCE1, AGAMOUS, DEFICIENS and SERUM RESPONSE FACTOR)-box genes form one of those families controlling nearly all major aspects of plant development. Knowing the complete complement of MADS-box genes in sequenced plant genomes will allow a better understanding of the evolutionary patterns of these genes and the association of their evolution with the evolution of plant morphologies. Here, we have applied a combination of automatic and manual annotations to identify the complete set of MADS-box genes in 17 plant genomes. Furthermore, three plant genomes were reanalyzed and published datasets were used for four genomes such that more than 2,600 genes from 24 species were classified into the two types of MADS-box genes, Type I and Type II. Our results extend previous studies, highlighting the remarkably different evolutionary patterns of Type I and Type II genes and provide a basis for further studies on the evolution and function of MADS-box genes.

  5. The identification of a sequence related to apicomplexan enolase from Sarcocystis neurona.

    PubMed

    Wilson, A P; Thelen, J J; Lakritz, J; Brown, C R; Marsh, A E

    2004-11-01

    Equine protozoal myeloencephalitis (EPM) is a neurological disease caused by Sarcocystis neurona, an apicomplexan parasite. S. neurona is also associated with EPM-like diseases in marine and small mammals. The mechanisms of transmission and ability to infect a wide host range remain obscure; therefore, characterization of essential proteins may provide evolutionary information allowing the development of novel chemotherapeutics that target non-mammalian biochemical pathways. In the current study, two-dimensional electrophoresis and matrix-assisted laser desorption ionization-time of flight (MALDI-ToF) mass spectrometry were combined to characterize and identify an enolase protein from S. neurona based on peptide homology to the Toxoplasma gondii protein. Enolase is thought to be a vestigial, non-photosynthetic protein resulting from an evolutionary endosymbiosis event of an apicomplexan ancestor with green algae. Enolase has also been suggested to play a role in parasite stage conversion for T. gondii. Characterization of this protein in S. neurona and comparison to other protozoans indicate a biochemical similarity of S. neurona enolase to other tissue-cyst forming coccidians that cause encephalitis.

  6. Genetic diversity and variation of Chinese fir from Fujian province and Taiwan, China, based on ISSR markers

    PubMed Central

    Chen, Yu; Peng, Zhuqing; Wu, Chao; Ma, Zhihui; Ding, Guochang; Cao, Guangqiu; Ruan, Shaoning; Lin, Sizu

    2017-01-01

    Genetic diversity and variation among 11 populations of Chinese fir from Fujian province and Taiwan were assessed using inter-simple sequence repeat (ISSR) markers to reveal the evolutionary relationships across their distribution range. Analysis of genetic parameters of the different populations showed that populations in Fujian province exhibited a greater level of genetic diversity than did the populations in Taiwan. Compared to Taiwan populations, significantly limited gene flow was observed among Fujian populations. A UPGMA cluster analysis showed that most individuals of the Taiwan populations formed a single cluster, whereas the 6 populations from Fujian each formed a discrete cluster. All populations were divided into 3 main groups; all 5 populations from Taiwan were gathered into a subgroup that, combined with 2 populations, Dehua and Liancheng, formed one of the 3 main groups, which indicated relatively stronger relatedness. This grouping is supported by a genetic structure analysis. Together, these results suggest different levels of genetic diversity and variation of Chinese fir between Fujian and Taiwan, and indicate different patterns of evolutionary process and local environmental adaptation. PMID:28406956

  7. Genetic diversity and variation of Chinese fir from Fujian province and Taiwan, China, based on ISSR markers.

    PubMed

    Chen, Yu; Peng, Zhuqing; Wu, Chao; Ma, Zhihui; Ding, Guochang; Cao, Guangqiu; Ruan, Shaoning; Lin, Sizu

    2017-01-01

    Genetic diversity and variation among 11 populations of Chinese fir from Fujian province and Taiwan were assessed using inter-simple sequence repeat (ISSR) markers to reveal the evolutionary relationships across their distribution range. Analysis of genetic parameters of the different populations showed that populations in Fujian province exhibited a greater level of genetic diversity than did the populations in Taiwan. Compared to Taiwan populations, significantly limited gene flow was observed among Fujian populations. A UPGMA cluster analysis showed that most individuals of the Taiwan populations formed a single cluster, whereas the 6 populations from Fujian each formed a discrete cluster. All populations were divided into 3 main groups; all 5 populations from Taiwan were gathered into a subgroup that, combined with 2 populations, Dehua and Liancheng, formed one of the 3 main groups, which indicated relatively stronger relatedness. This grouping is supported by a genetic structure analysis. Together, these results suggest different levels of genetic diversity and variation of Chinese fir between Fujian and Taiwan, and indicate different patterns of evolutionary process and local environmental adaptation.

  8. New Parallaxes for the Upper Scorpius OB Association

    NASA Astrophysics Data System (ADS)

    Donaldson, J. K.; Weinberger, A. J.; Gagné, J.; Boss, A. P.; Keiser, S. A.

    2017-11-01

    Upper Scorpius is a subgroup of the nearest OB association, Scorpius-Centaurus. Its young age makes it an important association to study star and planet formation. We present parallaxes to 52 low-mass stars in Upper Scorpius, 28 of which have full kinematics. We measure ages of the individual stars by combining our measured parallaxes with pre-main-sequence evolutionary tracks. We find a significant difference in the ages of stars with and without circumstellar disks. The stars without disks have a mean age of 4.9 ± 0.8 Myr and those with disks have an older mean age of 8.2 ± 0.9 Myr. This somewhat counterintuitive result suggests that evolutionary effects in young stars can dominate their apparent ages. We also attempt to use the 28 stars with full kinematics (i.e., proper motion, radial velocity (RV), and parallax) to trace the stars back in time to their original birthplace to obtain a trackback age. As expected, given the large measurement uncertainties on available RV measurements, we find that measurement uncertainties alone cause the group to diverge after a few Myr.

  9. Evolutionary algorithm for vehicle driving cycle generation.

    PubMed

    Perhinschi, Mario G; Marlowe, Christopher; Tamayo, Sergio; Tu, Jun; Wayne, W Scott

    2011-09-01

    Modeling transit bus emissions and fuel economy requires a large amount of experimental data over wide ranges of operational conditions. Chassis dynamometer tests are typically performed using representative driving cycles defined based on vehicle instantaneous speed as sequences of "microtrips", which are intervals between consecutive vehicle stops. Overall significant parameters of the driving cycle, such as average speed, stops per mile, kinetic intensity, and others, are used as independent variables in the modeling process. Performing tests at all the necessary combinations of parameters is expensive and time consuming. In this paper, a methodology is proposed for building driving cycles at prescribed independent variable values using experimental data through the concatenation of "microtrips" isolated from a limited number of standard chassis dynamometer test cycles. The selection of the adequate "microtrips" is achieved through a customized evolutionary algorithm. The genetic representation uses microtrip definitions as genes. Specific mutation, crossover, and karyotype alteration operators have been defined. The Roulette-Wheel selection technique with elitist strategy drives the optimization process, which consists of minimizing the errors to desired overall cycle parameters. This utility is part of the Integrated Bus Information System developed at West Virginia University.
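
    The selection step named in this abstract (roulette-wheel selection with an elitist strategy) can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the fitness transform 1 / (1 + error), the function names, and the string placeholders standing in for candidate driving cycles are all hypothetical, and the mutation, crossover, and karyotype alteration operators are omitted.

```python
import random


def roulette_select(population, errors, rng):
    """Draw one individual with probability proportional to its fitness.

    Fitness is taken as 1 / (1 + error), so a smaller error to the desired
    cycle parameters means a larger slice of the wheel (an assumed transform).
    """
    weights = [1.0 / (1.0 + error) for error in errors]
    return rng.choices(population, weights=weights, k=1)[0]


def next_generation(population, errors, rng, n_elite=1):
    """Build the next generation: keep the elite, fill the rest by roulette."""
    # Elitism: the n_elite individuals with the lowest error survive unchanged.
    ranked = sorted(range(len(population)), key=lambda i: errors[i])
    elite = [population[i] for i in ranked[:n_elite]]
    # The remaining slots are filled by fitness-proportional sampling.
    n_children = len(population) - n_elite
    children = [roulette_select(population, errors, rng) for _ in range(n_children)]
    return elite + children


# Toy usage: each string stands in for a candidate cycle (a list of microtrips).
rng = random.Random(0)
cycles = ["cycle_a", "cycle_b", "cycle_c", "cycle_d"]
cycle_errors = [3.0, 0.5, 2.0, 1.0]
new_cycles = next_generation(cycles, cycle_errors, rng)
print(new_cycles)
```

    Keeping the best individual unchanged guarantees that the lowest error found so far never gets worse from one generation to the next, while the roulette wheel still gives weaker candidates some chance of contributing.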

  10. Maximizing ecological and evolutionary insight in bisulfite sequencing data sets

    PubMed Central

    Lea, Amanda J.; Vilgalys, Tauras P.; Durst, Paul A.P.; Tung, Jenny

    2017-01-01

    Preface Genome-scale bisulfite sequencing approaches have opened the door to ecological and evolutionary studies of DNA methylation in many organisms. These approaches can be powerful. However, they introduce new methodological and statistical considerations, some of which are particularly relevant to non-model systems. Here, we highlight how these considerations influence a study’s power to link methylation variation with a predictor variable of interest. Relative to current practice, we argue that sample sizes will need to increase to provide robust insights. We also provide recommendations for overcoming common challenges and an R Shiny app to aid in study design. PMID:29046582

  11. Novel non-parametric models to estimate evolutionary rates and divergence times from heterochronous sequence data.

    PubMed

    Fourment, Mathieu; Holmes, Edward C

    2014-07-24

    Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so-called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method, Physher, that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.

  12. A strategy with novel evolutionary features for the iterated prisoner's dilemma.

    PubMed

    Li, Jiawei; Kendall, Graham

    2009-01-01

    In recent iterated prisoner's dilemma tournaments, the most successful strategies were those that had identification mechanisms. By playing a predetermined sequence of moves and learning from their opponents' responses, these strategies managed to identify their opponents. We believe that these identification mechanisms may be very useful in evolutionary games. In this paper one such strategy, which we call collective strategy, is analyzed. Collective strategies apply a simple but efficient identification mechanism (that just distinguishes themselves from other strategies), and this mechanism allows them to only cooperate with their group members and defect against any others. In this way, collective strategies are able to maintain a stable population in evolutionary iterated prisoner's dilemma. By means of an invasion barrier, this strategy is compared with other strategies in evolutionary dynamics in order to demonstrate its evolutionary features. We also find that this collective behavior assists the evolution of cooperation in specific evolutionary environments.
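
    The identification mechanism described in this abstract can be illustrated with a small simulation. This is a simplified sketch, not the strategy analyzed in the paper: the two-move handshake ("C" then "D"), the function names, and the rule that a recognized partner is always cooperated with afterwards are assumptions chosen for illustration.

```python
HANDSHAKE = ("C", "D")  # hypothetical predetermined opening sequence


def collective_move(my_history, opponent_history):
    """Play the handshake first, then cooperate only with recognised members."""
    turn = len(my_history)
    if turn < len(HANDSHAKE):
        return HANDSHAKE[turn]
    # An opponent that opened with the same handshake is treated as a group member.
    recognised = tuple(opponent_history[:len(HANDSHAKE)]) == HANDSHAKE
    return "C" if recognised else "D"


def always_cooperate(my_history, opponent_history):
    """A naive outsider that cooperates unconditionally."""
    return "C"


def play(strategy_a, strategy_b, rounds):
    """Run an iterated game and return both move sequences as strings."""
    history_a, history_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return "".join(history_a), "".join(history_b)


members = play(collective_move, collective_move, 5)
outsider = play(collective_move, always_cooperate, 5)
print(members)   # two group members recognise each other and cooperate
print(outsider)  # the outsider fails the handshake and is defected against
```

    Two members pay a small cost during the handshake and then cooperate, whereas any strategy that does not reproduce the opening sequence is met with defection for the rest of the game.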

  13. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  14. RBT-GA: a novel metaheuristic for solving the Multiple Sequence Alignment problem.

    PubMed

    Taheri, Javid; Zomaya, Albert Y

    2009-07-07

    Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of mostly optimal answers for the MSA problem. RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences.

  15. The Solute Carrier Families Have a Remarkably Long Evolutionary History with the Majority of the Human Families Present before Divergence of Bilaterian Species

    PubMed Central

    Höglund, Pär J.; Nordström, Karl J.V.; Schiöth, Helgi B.; Fredriksson, Robert

    2011-01-01

    The Solute Carriers (SLCs) are membrane proteins that regulate transport of many types of substances over the cell membrane. The SLCs are found in at least 46 gene families in the human genome. Here, we performed the first evolutionary analysis of the entire SLC family based on whole genome sequences. We systematically mined and analyzed the genomes of 17 species to identify SLC genes. In all, we identified 4,813 SLC sequences in these genomes, and we delineated the evolutionary history of each of the subgroups. Moreover, we also identified ten new human sequences not previously classified as SLCs, which most likely belong to the SLC family. We found that 43 of the 46 SLC families found in Homo sapiens were also found in Caenorhabditis elegans, whereas 42 of them were also found in insects. Mammals have a higher number of SLC genes in most families, perhaps reflecting important roles for these in central nervous system functions. This study provides a systematic analysis of the evolutionary history of the SLC families in Eukaryotes showing that the SLC superfamily is ancient with multiple branches that were present before early divergence of Bilateria. The results provide foundation for overall classification of SLC genes and are valuable for annotation and prediction of substrates for the many SLCs that have not been tested in experimental transport assays. PMID:21186191

  16. Evolutionary distance from human homologs reflects allergenicity of animal food proteins.

    PubMed

    Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare

    2007-12-01

    In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.
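
    The identity threshold described above can be illustrated with a minimal sketch. The abstract reports only the approximate 62% cutoff; the function names, the choice to ignore gapped columns, and the default threshold argument below are assumptions made for illustration, not the authors' profile-based method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def exceeds_identity_cutoff(aligned_a: str, aligned_b: str,
                            cutoff: float = 62.0) -> bool:
    """True when identity to the human homolog is above the (hypothetical) cutoff."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

    Other definitions of identity (for example, dividing by the full alignment length) give different values, so any cutoff should be read together with the definition used.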

  17. EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data.

    PubMed

    Linard, Benjamin; Nguyen, Ngoc Hoan; Prosdocimi, Francisco; Poch, Olivier; Thompson, Julie D

    2012-01-01

    Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.

  18. Evolutionary genomics of miniature inverted-repeat transposable elements (MITEs) in Brassica.

    PubMed

    Nouroz, Faisal; Noreen, Shumaila; Heslop-Harrison, J S

    2015-12-01

    Miniature inverted-repeat transposable elements (MITEs) are truncated derivatives of autonomous DNA transposons, and are dispersed abundantly in most eukaryotic genomes. We aimed to characterize various MITE families in Brassica in terms of their presence, sequence characteristics and evolutionary activity. Dot plot analyses involving comparison of homoeologous bacterial artificial chromosome (BAC) sequences allowed identification of 15 novel families of mobile MITEs. Of these, 5 were Stowaway-like with TA Target Site Duplications (TSDs), 4 Tourist-like with TAA/TTA TSDs, 5 Mutator-like with 9-10 bp TSDs and 1 novel MITE (BoXMITE1) flanked by 3 bp TSDs. Our data suggested that there are about 30,000 MITE-related sequences in Brassica rapa and B. oleracea genomes. In situ hybridization showed one abundant family was dispersed in the A-genome, while another was located near 45S rDNA sites. PCR analysis using primers flanking sequences of MITE elements detected MITE insertion polymorphisms between and within the three Brassica (AA, BB, CC) genomes, with many insertions being specific to single genomes and others showing evidence of more recent evolutionary insertions. Our BAC sequence comparison strategy enables identification of evolutionarily active MITEs with no prior knowledge of MITE sequences. The details of MITE families reported in Brassica enable their identification, characterization and annotation. Insertion polymorphisms of MITEs and their transposition activity indicate an important mechanism of genome evolution and diversification. MITE families derived from known Mariner, Harbinger and Mutator DNA transposons were discovered, as well as some novel structures. The identification of Brassica MITEs will have broad applications in Brassica genomics, breeding, hybridization and phylogeny through their use as DNA markers.

  19. Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome, and the genomic diversification of legume chloroplasts

    PubMed Central

    Guo, Xianwu; Castillo-Ramírez, Santiago; González, Víctor; Bustos, Patricia; Luís Fernández-Vázquez, José; Santamaría, Rosa Isela; Arellano, Jesús; Cevallos, Miguel A; Dávila, Guillermo

    2007-01-01

    Background Fabaceae (legumes) is one of the largest families of flowering plants, and some members are important crops. In contrast to what we know about their great diversity or economic importance, our knowledge at the genomic level of chloroplast genomes (cpDNAs or plastomes) for these crops is limited. Results We sequenced the complete genome of the common bean (Phaseolus vulgaris cv. Negro Jamapa) chloroplast. The plastome of P. vulgaris is a 150,285 bp circular molecule. It has gene content similar to that of other legume plastomes, but contains two pseudogenes, rpl33 and rps16. A distinct inversion occurred at the junction points of trnH-GUG/rpl14 and rps19/rps8, as in adzuki bean [1]. These two pseudogenes and the inversion were confirmed in 10 varieties representing the two domestication centers of the bean. Genomic comparative analysis indicated that inversions generally occur in legume plastomes and the magnitude and localization of insertions/deletions (indels) also vary. The analysis of repeat sequences demonstrated that patterns and sequences of tandem repeats had an important impact on sequence diversification between legume plastomes and tandem repeats did not belong to dispersed repeats. Interestingly, P. vulgaris plastome had higher evolutionary rates of change on both genomic and gene levels than G. max, which could be the consequence of pressure from both mutation and natural selection. Conclusion Legume chloroplast genomes are widely diversified in gene content, gene order, indel structure, abundance and localization of repetitive sequences, intracellular sequence exchange and evolutionary rates. The P. vulgaris plastome is a rapidly evolving genome. PMID:17623083

  20. Radiative evolution of polyploid races of the Iberian carnation Dianthus broteri (Caryophyllaceae).

    PubMed

    Balao, Francisco; Valente, Luis M; Vargas, Pablo; Herrera, Javier; Talavera, Salvador

    2010-07-01

    The micro-evolutionary mechanisms that drive large-scale radiations are not completely understood, partly because of a shortage of population-level studies aimed at identifying putative causes of rapid evolutionary change. The Dianthus broteri complex, representing the largest polyploid series known to date for any species in the genus (2x, 4x, 6x and 12x cytotypes), belongs to a lineage that was recently found to have diversified at unusually rapid rates. We used a combination of genome sequencing (internal transcribed spacer (ITS), plus chloroplast DNA (cpDNA) regions trnH-psbA, psbA-trnK and trnK-matK) and amplified fragment length polymorphism (AFLP) fingerprinting in 25 populations to infer the evolutionary history of extant polyploid races. The haplotype, ribotype and AFLP reconstructions showed a star-shaped arrangement suggesting a pattern of radiative evolution. The major, widespread haplotype occurred at all ploidy levels, whereas 20 minor haplotypes were restricted to single populations and cytotypes. In addition, AFLP analyses retrieved well-supported cytogeographic groups: six clades were clearly differentiated in terms of ploidy level and geography. Molecular data indicate that gene flow among different cytotypes is rare or nonexistent. Our study supports a scenario of rapid diversification in carnations in which autopolyploidy and allopolyploidy, in interaction with geography and/or isolation, have played prominent roles.

  1. Dynamic evolution and biogenesis of small RNAs during sex reversal.

    PubMed

    Liu, Jie; Luo, Majing; Sheng, Yue; Hong, Qiang; Cheng, Hanhua; Zhou, Rongjia

    2015-05-06

    Understanding the origin, evolution and functions of small RNA (sRNA) genes has been a great challenge in the past decade. Molecular mechanisms underlying sexual reversal in vertebrates, particularly sRNAs involved in this process, are largely unknown. By deep-sequencing of small RNA transcriptomes in combination with genomic analysis, we identified a large number of piRNAs and miRNAs including over 1,000 novel miRNAs, which were differentially expressed during gonad reversal from ovary to testis via ovotestis. Biogenesis and expressions of miRNAs were dynamically changed during the reversal. Notably, phylogenetic analysis revealed dynamic expansions of miRNAs in vertebrates and an evolutionary trajectory of conserved miR-17-92 cluster in the Eukarya. We showed that the miR-17-92 cluster in vertebrates was generated through multiple duplications from ancestor miR-92 in invertebrates Tetranychus urticae and Daphnia pulex from the Chelicerata around 580 Mya. Moreover, we identified the sexual regulator Dmrt1 as a direct target of the members miR-19a and -19b in the cluster. These data suggested dynamic biogenesis and expressions of small RNAs during sex reversal and revealed multiple expansions and evolutionary trajectory of miRNAs from invertebrates to vertebrates, which implicate small RNAs in sexual reversal and provide new insight into evolutionary and molecular mechanisms underlying sexual reversal.

  2. A Systematic Survey of an Intragenic Epistatic Landscape

    PubMed Central

    Bank, Claudia; Hietpas, Ryan T.; Jensen, Jeffrey D.; Bolon, Daniel N.A.

    2015-01-01

    Mutations are the source of evolutionary variation. The interactions of multiple mutations can have important effects on fitness and evolutionary trajectories. We have recently described the distribution of fitness effects of all single mutations for a nine-amino-acid region of yeast Hsp90 (Hsp82) implicated in substrate binding. Here, we report and discuss the distribution of intragenic epistatic effects within this region in seven Hsp90 point mutant backgrounds of neutral to slightly deleterious effect, resulting in an analysis of more than 1,000 double mutants. We find negative epistasis between substitutions to be common, and positive epistasis to be rare—resulting in a pattern that indicates a drastic change in the distribution of fitness effects one step away from the wild type. This can be well explained by a concave relationship between phenotype and genotype (i.e., a concave shape of the local fitness landscape), suggesting mutational robustness intrinsic to the local sequence space. Structural analyses indicate that, in this region, epistatic effects are most pronounced when a solvent-inaccessible position is involved in the interaction. In contrast, all 18 observations of positive epistasis involved at least one mutation at a solvent-exposed position. By combining the analysis of evolutionary and biophysical properties of an epistatic landscape, these results contribute to a more detailed understanding of the complexity of protein evolution. PMID:25371431
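
    As a hedged illustration of what negative and positive epistasis mean, the sketch below uses one common additive definition on selection coefficients measured relative to the wild type. The abstract does not state the authors' exact estimator, so the formula, the function names, and the tolerance are assumptions.

```python
def epistasis(s_a: float, s_b: float, s_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation.

    s_a, s_b: selection coefficients of the two single mutants
    s_ab:     selection coefficient of the double mutant
    (all relative to wild type; additive-scale definition is an assumption)
    """
    return s_ab - (s_a + s_b)


def classify_epistasis(eps: float, tol: float = 1e-9) -> str:
    """Label the sign of epistasis; tol is a hypothetical noise tolerance."""
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "none"
```

    For example, two single mutants with s = -0.1 and s = -0.2 whose double mutant has s = -0.5 are less fit together than the additive expectation of -0.3, which is negative epistasis under this definition.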

  3. Evolution of the arginase fold and functional diversity

    PubMed Central

    Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.

    2009-01-01

    The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel 8 stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740

  4. Genome-wide analysis captures the determinants of the antibiotic cross-resistance interaction network

    PubMed Central

    Lázár, Viktória; Nagy, István; Spohn, Réka; Csörgő, Bálint; Györkei, Ádám; Nyerges, Ákos; Horváth, Balázs; Vörös, Andrea; Busa-Fekete, Róbert; Hrtyan, Mónika; Bogos, Balázs; Méhi, Orsolya; Fekete, Gergely; Szappanos, Balázs; Kégl, Balázs; Papp, Balázs; Pál, Csaba

    2014-01-01

    Understanding how evolution of antimicrobial resistance increases resistance to other drugs is a challenge of profound importance. By combining experimental evolution and genome sequencing of 63 laboratory-evolved lines, we charted a map of cross-resistance interactions between antibiotics in Escherichia coli, and explored the driving evolutionary principles. Here, we show that (1) convergent molecular evolution is prevalent across antibiotic treatments, (2) resistance conferring mutations simultaneously enhance sensitivity to many other drugs and (3) 27% of the accumulated mutations generate proteins with compromised activities, suggesting that antibiotic adaptation can partly be achieved without gain of novel function. By using knowledge on antibiotic properties, we examined the determinants of cross-resistance and identified chemogenomic profile similarity between antibiotics as the strongest predictor. In contrast, cross-resistance between two antibiotics is independent of whether they show synergistic effects in combination. These results have important implications on the development of novel antimicrobial strategies. PMID:25000950

  5. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment.

    PubMed

    Welker, F

    2018-02-20

    The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein level; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode compared to non-error-tolerant searches but did not recover protein identifications completely. The data indicate that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples as there is a bias towards the identification of conserved sequences and proteins. Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.

  6. Spontaneous Spatial Mapping of Learned Sequence in Chimpanzees: Evidence for a SNARC-Like Effect

    PubMed Central

    Adachi, Ikuma

    2014-01-01

    In the last couple of decades, there has been a growing number of reports on space-based representation of numbers and serial order in humans. In the present study, to explore evolutionary origins of such representations, we examined whether our closest evolutionary relatives, chimpanzees, map an acquired sequence onto space in a similar way to humans. The subjects had been trained to perform a number sequence task in which they touched a sequence of “small” to “large” Arabic numerals presented in random locations on the monitor. This task was presented in sessions that also included test trials consisting of only two numerals (1 and 9) horizontally arranged. On half of the trials 1 was located to the left of 9, whereas on the other half 1 was to the right of 9. The chimpanzees' performance was systematically influenced by the spatial arrangement of the stimuli; specifically, they responded quicker when 1 was on the left and 9 on the right compared to the other way around. This result suggests that chimpanzees, like humans, spontaneously map a learned sequence onto space. PMID:24643044

  7. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  8. Artificial Intelligence, DNA Mimicry, and Human Health.

    PubMed

    Stefano, George B; Kream, Richard M

    2017-08-14

    The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.

  9. Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats.

    PubMed

    Platt, Roy N; Faircloth, Brant C; Sullivan, Kevin A M; Kieran, Troy J; Glenn, Travis C; Vandewege, Michael W; Lee, Thomas E; Baker, Robert J; Stevens, Richard D; Ray, David A

    2018-03-01

    The rapid diversification of Myotis bats into more than 100 species is one of the most extensive mammalian radiations available for study. Efforts to understand relationships within Myotis have primarily utilized mitochondrial markers and trees inferred from nuclear markers lacked resolution. Our current understanding of relationships within Myotis is therefore biased towards a set of phylogenetic markers that may not reflect the history of the nuclear genome. To resolve this, we sequenced the full mitochondrial genomes of 37 representative Myotis, primarily from the New World, in conjunction with targeted sequencing of 3648 ultraconserved elements (UCEs). We inferred the phylogeny and explored the effects of concatenation and summary phylogenetic methods, as well as combinations of markers based on informativeness or levels of missing data, on our results. Of the 294 phylogenies generated from the nuclear UCE data, all are significantly different from phylogenies inferred using mitochondrial genomes. Even within the nuclear data, quartet frequencies indicate that around half of all UCE loci conflict with the estimated species tree. Several factors can drive such conflict, including incomplete lineage sorting, introgressive hybridization, or even phylogenetic error. Despite the degree of discordance between nuclear UCE loci and the mitochondrial genome and among UCE loci themselves, the most common nuclear topology is recovered in one quarter of all analyses with strong nodal support. Based on these results, we re-examine the evolutionary history of Myotis to better understand the phenomena driving their unique nuclear, mitochondrial, and biogeographic histories.

  10. Supra-domains: evolutionary units larger than single protein domains.

    PubMed

    Vogel, Christine; Berzuini, Carlo; Bashton, Matthew; Gough, Julian; Teichmann, Sarah A

    2004-02-20

    Domains are the evolutionary units that comprise proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new arrangements of domains. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 (1203 two-domain and 166 three-domain) combinations that are statistically significantly over-represented relative to the occurrence and versatility of the individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. This means that investigation of the structural and functional relationships of the domains forming these popular combinations would be particularly useful for an understanding of multi-domain protein function and evolution as well as for genome annotation. These and other supra-domains were analysed for their versatility, duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationships between the component domains: the combined function of the two domains is such that either the geometry of the two domains is crucial and there is a tight constraint on the interface, or the precise orientation of the domains is less important and they are spatially separate. Frequently, the role of the supra-domain becomes clear only once the three-dimensional structure is known. Since this is the case for only a quarter of the supra-domains, we provide a list of the most important unknown supra-domains as potential targets for structural genomics projects.
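
    The idea of a domain combination that recurs "in different protein contexts with different partner domains" can be sketched as a simple count. This is only an illustration: the input format (protein name mapped to an ordered list of domain labels), the function names, and the `min_contexts` parameter are hypothetical, and the statistical over-representation test mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict


def adjacent_pair_contexts(proteins):
    """Map each adjacent domain pair to the set of flanking contexts it occurs in.

    proteins: dict of protein name -> ordered list of domain labels (assumed format).
    A context is the (left neighbour, right neighbour) of the pair, with None
    marking a protein terminus.
    """
    contexts = defaultdict(set)
    for domains in proteins.values():
        for i in range(len(domains) - 1):
            pair = (domains[i], domains[i + 1])
            left = domains[i - 1] if i > 0 else None
            right = domains[i + 2] if i + 2 < len(domains) else None
            contexts[pair].add((left, right))
    return contexts


def recurring_pairs(proteins, min_contexts=2):
    """Pairs seen in at least min_contexts distinct flanking contexts."""
    return {pair for pair, ctx in adjacent_pair_contexts(proteins).items()
            if len(ctx) >= min_contexts}
```

    With `{"p1": ["A", "B", "C"], "p2": ["D", "A", "B"]}`, the pair `("A", "B")` appears with two different partner contexts and would be reported as a candidate, while `("B", "C")` would not.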

  11. ATLASGAL-selected massive clumps in the inner Galaxy. III. Dust continuum characterization of an evolutionary sample

    NASA Astrophysics Data System (ADS)

    König, C.; Urquhart, J. S.; Csengeri, T.; Leurini, S.; Wyrowski, F.; Giannetti, A.; Wienen, M.; Pillai, T.; Kauffmann, J.; Menten, K. M.; Schuller, F.

    2017-03-01

    Context. Massive-star formation and the processes involved are still poorly understood. The ATLASGAL survey provides an ideal basis for detailed studies of large numbers of massive-star forming clumps covering the whole range of evolutionary stages. The ATLASGAL Top100 is a sample of clumps selected by their infrared and radio properties to be representative for the whole range of evolutionary stages. Aims: The ATLASGAL Top100 sources are the focus of a number of detailed follow-up studies that will be presented in a series of papers. In the present work we use the dust continuum emission to constrain the physical properties of this sample and identify trends as a function of source evolution. Methods: We determine flux densities from mid-infrared to submillimeter wavelength (8-870 μm) images and use these values to fit their spectral energy distributions and determine their dust temperature and flux. Combining these with recent distances from the literature including maser parallax measurements we determine clump masses, luminosities and column densities. Results: We define four distinct source classes from the available continuum data and arrange these into an evolutionary sequence. This begins with sources found to be dark at 70 μm, followed by 24 μm weak sources with an embedded 70 μm source, continues through mid-infrared bright sources and ends with infrared bright sources associated with radio emission (i.e., H II regions). We find trends for increasing temperature, luminosity, and column density with the proposed evolution sequence, confirming that this sample is representative of different evolutionary stages of massive star formation. Our sources span temperatures from approximately 11 to 41 K, with bolometric luminosities in the range 57 L⊙ to 3.8 × 10⁶ L⊙. The highest masses reach 4.3 × 10⁴ M⊙ and peak column densities up to 1.1 × 10²⁴ cm⁻², and therefore have the potential to form the most massive O-type stars. We show that at least 93 sources (85%) of this sample have the ability to form massive stars and that most are gravitationally unstable and hence likely to be collapsing. Conclusions: The highest column density ATLASGAL sources cover the whole range of evolutionary stages from the youngest to the most evolved high-mass-star forming clumps. Study of these clumps provides a unique starting point for more in-depth research on massive-star formation in four distinct evolutionary stages whose well defined physical parameters afford more detailed studies. As most of the sample is closer than 5 kpc, these sources are also ideal for follow-up observations with high spatial resolution. Full Table 1, including fluxes, is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/599/A139

  12. A traditional evolutionary history of foot-and-mouth disease viruses in Southeast Asia challenged by analyses of non-structural protein coding sequences

    USDA-ARS's Scientific Manuscript database

    Molecular epidemiology and evolution of foot-and-mouth disease virus (FMDV) are widely studied using genomic sequences encoding VP1, the capsid protein containing the most relevant antigenic domains. Although sequencing of the full viral genome is not used as a routine diagnostic or surveillance too...

  13. Unusual chromosomal organization of telomeric sequences and expeditious karyotypic differentiation in the recently evolved Mus terricolor complex.

    PubMed

    Sharma, G G; Sharma, T

    1998-01-01

    The Mus terricolor complex displays a stable homozygous arrangement of autosomal heterochromatin variations in the form of accretion of definitive autosomal short arms among three nonoverlapping populations, in concert with an expeditious evolutionary differentiation into three chromosomal species: M. terricolor I, II, and III. In contrast to the highly conservative M. musculus-like chromosomes in the coexisting sibling species, M. booduga, reshuffling and differentiation of centric heterochromatin has occurred in harmony with a revision of centric configurations, resulting in acrocentric and submetacentric autosomes. The chromosomal distribution of the prevalent vertebrate telomeric sequence (TTAGGG)n was examined by fluorescence in situ hybridization to metaphase cells of M. terricolor I, II, and III. An unusual centric organization of internal telomeric sequences was detected in all the submetacentric and acrocentric autosomes. An auxiliary role of these presumably fragile, recombinogenic telomeric sequences in the evolutionary revision of centric configurations in the terricolor complex is hypothesized.

  14. Entropic fluctuations in DNA sequences

    NASA Astrophysics Data System (ADS)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that, despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
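
    The block-wise entropy idea in this abstract can be illustrated with a minimal sketch. It assumes non-overlapping blocks and single-base frequencies; the abstract does not state how the authors encode "local base arrangement", so the function names and the choice of alphabet statistics here are illustrative only.

```python
import math
from collections import Counter


def block_entropy(block: str) -> float:
    """Shannon entropy (in bits) of the base frequencies within one block."""
    n = len(block)
    counts = Counter(block)
    # Sum of -p * log2(p) over the bases that actually occur in the block.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def local_shannon_entropy(seq: str, block_size: int) -> list[float]:
    """Entropy of consecutive non-overlapping blocks (assumed layout).

    A trailing partial block is dropped so every value is comparable.
    """
    return [
        block_entropy(seq[i:i + block_size])
        for i in range(0, len(seq) - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    # A homopolymer block followed by a block containing all four bases.
    toy = "AAAAAAAA" + "ACGTACGT"
    print(local_shannon_entropy(toy, 8))
```

    In this toy input the repetitive block scores lower than the mixed block, which is the kind of contrast the abstract relies on for detecting tandem-repeat regions.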

  15. Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line

    NASA Astrophysics Data System (ADS)

    Li, Zixiang; Janardhanan, Mukund Nilakantan; Tang, Qiuhua; Nielsen, Peter

    2018-05-01

    This article presents the first method to simultaneously balance and sequence robotic mixed-model assembly lines (RMALB/S), which involves three sub-problems: task assignment, model sequencing and robot allocation. A new mixed-integer programming model is developed to minimize makespan and, using CPLEX solver, small-size problems are solved for optimality. Two metaheuristics, the restarted simulated annealing algorithm and co-evolutionary algorithm, are developed and improved to address this NP-hard problem. The restarted simulated annealing method replaces the current temperature with a new temperature to restart the search process. The co-evolutionary method uses a restart mechanism to generate a new population by modifying several vectors simultaneously. The proposed algorithms are tested on a set of benchmark problems and compared with five other high-performing metaheuristics. The proposed algorithms outperform their original editions and the benchmarked methods. The proposed algorithms are able to solve the balancing and sequencing problem of a robotic mixed-model assembly line effectively and efficiently.
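
    The restart idea described for the simulated annealing method can be sketched generically. This is not the authors' algorithm: the toy cost function, the swap neighbourhood, the cooling schedule and the rule of resetting the temperature to its initial value on each restart are all assumptions made for illustration.

```python
import math
import random


def restarted_sa(cost, neighbor, start, t0=1.0, cooling=0.95, t_min=1e-3,
                 restarts=3, seed=0):
    """Simulated annealing whose temperature is reset `restarts` times."""
    rng = random.Random(seed)
    current, cur_cost = start, cost(start)
    best, best_cost = current, cur_cost
    for _ in range(restarts + 1):
        t = t0  # restart: replace the current temperature with a new one
        while t > t_min:
            cand = neighbor(current, rng)
            c = cost(cand)
            # Always accept improvements; accept worse moves with
            # probability exp(-(c - cur_cost) / t).
            if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
                current, cur_cost = cand, c
                if c < best_cost:
                    best, best_cost = cand, c
            t *= cooling
    return best, best_cost


def adjacent_gap_cost(seq):
    """Toy sequencing cost: total jump between neighbouring items."""
    return sum(abs(a - b) for a, b in zip(seq, seq[1:]))


def swap_neighbor(seq, rng):
    """Return a copy of seq with two randomly chosen positions swapped."""
    i, j = rng.sample(range(len(seq)), 2)
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return tuple(out)


if __name__ == "__main__":
    start = (5, 1, 4, 2, 6, 3)
    print(restarted_sa(adjacent_gap_cost, swap_neighbor, start))
```

    Tracking the best solution separately from the current one means a restart can never lose ground, whatever the reheated search does afterwards.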

  16. The rRNA evolution and procaryotic phylogeny

    NASA Technical Reports Server (NTRS)

    Fox, G. E.

    1986-01-01

    Studies of ribosomal RNA primary structure allow reconstruction of phylogenetic trees for prokaryotic organisms. Such studies reveal a major dichotomy among the bacteria that separates them into eubacteria and archaebacteria. Both groupings are further segmented into several major divisions. The results obtained from 5S rRNA sequences are essentially the same as those obtained with the 16S rRNA data. In the case of Gram negative bacteria the ribosomal RNA sequencing results can also be directly compared with hybridization studies and cytochrome c sequencing studies. There is again excellent agreement among the several methods. It seems likely then that the overall picture of microbial phylogeny that is emerging from the RNA sequence studies is a good approximation of the true history of these organisms. The RNA data allow examination of the evolutionary process in a semi-quantitative way. The secondary structures of these RNAs are largely established. As a result it is possible to recognize examples of local structural evolution. Evolutionary pathways accounting for these events can be proposed and their probability can be assessed.

  17. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C-eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
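
    The arithmetic behind a molecular clock of this kind can be sketched as follows. The relation T = K / (2r), with K the silent-site divergence between two lineages and r the per-lineage rate, is the textbook form and is assumed here; the abstract does not state the exact formula or corrections the authors used. Only the rate and the dates come from the abstract.

```python
# Silent substitution rate quoted in the abstract (per site per year).
SILENT_RATE = 1.56e-9


def divergence_time(k_silent: float, rate: float = SILENT_RATE) -> float:
    """Years since two lineages split, assuming T = K / (2 * r).

    The factor 2 reflects substitutions accumulating on both lineages.
    """
    return k_silent / (2.0 * rate)


def implied_divergence(t_years: float, rate: float = SILENT_RATE) -> float:
    """Silent-site divergence expected after t_years under the same model."""
    return 2.0 * rate * t_years


if __name__ == "__main__":
    # Silent divergence implied by the abstract's two mean dates,
    # under the assumed T = K / (2r) relation.
    for label, t in (("human-chimpanzee", 6.4e6), ("human-orangutan", 17.3e6)):
        print(label, round(implied_divergence(t), 4))
```

    Under this assumed relation, the quoted dates correspond to silent-site divergences of roughly 2% and 5%, which gives a sense of how small the underlying signal is.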

  18. The Most Deeply Conserved Noncoding Sequences in Plants Serve Similar Functions to Those in Vertebrates Despite Large Differences in Evolutionary Rates

    PubMed Central

    Burgess, Diane; Freeling, Michael

    2014-01-01

    In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing–associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates. PMID:24681619

  19. The evolutionary rate dynamically tracks changes in HIV-1 epidemics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maljkovic-berry, Irina; Athreya, Gayathri; Daniels, Marcus

    Large-sequence datasets provide an opportunity to investigate the dynamics of pathogen epidemics. Thus, a fast method to estimate the evolutionary rate from large and numerous phylogenetic trees becomes necessary. Based on minimizing tip height variances, we optimize the root in a given phylogenetic tree to estimate the most homogeneous evolutionary rate between samples from at least two different time points. Simulations showed that the method had no bias in the estimation of evolutionary rates and that it was robust to tree rooting and topological errors. We show that the evolutionary rates of HIV-1 subtype B and C epidemics have changed over time, with the rate of evolution inversely correlated to the rate of virus spread. For subtype B, the evolutionary rate slowed down and tracked the start of the HAART era in 1996. Subtype C in Ethiopia showed an increase in the evolutionary rate when the prevalence increase markedly slowed down in 1995. Thus, we show that the evolutionary rate of HIV-1 on the population level dynamically tracks epidemic events.

  20. A complete mitochondrial genome of wheat (Triticum aestivum cv. Chinese Yumai), and fast evolving mitochondrial genes in higher plants.

    PubMed

    Cui, Peng; Liu, Huitao; Lin, Qiang; Ding, Feng; Zhuo, Guoyin; Hu, Songnian; Liu, Dongcheng; Yang, Wenlong; Zhan, Kehui; Zhang, Aimin; Yu, Jun

    2009-12-01

    Plant mitochondrial genomes, encoding necessary proteins involved in the system of energy production, play an important role in the development and reproduction of the plant. They occupy a specific evolutionary pattern relative to their nuclear counterparts. Here, we determined the winter wheat (Triticum aestivum cv. Chinese Yumai) mitochondrial genome, with a length of 452,526 bp, by shotgun sequencing its BAC library. It contains 202 genes, including 35 known protein-coding genes, three rRNA and 17 tRNA genes, as well as 149 open reading frames (ORFs; greater than 300 bp in length). The sequence is almost identical to the previously reported sequence of the spring wheat (T. aestivum cv. Chinese Spring); we only identified seven SNPs (three transitions and four transversions) and 10 indels (insertions and deletions) between the two independently acquired sequences, and all variations were found in non-coding regions. This result confirmed the accuracy of the previously reported mitochondrial sequence of the Chinese Spring wheat. The nucleotide frequency and codon usage of wheat are common among the lineage of higher plants, with a high AT-content of 58%. Molecular evolutionary analysis demonstrated that plant mitochondrial genomes evolved at different rates, which may correlate with substantial variations in metabolic rate and generation time among plant lineages. In addition, through the estimation of the ratio of non-synonymous to synonymous substitution rates between orthologous mitochondrion-encoded genes of higher plants, we found an accelerated evolutionary rate that seems to be the result of relaxed selection.

  1. Evolution of ribozymes in the presence of a mineral surface

    PubMed Central

    Stephenson, James D.; Popović, Milena; Bristow, Thomas F.

    2016-01-01

    Mineral surfaces are often proposed as the sites of critical processes in the emergence of life. Clay minerals in particular are thought to play significant roles in the origin of life including polymerizing, concentrating, organizing, and protecting biopolymers. In these scenarios, the impact of minerals on biopolymer folding is expected to influence evolutionary processes. These processes include both the initial emergence of functional structures in the presence of the mineral and the subsequent transition away from the mineral-associated niche. The initial evolution of function depends upon the number and distribution of sequences capable of functioning in the presence of the mineral, and the transition to new environments depends upon the overlap between sequences that evolve on the mineral surface and sequences that can perform the same functions in the mineral's absence. To examine these processes, we evolved self-cleaving ribozymes in vitro in the presence or absence of Na-saturated montmorillonite clay mineral particles. Starting from a shared population of random sequences, RNA populations were evolved in parallel, along separate evolutionary trajectories. Comparative sequence analysis and activity assays show that the impact of this clay mineral on functional structure selection was minimal; it neither prevented common structures from emerging, nor did it promote the emergence of new structures. This suggests that montmorillonite does not improve RNA's ability to evolve functional structures; however, it also suggests that RNAs that do evolve in contact with montmorillonite retain the same structures in mineral-free environments, potentially facilitating an evolutionary transition away from a mineral-associated niche. PMID:27793980

  2. MySSP: Non-stationary evolutionary sequence simulation, including indels

    PubMed Central

    Rosenberg, Michael S.

    2007-01-01

    MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but not all have previously been available in a single package. PMID:19325855

  3. TAS3 miR390-dependent loci in non-vascular land plants: towards a comprehensive reconstruction of the gene evolutionary history.

    PubMed

    Morozov, Sergey Y; Milyutina, Irina A; Erokhina, Tatiana N; Ozerova, Liudmila V; Troitsky, Alexey V; Solovyev, Andrey G

    2018-01-01

    Trans-acting small interfering RNAs (ta-siRNAs) are transcribed from non-protein-coding genomic TAS loci and belong to a plant-specific class of endogenous small RNAs. These siRNAs have been found to regulate gene expression in most taxa including seed plants, gymnosperms, ferns and mosses. In this study, bioinformatic and experimental PCR-based approaches were used as tools to analyze TAS3 and TAS6 loci in transcriptomes and genomic DNAs from representatives of evolutionarily distant non-vascular plant taxa such as Bryophyta, Marchantiophyta and Anthocerotophyta. We revealed previously undiscovered TAS3 loci in plant classes Sphagnopsida and Anthocerotopsida, as well as TAS6 loci in Bryophyta classes Tetraphidiopsida, Polytrichopsida, Andreaeopsida and Takakiopsida. These data further unveil the evolutionary pathway of the miR390-dependent TAS3 loci in land plants. We also identified charophyte alga sequences coding for SUPPRESSOR OF GENE SILENCING 3 (SGS3), which is required for generation of ta-siRNAs in plants, and hypothesized that the appearance of TAS3-related sequences could take place at a very early step in evolutionary transition from charophyte algae to an earliest common ancestor of land plants.

  4. Observing Clonal Dynamics across Spatiotemporal Axes: A Prelude to Quantitative Fitness Models for Cancer.

    PubMed

    McPherson, Andrew W; Chan, Fong Chun; Shah, Sohrab P

    2018-02-01

    The ability to accurately model evolutionary dynamics in cancer would allow for prediction of progression and response to therapy. As a prelude to quantitative understanding of evolutionary dynamics, researchers must gather observations of in vivo tumor evolution. High-throughput genome sequencing now provides the means to profile the mutational content of evolving tumor clones from patient biopsies. Together with the development of models of tumor evolution, reconstructing evolutionary histories of individual tumors generates hypotheses about the dynamics of evolution that produced the observed clones. In this review, we provide a brief overview of the concepts involved in predicting evolutionary histories, and provide a workflow based on bulk and targeted-genome sequencing. We then describe the application of this workflow to time series data obtained for transformed and progressed follicular lymphomas (FL), and contrast the observed evolutionary dynamics between these two subtypes. We next describe results from a spatial sampling study of high-grade serous (HGS) ovarian cancer, propose mechanisms of disease spread based on the observed clonal mixtures, and provide examples of diversification through subclonal acquisition of driver mutations and convergent evolution. Finally, we state implications of the techniques discussed in this review as a necessary but insufficient step on the path to predictive modelling of disease dynamics. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  5. Evolutionary Dynamics of Pathoadaptation Revealed by Three Independent Acquisitions of the VirB/D4 Type IV Secretion System in Bartonella.

    PubMed

    Harms, Alexander; Segers, Francisca H I D; Quebatte, Maxime; Mistl, Claudia; Manfredi, Pablo; Körner, Jonas; Chomel, Bruno B; Kosoy, Michael; Maruyama, Soichi; Engel, Philipp; Dehio, Christoph

    2017-03-01

    The α-proteobacterial genus Bartonella comprises a group of ubiquitous mammalian pathogens that are studied as a model for the evolution of bacterial pathogenesis. Vast abundance of two particular phylogenetic lineages of Bartonella had been linked to enhanced host adaptability enabled by lineage-specific acquisition of a VirB/D4 type IV secretion system (T4SS) and parallel evolution of complex effector repertoires. However, the limited availability of genome sequences from one of those lineages as well as other, remote branches of Bartonella has so far hampered comprehensive understanding of how the VirB/D4 T4SS and its effectors called Beps have shaped Bartonella evolution. Here, we report the discovery of a third repertoire of Beps associated with the VirB/D4 T4SS of B. ancashensis, a novel human pathogen that lacks any signs of host adaptability and is only distantly related to the two species-rich lineages encoding a VirB/D4 T4SS. Furthermore, sequencing of ten new Bartonella isolates from under-sampled lineages enabled combined in silico analyses and wet lab experiments that suggest several parallel layers of functional diversification during evolution of the three Bep repertoires from a single ancestral effector. Our analyses show that the Beps of B. ancashensis share many features with the two other repertoires, but may represent a more ancestral state that has not yet unleashed the adaptive potential of such an effector set. We anticipate that the effectors of B. ancashensis will enable future studies to dissect the evolutionary history of Bartonella effectors and help unraveling the evolutionary forces underlying bacterial host adaptation. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
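
    The abstract's point about expectation values and database size can be made concrete with the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below uses that relation with made-up lengths; real programs apply effective-length corrections that are omitted here.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), ignoring effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical numbers: a 300-residue query with a 50-bit alignment.
    small_db = 5_000_000      # residues in a single-proteome database
    large_db = 5_000_000_000  # residues in a comprehensive database
    print(expect_value(50, 300, small_db))
    print(expect_value(50, 300, large_db))
```

    Because E scales linearly with database length, the same alignment is a thousand times more significant in the smaller database, which is why the abstract recommends searching smaller comprehensive sets when sensitivity matters.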

  7. Evolutionary origins of the endosperm in flowering plants

    PubMed Central

    Baroux, Célia; Spillane, Charles; Grossniklaus, Ueli

    2002-01-01

    The evolutionary origin of double fertilization and the resultant endosperm tissue in flowering plants remains a puzzle, despite over a century of research. The recent resurgence of approaches to evolutionary developmental biology combining comparative biology with phylogenetics provides new understanding of endosperm origins. PMID:12225592

  8. Eco-Evo PVAs: Incorporating Eco-Evolutionary Processes into Population Viability Models

    EPA Science Inventory

    We synthesize how advances in computational methods and population genomics can be combined within an Ecological-Evolutionary (Eco-Evo) PVA model. Eco-Evo PVA models are powerful new tools for understanding the influence of evolutionary processes on plant and animal population pe...

  9. The Awesome Power of Yeast Evolutionary Genetics: New Genome Sequences and Strain Resources for the Saccharomyces sensu stricto Genus

    PubMed Central

    Scannell, Devin R.; Zill, Oliver A.; Rokas, Antonis; Payen, Celia; Dunham, Maitreya J.; Eisen, Michael B.; Rine, Jasper; Johnston, Mark; Hittinger, Chris Todd

    2011-01-01

    High-quality, well-annotated genome sequences and standardized laboratory strains fuel experimental and evolutionary research. We present improved genome sequences of three species of Saccharomyces sensu stricto yeasts: S. bayanus var. uvarum (CBS 7001), S. kudriavzevii (IFO 1802T and ZP 591), and S. mikatae (IFO 1815T), and describe their comparison to the genomes of S. cerevisiae and S. paradoxus. The new sequences, derived by assembling millions of short DNA sequence reads together with previously published Sanger shotgun reads, have vastly greater long-range continuity and far fewer gaps than the previously available genome sequences. New gene predictions defined a set of 5261 protein-coding orthologs across the five most commonly studied Saccharomyces yeasts, enabling a re-examination of the tempo and mode of yeast gene evolution and improved inferences of species-specific gains and losses. To facilitate experimental investigations, we generated genetically marked, stable haploid strains for all three of these Saccharomyces species. These nearly complete genome sequences and the collection of genetically marked strains provide a valuable toolset for comparative studies of gene function, metabolism, and evolution, and render Saccharomyces sensu stricto the most experimentally tractable model genus. These resources are freely available and accessible through www.SaccharomycesSensuStricto.org. PMID:22384314

  10. Evidence Combination From an Evolutionary Game Theory Perspective

    PubMed Central

    Deng, Xinyang; Han, Deqiang; Dezert, Jean; Deng, Yong; Shyr, Yu

    2017-01-01

    Dempster-Shafer evidence theory is a primary methodology for multi-source information fusion because it is good at dealing with uncertain information. This theory provides a Dempster’s rule of combination to synthesize multiple evidences from various information sources. However, in some cases, counter-intuitive results may be obtained based on that combination rule. Numerous new or improved methods have been proposed to suppress these counter-intuitive results based on perspectives, such as minimizing the information loss or deviation. Inspired by evolutionary game theory, this paper considers a biological and evolutionary perspective to study the combination of evidences. An evolutionary combination rule (ECR) is proposed to help find the most biologically supported proposition in a multi-evidence system. Within the proposed ECR, we develop a Jaccard matrix game (JMG) to formalize the interaction between propositions in evidences, and utilize the replicator dynamics to mimic the evolution of propositions. Experimental results show that the proposed ECR can effectively suppress the counter-intuitive behaviors that appear in typical paradoxes of evidence theory, compared with many existing methods. Properties of the ECR, such as solution’s stability and convergence, have been mathematically proved as well. PMID:26285231
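
    For readers unfamiliar with the rule this abstract starts from, the following sketch implements the classical Dempster's rule of combination (not the paper's ECR) and reproduces the well-known Zadeh example of a counter-intuitive result. Representing mass functions as dictionaries keyed by frozenset is an implementation choice made here for illustration.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    # Zadeh's example: each source strongly backs a different hypothesis
    # and both give B only 1% support.
    m1 = {frozenset("A"): 0.99, frozenset("B"): 0.01}
    m2 = {frozenset("C"): 0.99, frozenset("B"): 0.01}
    print(dempster_combine(m1, m2))
```

    Here almost all mass is conflicting, so normalisation hands essentially full belief to B, the hypothesis both sources considered unlikely. This is the kind of behaviour that alternative combination rules aim to suppress.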

  11. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution

    PubMed Central

    Wolf, Maxim Y; Wolf, Yuri I; Koonin, Eugene V

    2008-01-01

    Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate.
We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section. PMID:18840284

  12. Evolutionary trajectory of Pack-MULEs is determined by their epigenetic status

    USDA-ARS's Scientific Manuscript database

    Acquisition and rearrangement of host genes by transposable elements is one mechanism to increase gene diversity. The rice genome is replete with such sequences and while ~3,000 Pack-Mutator-like transposable elements containing gene sequences (Pack-MULEs) have been identified, their function remains...

  13. Phylogenetic estimates of diversification rate are affected by molecular rate variation.

    PubMed

    Duchêne, D A; Hua, X; Bromham, L

    2017-10-01

    Molecular phylogenies are increasingly being used to investigate the patterns and mechanisms of macroevolution. In particular, node heights in a phylogeny can be used to detect changes in rates of diversification over time. Such analyses rest on the assumption that node heights in a phylogeny represent the timing of diversification events, which in turn rests on the assumption that evolutionary time can be accurately predicted from DNA sequence divergence. But there are many influences on the rate of molecular evolution, which might also influence node heights in molecular phylogenies, and thus affect estimates of diversification rate. In particular, a growing number of studies have revealed an association between the net diversification rate estimated from phylogenies and the rate of molecular evolution. Such an association might, by influencing the relative position of node heights, systematically bias estimates of diversification time. We simulated the evolution of DNA sequences under several scenarios where rates of diversification and molecular evolution vary through time, including models where diversification and molecular evolutionary rates are linked. We show that commonly used methods, including metric-based, likelihood and Bayesian approaches, can have a low power to identify changes in diversification rate when molecular substitution rates vary. Furthermore, the association between the rates of speciation and molecular evolution rate can cause the signature of a slowdown or speedup in speciation rates to be lost or misidentified. These results suggest that the multiple sources of variation in molecular evolutionary rates need to be considered when inferring macroevolutionary processes from phylogenies. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  14. Efficient high-throughput sequencing of a laser microdissected chromosome arm

    PubMed Central

    2013-01-01

    Background Genomic sequence assemblies are key tools for a broad range of gene function and evolutionary studies. The diploid amphibian Xenopus tropicalis plays a pivotal role in these fields due to its combination of experimental flexibility, diploid genome, and early-branching tetrapod taxonomic position, having diverged from the amniote lineage ~360 million years ago. A genome assembly and a genetic linkage map have recently been made available. Unfortunately, large gaps in the linkage map attenuate long-range integrity of the genome assembly. Results We laser dissected the short arm of X. tropicalis chromosome 7 for next generation sequencing and computational mapping to the reference genome. This arm is of particular interest as it encodes the sex determination locus, but its genetic map contains large gaps which undermine available genome assemblies. Whole genome amplification of 15 laser-microdissected 7p arms followed by next generation sequencing yielded ~35 million reads, over four million of which uniquely mapped to the X. tropicalis genome. Our analysis placed more than 200 previously unmapped scaffolds on the analyzed chromosome arm, providing valuable low-resolution physical map information for de novo genome assembly. Conclusion We present a new approach for improving and validating genetic maps and sequence assemblies. Whole genome amplification of 15 microdissected chromosome arms provided sufficient high-quality material for localizing previously unmapped scaffolds and genes as well as recognizing mislocalized scaffolds. PMID:23714049

  15. Characterization and prediction of residues determining protein functional specificity.

    PubMed

    Capra, John A; Singh, Mona

    2008-07-01

    Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. Supplementary data are available at Bioinformatics online.

  16. Novel features of ARS selection in budding yeast Lachancea kluyveri

    PubMed Central

    2011-01-01

    Background The characterization of DNA replication origins in yeast has shed much light on the mechanisms of initiation of DNA replication. However, very little is known about the evolution of origins or the evolution of mechanisms through which origins are recognized by the initiation machinery. This lack of understanding is largely due to the vast evolutionary distances between model organisms in which origins have been examined. Results In this study we have isolated and characterized autonomously replicating sequences (ARSs) in Lachancea kluyveri - a pre-whole genome duplication (WGD) budding yeast. Through a combination of experimental work and rigorous computational analysis, we show that L. kluyveri ARSs require a sequence that is similar but much longer than the ARS Consensus Sequence well defined in Saccharomyces cerevisiae. Moreover, compared with S. cerevisiae and K. lactis, the replication licensing machinery in L. kluyveri seems more tolerant to variations in the ARS sequence composition. It is able to initiate replication from almost all S. cerevisiae ARSs tested and most Kluyveromyces lactis ARSs. In contrast, only about half of the L. kluyveri ARSs function in S. cerevisiae and less than 10% function in K. lactis. Conclusions Our findings demonstrate a replication initiation system with novel features and underscore the functional diversity within the budding yeasts. Furthermore, we have developed new approaches for analyzing biologically functional DNA sequences with ill-defined motifs. PMID:22204614

  17. Novel features of ARS selection in budding yeast Lachancea kluyveri.

    PubMed

    Liachko, Ivan; Tanaka, Emi; Cox, Katherine; Chung, Shau Chee Claire; Yang, Lu; Seher, Arael; Hallas, Lindsay; Cha, Eugene; Kang, Gina; Pace, Heather; Barrow, Jasmine; Inada, Maki; Tye, Bik-Kwoon; Keich, Uri

    2011-12-28

    The characterization of DNA replication origins in yeast has shed much light on the mechanisms of initiation of DNA replication. However, very little is known about the evolution of origins or the evolution of mechanisms through which origins are recognized by the initiation machinery. This lack of understanding is largely due to the vast evolutionary distances between model organisms in which origins have been examined. In this study we have isolated and characterized autonomously replicating sequences (ARSs) in Lachancea kluyveri - a pre-whole genome duplication (WGD) budding yeast. Through a combination of experimental work and rigorous computational analysis, we show that L. kluyveri ARSs require a sequence that is similar but much longer than the ARS Consensus Sequence well defined in Saccharomyces cerevisiae. Moreover, compared with S. cerevisiae and K. lactis, the replication licensing machinery in L. kluyveri seems more tolerant to variations in the ARS sequence composition. It is able to initiate replication from almost all S. cerevisiae ARSs tested and most Kluyveromyces lactis ARSs. In contrast, only about half of the L. kluyveri ARSs function in S. cerevisiae and less than 10% function in K. lactis. Our findings demonstrate a replication initiation system with novel features and underscore the functional diversity within the budding yeasts. Furthermore, we have developed new approaches for analyzing biologically functional DNA sequences with ill-defined motifs.

  18. Deep-sequencing to resolve complex diversity of apicomplexan parasites in platypuses and echidnas: Proof of principle for wildlife disease investigation.

    PubMed

    Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard

    2017-11-01

    The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE PAGES

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; ...

    2017-01-19

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  20. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  1. Dense infraspecific sampling reveals rapid and independent trajectories of plastome degradation in a heterotrophic orchid complex.

    PubMed

    Barrett, Craig F; Wicke, Susann; Sass, Chodon

    2018-05-01

    Heterotrophic plants provide excellent opportunities to study the effects of altered selective regimes on genome evolution. Plastid genome (plastome) studies in heterotrophic plants are often based on one or a few highly divergent species or sequences as representatives of an entire lineage, thus missing important evolutionary-transitory events. Here, we present the first infraspecific analysis of plastome evolution in any heterotrophic plant. By combining genome skimming and targeted sequence capture, we address hypotheses on the degree and rate of plastome degradation in a complex of leafless orchids (Corallorhiza striata) across its geographic range. Plastomes provide strong support for relationships and evidence of reciprocal monophyly between C. involuta and the endangered C. bentleyi. Plastome degradation is extensive, occurring rapidly over a few million years, with evidence of differing rates of genomic change among the two principal clades of the complex. Genome skimming and targeted sequence capture differ widely in coverage depth overall, with depth in targeted sequence capture datasets varying immensely across the plastome as a function of GC content. These findings will help to fill a knowledge gap in models of heterotrophic plastid genome evolution, and have implications for future studies in heterotrophs. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  2. Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

    PubMed Central

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat

    2017-01-01

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
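
    Two of the three quantities named in the record above, the number of k-mer features shared between genomes and the Shannon diversity index, can be illustrated with a minimal sketch. The function names and toy sequences are hypothetical, the natural logarithm is an arbitrary choice, and the paper's cumulative relative entropy step and its actual procedure for selecting k are not reproduced here.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over observed k-mer frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts.values())


def shared_kmers(seq_a, seq_b, k):
    """Number of distinct k-mers present in both sequences."""
    return len(set(kmer_counts(seq_a, k)) & set(kmer_counts(seq_b, k)))


# Toy usage on made-up sequences: scan a few values of k.
genome_a, genome_b = "ACGTACGTTGCA", "TTACGTACGGCA"
for k in (2, 3, 4):
    h = shannon_index(kmer_counts(genome_a, k))
    print(k, round(h, 3), shared_kmers(genome_a, genome_b, k))
```

    As k grows, k-mers become more specific, so diversity within a genome tends to rise while the number of features shared between genomes tends to fall; balancing such trends is the kind of trade-off a choice of k has to address.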

  3. Accelerated probabilistic inference of RNA structure evolution

    PubMed Central

    Holmes, Ian

    2005-01-01

    Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387

  4. SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome

    PubMed Central

    Horga, Alejandro; Tomaselli, Pedro J.; Gonzalez, Michael A.; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y.; Hanna, Michael G.; Blake, Julian C.; Houlden, Henry; Züchner, Stephan

    2016-01-01

    Objective: To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor–1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. Methods: We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. Results: In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. Conclusions: We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. PMID:27629094

  5. SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome.

    PubMed

    Horga, Alejandro; Tomaselli, Pedro J; Gonzalez, Michael A; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y; Hanna, Michael G; Blake, Julian C; Houlden, Henry; Züchner, Stephan; Reilly, Mary M

    2016-10-11

    To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor-1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. © 2016 American Academy of Neurology.

  6. Modelling and strategy optimisation for a kind of networked evolutionary games with memories under the bankruptcy mechanism

    NASA Astrophysics Data System (ADS)

    Fu, Shihua; Li, Haitao; Zhao, Guodong

    2018-05-01

    This paper investigates the evolutionary dynamics and strategy optimisation for a kind of networked evolutionary games whose strategy updating rules incorporate a 'bankruptcy' mechanism, considering the situation in which each player's bankruptcy is due to the continuous low profits previously gained from the game. First, by using the semi-tensor product of matrices method, the evolutionary dynamics of this kind of game are expressed as a higher-order logical dynamic system and then converted into algebraic form, on the basis of which the evolutionary dynamics of the given games can be discussed. Second, the strategy optimisation problem is investigated, and some free-type control sequences are designed to maximise the total payoff of the whole game. Finally, an illustrative example is given to show the effectiveness of the new results.
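
    The semi-tensor product mentioned in this record generalises ordinary matrix multiplication to factors of mismatched dimension, which is what allows a logical rule to be written as a single matrix acting on state vectors. The sketch below uses the standard definition, A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t = lcm(n, p) for an m×n matrix A and a p×q matrix B, in plain Python. The helper names and the conjunction example are illustrative assumptions; the paper's game model and bankruptcy rule are not reproduced.

```python
from math import gcd


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def stp(A, B):
    """Semi-tensor product: (A kron I_{t/n}) (B kron I_{t/p}), t = lcm(n, p)."""
    n, p = len(A[0]), len(B)
    t = n * p // gcd(n, p)
    return matmul(kron(A, identity(t // n)), kron(B, identity(t // p)))


# Logical values as column vectors: True = [1, 0]^T, False = [0, 1]^T.
TRUE, FALSE = [[1], [0]], [[0], [1]]

# Structure matrix of conjunction (AND), so that x AND y = M_AND |x| x |x| y.
M_AND = [[1, 0, 0, 0],
         [0, 1, 1, 1]]

print(stp(stp(M_AND, TRUE), FALSE))  # vector form of (True AND False)
```

    When the inner dimensions already agree (n = p), both identity factors are 1×1 and the semi-tensor product reduces to the ordinary matrix product.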

  7. Cell periphery-related proteins as major genomic targets behind the adaptive evolution of an industrial Saccharomyces cerevisiae strain to combined heat and hydrolysate stress.

    PubMed

    Wallace-Salinas, Valeria; Brink, Daniel P; Ahrén, Dag; Gorwa-Grauslund, Marie F

    2015-07-09

    Laboratory evolution is an important tool for developing robust yeast strains for bioethanol production since the biological basis behind combined tolerance requires complex alterations whose proper regulation is difficult to achieve by rational metabolic engineering. Previously, we reported on the evolved industrial Saccharomyces cerevisiae strain ISO12 that had acquired improved tolerance to grow and ferment in the presence of lignocellulose-derived inhibitors at high temperature (39 °C). In the current study, we used comparative genomics to uncover the extent of the genomic alterations that occurred during the evolution process and investigated possible associations between the mutations and the phenotypic traits in ISO12. Through whole-genome sequencing and variant calling we identified a high number of strain-unique SNPs and INDELs in both ISO12 and the parental strain Ethanol Red. The variants were predicted to have 760 non-synonymous effects in both strains combined and were significantly enriched in Gene Ontology terms related to cell periphery, membranes and cell wall. Eleven genes, including MTL1, FLO9/FLO11, and CYC3 were found to be under positive selection in ISO12. Additionally, the FLO genes exhibited changes in copy number, and the alterations to this gene family were correlated with experimental results of multicellularity and invasive growth in the adapted strain. An independent lipidomic analysis revealed further differences between the strains in the content of nine lipid species. Finally, ISO12 displayed improved viability in undiluted spruce hydrolysate that was unrelated to reduction of inhibitors and changes in cell wall integrity, as shown by HPLC and lyticase assays. Together, the results of the sequence comparison and the physiological characterisations indicate that cell-periphery proteins (e.g. extracellular sensors such as MTL1) and peripheral lipids/membranes are important evolutionary targets in the process of adaptation to the combined stresses. The capacity of ISO12 to develop complex colony formation also revealed multicellularity as a possible evolutionary strategy to improve competitiveness and tolerance to environmental stresses (also reflected by the FLO genes). Although a panel of altered genes with high relevance to the novel phenotype was detected, this study also demonstrates that the observed long-term molecular effects of thermal and inhibitor stress have a polygenic basis.

  8. Interordinal gene capture, the phylogenetic position of Steller's sea cow based on molecular and morphological data, and the macroevolutionary history of Sirenia.

    PubMed

    Springer, Mark S; Signore, Anthony V; Paijmans, Johanna L A; Vélez-Juarbe, Jorge; Domning, Daryl P; Bauer, Cameron E; He, Kai; Crerar, Lorelei; Campos, Paula F; Murphy, William J; Meredith, Robert W; Gatesy, John; Willerslev, Eske; MacPhee, Ross D E; Hofreiter, Michael; Campbell, Kevin L

    2015-10-01

    The recently extinct (ca. 1768) Steller's sea cow (Hydrodamalis gigas) was a large, edentulous North Pacific sirenian. The phylogenetic affinities of this taxon to other members of this clade, living and extinct, are uncertain based on previous morphological and molecular studies. We employed hybridization capture methods and second generation sequencing technology to obtain >30kb of exon sequences from 26 nuclear genes for both H. gigas and Dugong dugon. We also obtained complete coding sequences for the tooth-related enamelin (ENAM) gene. Hybridization probes designed using dugong and manatee sequences were both highly effective in retrieving sequences from H. gigas (mean=98.8% coverage), as were more divergent probes for regions of ENAM (99.0% coverage) that were designed exclusively from a proboscidean (African elephant) and a hyracoid (Cape hyrax). New sequences were combined with available sequences for representatives of all other afrotherian orders. We also expanded a previously published morphological matrix for living and fossil Sirenia by adding both new taxa and nine new postcranial characters. Maximum likelihood and parsimony analyses of the molecular data provide robust support for an association of H. gigas and D. dugon to the exclusion of living trichechids (manatees). Parsimony analyses of the morphological data also support the inclusion of H. gigas in Dugongidae with D. dugon and fossil dugongids. Timetree analyses based on calibration density approaches with hard- and soft-bounded constraints suggest that H. gigas and D. dugon diverged in the Oligocene and that crown sirenians last shared a common ancestor in the Eocene. The coding sequence for the ENAM gene in H. gigas does not contain frameshift mutations or stop codons, but there is a transversion mutation (AG to CG) in the acceptor splice site of intron 2. 
This disruption in the edentulous Steller's sea cow is consistent with previous studies that have documented inactivating mutations in tooth-specific loci of a variety of edentulous and enamelless vertebrates including birds, turtles, aardvarks, pangolins, xenarthrans, and baleen whales. Further, branch-site dN/dS analyses provide evidence for positive selection in ENAM on the stem dugongid branch where extensive tooth reduction occurred, followed by neutral evolution on the Hydrodamalis branch. Finally, we present a synthetic evolutionary tree for living and fossil sirenians showing several key innovations in the history of this clade including character state changes that parallel those that occurred in the evolutionary history of cetaceans. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Conserved domains and SINE diversity during animal evolution.

    PubMed

    Luchetti, Andrea; Mantovani, Barbara

    2013-10-01

    Eukaryotic genomes harbour a number of mobile genetic elements (MGEs); moving from one genomic location to another, they are known to impact on the host genome. Short interspersed elements (SINEs) are well-represented, non-autonomous retroelements and they are likely the most diversified MGEs. In some instances, sequence domains conserved across unrelated SINEs have been identified; remarkably, one of these, called Nin, has been conserved since the Radiata-Bilateria splitting. Here we report on two new domains: Inv, derived from Nin, identified in insects and in deuterostomes, and Pln, restricted to polyneopteran insects. The identification of Inv and Pln sequences allowed us to retrieve new SINEs, two in insects and one in a hemichordate. The diverse structural combination of the different domains in different SINE families, during metazoan evolution, offers a clearer view of SINE diversity and their frequent de novo emergence through module exchange, possibly underlying the high evolutionary success of SINEs. © 2013 Elsevier Inc. All rights reserved.

  10. POOL server: machine learning application for functional site prediction in proteins.

    PubMed

    Somarowthu, Srinivas; Ondrechen, Mary Jo

    2012-08-01

    We present an automated web server for partial order optimum likelihood (POOL), a machine learning application that combines computed electrostatic and geometric information for high-performance prediction of catalytic residues from 3D structures. Input features consist of THEMATICS electrostatics data and pocket information from ConCavity. THEMATICS measures deviation from typical, sigmoidal titration behavior to identify functionally important residues and ConCavity identifies binding pockets by analyzing the surface geometry of protein structures. Neither THEMATICS nor ConCavity (structure only) requires the query protein to have any sequence or structure similarity to other proteins. Hence, POOL is applicable to proteins with novel folds and engineered proteins. As an additional option for cases where sequence homologues are available, users can include evolutionary information from INTREPID for enhanced accuracy in site prediction. The web site is free and open to all users with no login requirements at http://www.pool.neu.edu. Contact: m.ondrechen@neu.edu. Supplementary data are available at Bioinformatics online.

  11. Investigating the Genome Diversity of B. cereus and Evolutionary Aspects of B. anthracis Emergence

    PubMed Central

    Papazisi, Leka; Rasko, David A.; Ratnayake, Shashikala; Bock, Geoff R.; Remortel, Brian G.; Appalla, Lakshmi; Liu, Jia; Dracheva, Tatiana; Braisted, John C.; Shallom, Shamira; Jarrahi, Benham; Snesrud, Erik; Ahn, Susie; Sun, Qiang; Rilstone, Jenifer; Økstad, Ole Andreas; Kolstø, Anne-Brit; Fleischmann, Robert D.; Peterson, Scott N.

    2011-01-01

    Here we report the use of a multi-genome DNA microarray to investigate the genome diversity of Bacillus cereus group members and elucidate the events associated with the emergence of B. anthracis, the causative agent of anthrax, a lethal zoonotic disease. We initially performed directed genome sequencing of seven diverse B. cereus strains to identify novel sequences encoded in those genomes. The novel genes identified, combined with those publicly available, allowed the design of a “species” DNA microarray. Comparative genomic hybridization analyses of 41 strains indicate that substantial heterogeneity exists with respect to the genes comprising functional role categories. While the acquisition of the plasmid-encoded pathogenicity island (pXO1) and capsule genes (pXO2) represents a crucial landmark dictating the emergence of B. anthracis, the evolution of this species and its close relatives was associated with an overall shift in the fraction of genes devoted to energy metabolism, cellular processes, transport, as well as virulence. PMID:21447378

  12. Deconstruction of the Ras switching cycle through saturation mutagenesis

    PubMed Central

    Bandaru, Pradeep; Shah, Neel H; Bhattacharyya, Moitrayee; Barton, John P; Kondo, Yasushi; Cofsky, Joshua C; Gee, Christine L; Chakraborty, Arup K; Kortemme, Tanja; Ranganathan, Rama; Kuriyan, John

    2017-01-01

    Ras proteins are highly conserved signaling molecules that exhibit regulated, nucleotide-dependent switching between active and inactive states. The high conservation of Ras requires mechanistic explanation, especially given the general mutational tolerance of proteins. Here, we use deep mutational scanning, biochemical analysis and molecular simulations to understand constraints on Ras sequence. Ras exhibits global sensitivity to mutation when regulated by a GTPase activating protein and a nucleotide exchange factor. Removing the regulators shifts the distribution of mutational effects to be largely neutral, and reveals hotspots of activating mutations in residues that restrain Ras dynamics and promote the inactive state. Evolutionary analysis, combined with structural and mutational data, argues that Ras has co-evolved with its regulators in the vertebrate lineage. Overall, our results show that sequence conservation in Ras depends strongly on the biochemical network in which it operates, providing a framework for understanding the origin of global selection pressures on proteins. DOI: http://dx.doi.org/10.7554/eLife.27810.001 PMID:28686159

  13. The molecular dynamics of long noncoding RNA control of transcription in PTEN and its pseudogene

    PubMed Central

    Lister, Nicholas; Shevchenko, Galina; Walshe, James L.; Groen, Jessica; Johnsson, Per; Vidarsdóttir, Linda; Grander, Dan; Ataide, Sandro F.; Morris, Kevin V.

    2017-01-01

    RNA has been found to interact with chromatin and modulate gene transcription. In human cells, little is known about how long noncoding RNAs (lncRNAs) interact with target loci in the context of chromatin. We find here, using the phosphatase and tensin homolog (PTEN) pseudogene as a model system, that antisense lncRNAs interact first with a 5′ UTR-containing promoter-spanning transcript, which is then followed by the recruitment of DNA methyltransferase 3a (DNMT3a), ultimately resulting in the transcriptional and epigenetic control of gene expression. Moreover, we find that the interaction between the lncRNA and the promoter-spanning transcript is based on a combination of structural and sequence components of the antisense lncRNA. These observations suggest, on the basis of this one example, that evolutionary pressures may be placed on RNA structure more so than sequence conservation. Collectively, the observations presented here suggest a much more complex and vibrant RNA regulatory world may be operative in the regulation of gene expression. PMID:28847966

  14. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. 
    These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways. PMID:15799777

  15. Molecular Evolution and Phylogeography of Co-circulating IHNV and VHSV in Italy

    PubMed Central

    Abbadi, Miriam; Fusaro, Alice; Ceolin, Chiara; Casarotto, Claudia; Quartesan, Rosita; Dalla Pozza, Manuela; Cattoli, Giovanni; Toffan, Anna; Holmes, Edward C.; Panzarin, Valentina

    2016-01-01

    Infectious haematopoietic necrosis virus (IHNV) and viral haemorrhagic septicaemia virus (VHSV) are the most important viral pathogens impacting rainbow trout farming. These viruses are persistent in Italy, where they are responsible for severe disease outbreaks (epizootics) that affect the profitability of the trout industry. Despite the importance of IHNV and VHSV, little is known about their evolution at a local scale, although this is likely to be important for virus eradication and control. To address this issue we performed a detailed molecular evolutionary and epidemiological analysis of IHNV and VHSV in trout farms from northern Italy. Full-length glycoprotein gene sequences of a selection of VHSV (n = 108) and IHNV (n = 89) strains were obtained. This revealed that Italian VHSV strains belong to sublineages Ia1 and Ia2 of genotype Ia and are distributed into 7 genetic clusters. In contrast, all Italian IHNV isolates fell within genogroup E, for which only a single genetic cluster was identified. More striking was that IHNV has evolved more rapidly than VHSV (mean rates of 11 × 10^-4 and 7.3 × 10^-4 nucleotide substitutions per site, per year, respectively), indicating that these viruses exhibit fundamentally different evolutionary dynamics. The time to the most recent common ancestor of both IHNV and VHSV was consistent with the first reports of these pathogens in Italy. By combining sequence data with epidemiological information it was possible to identify different patterns of virus spread among trout farms, in which adjacent facilities can be infected by either genetically similar or different viruses, and farms located in different water catchments can be infected by identical strains. Overall, these findings highlight the importance of combining molecular and epidemiological information to identify the determinants of IHN and VHS spread, and to provide data that is central to future surveillance strategies and possibly control. PMID:27602026

  16. Supermatrix and species tree methods resolve phylogenetic relationships within the big cats, Panthera (Carnivora: Felidae).

    PubMed

    Davis, Brian W; Li, Gang; Murphy, William J

    2010-07-01

    The pantherine lineage of cats diverged from the remainder of modern Felidae less than 11 million years ago and consists of the five big cats of the genus Panthera, the lion, tiger, jaguar, leopard, and snow leopard, as well as the closely related clouded leopard. A significant problem exists with respect to the precise phylogeny of these highly threatened great cats. Despite multiple publications on the subject, no two molecular studies have reconstructed Panthera with the same topology. These evolutionary relationships remain unresolved partially due to the recent and rapid radiation of pantherines in the Pliocene, individual speciation events occurring within less than 1 million years, and probable introgression between lineages following their divergence. We provide an alternative, highly supported interpretation of the evolutionary history of the pantherine lineage using novel and published DNA sequence data from the autosomes, both sex chromosomes and the mitochondrial genome. New sequences were generated for 39 single-copy regions of the felid Y chromosome, as well as four mitochondrial and four autosomal gene segments, totaling 28.7 kb. Phylogenetic analysis of these new data, combined with all published data in GenBank, highlighted the prevalence of phylogenetic disparities stemming either from the amplification of a mitochondrial to nuclear translocation event (numt), or errors in species identification. Our 47.6 kb combined dataset was analyzed as a supermatrix and with respect to individual partitions using maximum likelihood and Bayesian phylogenetic inference, in conjunction with Bayesian Estimation of Species Trees (BEST) which accounts for heterogeneous gene histories. Our results yield a robust consensus topology supporting the monophyly of lion and leopard, with jaguar sister to these species, as well as a sister species relationship of tiger and snow leopard. 
    These results highlight new avenues for the study of speciation genomics and understanding the historical events surrounding the origin of the members of this lineage. Copyright 2010 Elsevier Inc. All rights reserved.
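    The supermatrix step described in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name build_supermatrix, the "?" missing-data symbol, and the toy sequences are all assumptions made for illustration. It concatenates per-locus alignments, pads taxa that lack a locus, and records the column range of each partition so that partitions can still be analyzed individually.

```python
def build_supermatrix(partitions, missing="?"):
    """Concatenate per-locus alignments into one supermatrix.

    partitions: list of (locus_name, {taxon: aligned_sequence}) pairs.
    Returns (matrix, ranges), where matrix maps taxon -> concatenated
    sequence and ranges maps locus_name -> (start, end) as 1-based,
    inclusive column positions.
    """
    taxa = sorted({taxon for _, aln in partitions for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    ranges = {}
    offset = 0
    for name, aln in partitions:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa without data for this locus are padded with the missing symbol.
            pieces[taxon].append(aln.get(taxon, missing * width))
        ranges[name] = (offset + 1, offset + width)
        offset += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, ranges


# Toy sequences invented for illustration; they are not data from the study.
toy_partitions = [
    ("mito", {"lion": "ACGT", "tiger": "ACGA", "jaguar": "ACGG"}),
    ("ychr", {"lion": "TTG", "tiger": "TTA"}),
]
matrix, ranges = build_supermatrix(toy_partitions)
```

    The recorded ranges correspond to the kind of partition definitions that likelihood and Bayesian programs accept, although the exact file format differs between programs.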

  17. Molecular exploration of hidden diversity in the Indo-West Pacific sciaenid clade

    PubMed Central

    Lo, Pei-Chun; Liu, Shu-Hui; Nor, Siti Azizah Mohd

    2017-01-01

    The family Sciaenidae, known as croakers or drums, is one of the largest perciform fish families. A recent multi-gene based study investigating the phylogeny and biogeography of global sciaenids revealed that the origin and early diversification of this family occurred in tropical America during the Late Oligocene—Early Miocene before undergoing range expansions to other seas including the Indo-West Pacific, where high species richness is observed. Despite this clarification of the overall evolutionary history of the family, knowledge of the taxonomy and phylogeny of sciaenid genera endemic to the Indo-West Pacific is still limited due to lack of a thorough survey of all taxa. In this study, we used DNA-based approaches to investigate the evolutionary relationships, to explore the species diversity, and to elucidate the taxonomic status of sciaenid species/genera within the Indo-West Pacific clade. Three datasets were herein built for the above objectives: the combined dataset (248 samples from 45 currently recognized species) from one nuclear gene (RAG1) and one mitochondrial gene (COI); the dataset with only RAG1 gene sequences (245 samples from 44 currently recognized species); and the dataset with only COI gene sequences (308 samples from 51 currently recognized species). The latter was primarily used for our biodiversity exploration with two different species delimitation methods (Automatic Barcode Gap Discovery, ABGD and Generalized Mixed Yule Coalescent, GMYC). The results were further evaluated with help of four supplementary criteria for species delimitation (genetic similarity, monophyly inferred from individual gene and combined data trees, geographic distribution, and morphology). Our final results confirmed the validity of 32 currently recognized species and identified several potential new species waiting for formal descriptions. 
    We also reexamined the taxonomic status of the genera, Larimichthys, Nibea, Protonibea and Megalonibea, and suggested a revision of Nibea and proposed a new genus Pseudolarimichthys. PMID:28453569

  18. Molecular exploration of hidden diversity in the Indo-West Pacific sciaenid clade.

    PubMed

    Lo, Pei-Chun; Liu, Shu-Hui; Nor, Siti Azizah Mohd; Chen, Wei-Jen

    2017-01-01

    The family Sciaenidae, known as croakers or drums, is one of the largest perciform fish families. A recent multi-gene based study investigating the phylogeny and biogeography of global sciaenids revealed that the origin and early diversification of this family occurred in tropical America during the Late Oligocene-Early Miocene before undergoing range expansions to other seas including the Indo-West Pacific, where high species richness is observed. Despite this clarification of the overall evolutionary history of the family, knowledge of the taxonomy and phylogeny of sciaenid genera endemic to the Indo-West Pacific is still limited due to lack of a thorough survey of all taxa. In this study, we used DNA-based approaches to investigate the evolutionary relationships, to explore the species diversity, and to elucidate the taxonomic status of sciaenid species/genera within the Indo-West Pacific clade. Three datasets were herein built for the above objectives: the combined dataset (248 samples from 45 currently recognized species) from one nuclear gene (RAG1) and one mitochondrial gene (COI); the dataset with only RAG1 gene sequences (245 samples from 44 currently recognized species); and the dataset with only COI gene sequences (308 samples from 51 currently recognized species). The latter was primarily used for our biodiversity exploration with two different species delimitation methods (Automatic Barcode Gap Discovery, ABGD and Generalized Mixed Yule Coalescent, GMYC). The results were further evaluated with help of four supplementary criteria for species delimitation (genetic similarity, monophyly inferred from individual gene and combined data trees, geographic distribution, and morphology). Our final results confirmed the validity of 32 currently recognized species and identified several potential new species waiting for formal descriptions. 
    We also reexamined the taxonomic status of the genera, Larimichthys, Nibea, Protonibea and Megalonibea, and suggested a revision of Nibea and proposed a new genus Pseudolarimichthys.

  19. Transcriptome-Mining for Single-Copy Nuclear Markers in Ferns

    PubMed Central

    Rothfels, Carl J.; Larsson, Anders; Li, Fay-Wei; Sigel, Erin M.; Huiet, Layne; Burge, Dylan O.; Ruhsam, Markus; Graham, Sean W.; Stevenson, Dennis W.; Wong, Gane Ka-Shu; Korall, Petra; Pryer, Kathleen M.

    2013-01-01

    Background Molecular phylogenetic investigations have revolutionized our understanding of the evolutionary history of ferns—the second-most species-rich major group of vascular plants, and the sister clade to seed plants. The general absence of genomic resources available for this important group of plants, however, has resulted in the strong dependence of these studies on plastid data; nuclear or mitochondrial data have been rarely used. In this study, we utilize transcriptome data to design primers for nuclear markers for use in studies of fern evolutionary biology, and demonstrate the utility of these markers across the largest order of ferns, the Polypodiales. Principal Findings We present 20 novel single-copy nuclear regions, across 10 distinct protein-coding genes: ApPEFP_C, cryptochrome 2, cryptochrome 4, DET1, gapCpSh, IBR3, pgiC, SQD1, TPLATE, and transducin. These loci, individually and in combination, show strong resolving power across the Polypodiales phylogeny, and are readily amplified and sequenced from our genomic DNA test set (from 15 diploid Polypodiales species). For each region, we also present transcriptome alignments of the focal locus and related paralogs—curated broadly across ferns—that will allow researchers to develop their own primer sets for fern taxa outside of the Polypodiales. Analyses of sequence data generated from our genomic DNA test set reveal strong effects of partitioning schemes on support levels and, to a much lesser extent, on topology. A model partitioned by codon position is strongly favored, and analyses of the combined data yield a Polypodiales phylogeny that is well-supported and consistent with earlier studies of this group. Conclusions The 20 single-copy regions presented here more than triple the single-copy nuclear regions available for use in ferns. 
    They provide a much-needed opportunity to assess plastid-derived hypotheses of relationships within the ferns, and increase our capacity to explore aspects of fern evolution previously unavailable to scientific investigation. PMID:24116189

  20. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  1. Voltage-Gated Sodium Channels: Evolutionary History and Distinctive Sequence Features.

    PubMed

    Kasimova, M A; Granata, D; Carnevale, V

    2016-01-01

    Voltage-gated sodium channels (Nav) are responsible for the rising phase of the action potential. Their role in electrical signal transmission is so relevant that their emergence is believed to be one of the crucial factors enabling the development of the nervous system. The presence of voltage-gated sodium-selective channels in bacteria (BacNav) has raised questions concerning the evolutionary history of the ones in animals. Here we review some of the milestones in the field of Nav phylogenetic analysis and discuss some of the most important sequence features that distinguish these channels from voltage-gated potassium channels and transient receptor potential channels. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. A history estimate and evolutionary analysis of rabies virus variants in China.

    PubMed

    Ming, Pinggang; Yan, Jiaxin; Rayner, Simon; Meng, Shengli; Xu, Gelin; Tang, Qing; Wu, Jie; Luo, Jing; Yang, Xiaoming

    2010-03-01

    To investigate the evolutionary dynamics of rabies virus (RABV) in China, we collected and sequenced 55 isolates sampled from 14 Chinese provinces over the last 40 years and performed a coalescent-based analysis of the G gene. This revealed that the RABV currently circulating in China is composed of three main groups. Bayesian coalescent analysis estimated the date of the most recent common ancestor for the current RABV Chinese strains to be 1412 (with a 95% confidence interval of 1006-1736). The estimated mean substitution rate for the G gene sequences (3.961 × 10^-4 substitutions per site per year) was in accordance with previous reports for RABV.
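    As a rough worked example of what such a rate implies, the sketch below multiplies the reported mean rate by the time elapsed since the estimated common ancestor. The rate and the year 1412 come from the abstract. The sampling year, the gene length, and the strict-clock simplification (one constant rate, no correction for repeated substitutions at the same site) are assumptions made here for illustration.

```python
def expected_substitutions(rate_per_site_per_year, years, sites=1):
    """Expected substitutions along one lineage under a strict clock."""
    return rate_per_site_per_year * years * sites


RATE = 3.961e-4        # substitutions per site per year, from the abstract
TMRCA_YEAR = 1412      # point estimate from the abstract
SAMPLING_YEAR = 2008   # hypothetical sampling year, not from the abstract
GENE_LENGTH = 1500     # hypothetical alignment length in nucleotides

elapsed = SAMPLING_YEAR - TMRCA_YEAR
per_site = expected_substitutions(RATE, elapsed)
per_gene = expected_substitutions(RATE, elapsed, sites=GENE_LENGTH)

print(f"{elapsed} years -> {per_site:.3f} substitutions per site")
print(f"{per_gene:.0f} substitutions expected across {GENE_LENGTH} sites")
```

    Because the confidence interval on the ancestor date is wide (1006-1736), any figure derived this way is only an order-of-magnitude guide.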

  3. On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.

    PubMed

    Catania, Francesco; Schmitz, Jürgen

    2015-01-01

    Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.

  4. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors

    PubMed Central

    Jin, Jinpu; Zhang, He; Kong, Lei; Gao, Ge; Luo, Jingchu

    2014-01-01

    With the aim to provide a resource for functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationship among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences. PMID:24174544

  5. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis.

    PubMed

    Kumar, Sudhir; Stecher, Glen; Peterson, Daniel; Tamura, Koichiro

    2012-10-15

    There is a growing need in the research community to apply the molecular evolutionary genetics analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. Therefore, we now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an analysis prototyper (MEGA-Proto). MEGA-CC provides users with access to all the computational analyses available through MEGA's graphical user interface version. This includes methods for multiple sequence alignment, substitution model selection, evolutionary distance estimation, phylogeny inference, substitution rate and pattern estimation, tests of natural selection and ancestral sequence inference. Additionally, we have upgraded the source code for phylogenetic analysis using the maximum likelihood methods for parallel execution on multiple processors and cores. Here, we describe MEGA-CC and outline the steps for using MEGA-CC in tandem with MEGA-Proto for iterative and automated data analysis. http://www.megasoftware.net/.

  6. Genomics of Actinobacteria: Tracing the Evolutionary History of an Ancient Phylum

    PubMed Central

    Ventura, Marco; Canchaya, Carlos; Tauch, Andreas; Chandra, Govind; Fitzgerald, Gerald F.; Chater, Keith F.; van Sinderen, Douwe

    2007-01-01

    Summary: Actinobacteria constitute one of the largest phyla among Bacteria and represent gram-positive bacteria with a high G+C content in their DNA. This bacterial group includes microorganisms exhibiting a wide spectrum of morphologies, from coccoid to fragmenting hyphal forms, as well as possessing highly variable physiological and metabolic properties. Furthermore, Actinobacteria members have adopted different lifestyles, and can be pathogens (e.g., Corynebacterium, Mycobacterium, Nocardia, Tropheryma, and Propionibacterium), soil inhabitants (Streptomyces), plant commensals (Leifsonia), or gastrointestinal commensals (Bifidobacterium). The divergence of Actinobacteria from other bacteria is ancient, making it impossible to identify the phylogenetically closest bacterial group to Actinobacteria. Genome sequence analysis has revolutionized every aspect of bacterial biology by enhancing the understanding of the genetics, physiology, and evolutionary development of bacteria. Various actinobacterial genomes have been sequenced, revealing a wide genomic heterogeneity probably as a reflection of their biodiversity. This review provides an account of the recent explosion of actinobacterial genomics data and an attempt to place this in a biological and evolutionary context. PMID:17804669

  7. Double-stranded telomeric DNA binding proteins: Diversity matters.

    PubMed

    Červenák, Filip; Juríková, Katarína; Sepšiová, Regina; Neboháčová, Martina; Nosek, Jozef; Tomáška, L'ubomír

    2017-01-01

    Telomeric sequences constitute only a small fraction of the whole genome yet they are crucial for ensuring genomic stability. This function is in large part mediated by protein complexes recruited to telomeric sequences by specific telomere-binding proteins (TBPs). Although the principal tasks of nuclear telomeres are the same in all eukaryotes, TBPs in various taxa exhibit a surprising diversity indicating their distinct evolutionary origin. This diversity is especially pronounced in ascomycetous yeasts where they must have co-evolved with rapidly diversifying sequences of telomeric repeats. In this article we (i) provide a historical overview of the discoveries leading to the current list of TBPs binding to double-stranded (ds) regions of telomeres, (ii) describe examples of dsTBPs highlighting their diversity in even closely related species, and (iii) speculate about possible evolutionary trajectories leading to a long list of various dsTBPs fulfilling the same general role(s) in their own unique ways.

  8. The genome sequence of the emerging common midwife toad virus identifies an evolutionary intermediate within ranaviruses.

    PubMed

    Mavian, Carla; López-Bueno, Alberto; Balseiro, Ana; Casais, Rosa; Alcamí, Antonio; Alejo, Alí

    2012-04-01

    Worldwide amphibian population declines have been ascribed to global warming, increasing pollution levels, and other factors directly related to human activities. These factors may additionally be favoring the emergence of novel pathogens. In this report, we have determined the complete genome sequence of the emerging common midwife toad ranavirus (CMTV), which has caused fatal disease in several amphibian species across Europe. Phylogenetic and gene content analyses of the first complete genomic sequence from a ranavirus isolated in Europe show that CMTV is an amphibian-like ranavirus (ALRV). However, the CMTV genome structure is novel and represents an intermediate evolutionary stage between the two previously described ALRV groups. We find that CMTV clusters with several other ranaviruses isolated from different hosts and locations which might also be included in this novel ranavirus group. This work sheds light on the phylogenetic relationships within this complex group of emerging, disease-causing viruses.

  9. Computational analysis and functional expression of ancestral copepod luciferase.

    PubMed

    Takenaka, Yasuhiro; Noda-Ogura, Akiko; Imanishi, Tadashi; Yamaguchi, Atsushi; Gojobori, Takashi; Shigeri, Yasushi

    2013-10-10

    We recently reported the cDNA sequences of 11 copepod luciferases from the superfamily Augaptiloidea in the order Calanoida. They were classified into two groups, Metridinidae and Heterorhabdidae/Lucicutiidae families, by phylogenetic analyses. To elucidate the evolutionary processes, we have now further isolated 12 copepod luciferases from Augaptiloidea species (Metridia asymmetrica, Metridia curticauda, Pleuromamma scutullata, Pleuromamma xiphias, Lucicutia ovaliformis and Heterorhabdus tanneri). Codon-based synonymous/nonsynonymous tests of positive selection for 25 identified copepod luciferases suggested that positive Darwinian selection operated in the evolution of Heterorhabdidae luciferases, whereas two types of Metridinidae luciferases had diversified via a neutral mechanism. By in silico analysis of the decoded amino acid sequences of 25 copepod luciferases, we inferred two protein sequences as ancestral copepod luciferases. They were expressed in HEK293 cells where they exhibited notable luciferase activity both in intracellular lysates and cultured media, indicating that the luciferase activity was established before evolutionary diversification of these copepod species. © 2013.

  10. Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.

    PubMed

    Ortega, Davi R; Zhulin, Igor B

    2018-01-01

    Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.

  11. Inquiry-Based Learning of Molecular Phylogenetics

    ERIC Educational Resources Information Center

    Campo, Daniel; Garcia-Vazquez, Eva

    2008-01-01

    Reconstructing phylogenies from nucleotide sequences is a challenge for students because it strongly depends on evolutionary models and computer tools that are frequently updated. We present here an inquiry-based course aimed at learning how to trace a phylogeny based on sequences existing in public databases. Computer tools are freely available…

  12. Phylogeny and strain typing of Escherichia coli, inferred from variation at mononucleotide repeat loci.

    PubMed

    Diamant, Eran; Palti, Yniv; Gur-Arie, Riva; Cohen, Helit; Hallerman, Eric M; Kashi, Yechezkel

    2004-04-01

    Multilocus sequencing of housekeeping genes has been used previously for bacterial strain typing and for inferring evolutionary relationships among strains of Escherichia coli. In this study, we used shorter intergenic sequences that contained simple sequence repeats (SSRs) of repeating mononucleotide motifs (mononucleotide repeats [MNRs]) to infer the phylogeny of pathogenic and commensal E. coli strains. Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 strains, including enterohemorrhagic (six isolates of O157:H7), enteropathogenic, enterotoxigenic, B, and K-12 strains. The four MNRs were also sequenced in 20 representative strains of the E. coli reference (ECOR) collection. Sequence polymorphism was significantly higher at the MNR loci, including the flanking sequences, indicating a higher mutation rate in the sequences flanking the MNR tracts. The four MNR loci were amplifiable by PCR in the standard ECOR A, B1, and D groups, but only one (yaiN) in the B2 group was amplified, which is consistent with previous studies that suggested that B2 is the most ancient group. High sequence compatibility was found between the four MNR loci, indicating that they are in the same clonal frame. The phylogenetic trees that were constructed from the sequence data were in good agreement with those of previous studies that used multilocus enzyme electrophoresis. The results demonstrate that MNR loci are useful for inferring phylogenetic relationships and provide much higher sequence variation than housekeeping genes. Therefore, the use of MNR loci for multilocus sequence typing should prove efficient for clinical diagnostics, epidemiology, and evolutionary study of bacteria.
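    Locating mononucleotide repeat tracts in a sequence, as a first step before comparing them across strains, can be sketched as follows. This is an illustration rather than the authors' procedure: the function name, the minimum tract length of 6, and the example sequence are assumptions, since the abstract does not state a length threshold.

```python
import re


def find_mononucleotide_repeats(seq, min_len=6):
    """Return (start, base, length) for each run of one repeated base.

    start is a 0-based index into seq; min_len is an illustrative
    threshold, not a value taken from the study.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures one base; \1{n,} requires at least n further copies.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Invented example sequence: an 8-base T tract and a 6-base A tract.
example = "ACG" + "T" * 8 + "GC" + "A" * 6 + "CG"
tracts = find_mononucleotide_repeats(example)
```

    Sequencing the tract together with its flanking regions, as the study did, would then allow both the tract length and the flanking polymorphisms to be compared between strains.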

  13. Phylogeny and Strain Typing of Escherichia coli, Inferred from Variation at Mononucleotide Repeat Loci

    PubMed Central

    Diamant, Eran; Palti, Yniv; Gur-Arie, Riva; Cohen, Helit; Hallerman, Eric M.; Kashi, Yechezkel

    2004-01-01

    Multilocus sequencing of housekeeping genes has been used previously for bacterial strain typing and for inferring evolutionary relationships among strains of Escherichia coli. In this study, we used shorter intergenic sequences that contained simple sequence repeats (SSRs) of repeating mononucleotide motifs (mononucleotide repeats [MNRs]) to infer the phylogeny of pathogenic and commensal E. coli strains. Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 strains, including enterohemorrhagic (six isolates of O157:H7), enteropathogenic, enterotoxigenic, B, and K-12 strains. The four MNRs were also sequenced in 20 representative strains of the E. coli reference (ECOR) collection. Sequence polymorphism was significantly higher at the MNR loci, including the flanking sequences, indicating a higher mutation rate in the sequences flanking the MNR tracts. The four MNR loci were amplifiable by PCR in the standard ECOR A, B1, and D groups, but only one (yaiN) in the B2 group was amplified, which is consistent with previous studies that suggested that B2 is the most ancient group. High sequence compatibility was found between the four MNR loci, indicating that they are in the same clonal frame. The phylogenetic trees that were constructed from the sequence data were in good agreement with those of previous studies that used multilocus enzyme electrophoresis. The results demonstrate that MNR loci are useful for inferring phylogenetic relationships and provide much higher sequence variation than housekeeping genes. Therefore, the use of MNR loci for multilocus sequence typing should prove efficient for clinical diagnostics, epidemiology, and evolutionary study of bacteria. PMID:15066845

  14. Why is the correlation between gene importance and gene evolutionary rate so weak?

    PubMed

    Wang, Zhi; Zhang, Jianzhi

    2009-01-01

    One of the few commonly believed principles of molecular evolution is that functionally more important genes (or DNA sequences) evolve more slowly than less important ones. This principle is widely used by molecular biologists in daily practice. However, recent genomic analysis of a diverse array of organisms found only weak, negative correlations between the evolutionary rate of a gene and its functional importance, typically measured under a single benign lab condition. A frequently suggested cause of the above finding is that gene importance determined in the lab differs from that in an organism's natural environment. Here, we test this hypothesis in yeast using gene importance values experimentally determined in 418 lab conditions or computationally predicted for 10,000 nutritional conditions. In no single condition or combination of conditions did we find a much stronger negative correlation, which is explainable by our subsequent finding that always-essential (enzyme) genes do not evolve significantly more slowly than sometimes-essential or always-nonessential ones. Furthermore, we verified that functional density, approximated by the fraction of amino acid sites within protein domains, is uncorrelated with gene importance. Thus, neither the lab-nature mismatch nor a potentially biased among-gene distribution of functional density explains the observed weakness of the correlation between gene importance and evolutionary rate. We conclude that the weakness is factual, rather than artifactual. In addition to being weakened by population genetic reasons, the correlation is likely to have been further weakened by the presence of multiple nontrivial rate determinants that are independent from gene importance. 
    These findings notwithstanding, we show that the principle of slower evolution of more important genes does have some predictive power when genes with vastly different evolutionary rates are compared, explaining why the principle can be practically useful despite the weakness of the correlation.
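    A correlation of the kind discussed in this record can be computed with a rank-based coefficient, as in the sketch below. The abstract does not say which correlation statistic was used, so the choice of Spearman's coefficient is an assumption, and the importance and rate values are invented toy numbers.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values, invented for illustration only.
importance = [0.9, 0.1, 0.5, 0.7, 0.3]       # e.g. fitness cost of deletion
rate = [0.02, 0.05, 0.06, 0.03, 0.04]        # e.g. substitutions per site
rho = spearman(importance, rate)
```

    A negative rho means more important genes tend to evolve more slowly; the record's point is that in real data this coefficient is only weakly negative.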

  15. The evolution of virulence in primate malaria parasites based on Bayesian reconstructions of ancestral states.

    PubMed

    Garamszegi, László Zsolt

    2011-02-01

    Plasmodium parasites, the causative agents of malaria, are generally considered as harmful parasites, but many of them cause mild symptoms. Little is known about the evolutionary history and phylogenetic constraints that generate this interspecific variation in virulence due to uncertainties about the phylogenetic associations of parasites. Here, to account for such phylogenetic uncertainty, phylogenetic methods based on Bayesian statistics were followed in combination with sequence data from five genes to estimate the ancestral state of virulence in primate Plasmodium parasites. When recent parasites were categorised according to the damage caused to the host, Bayesian estimates of ancestral states indicated that the acquisition of a harmful host exploitation strategy is more likely to be a recent evolutionary event than a result of an ancient change in a character state altering virulence. On the contrary, there was more evidence for moderate host exploitation having a deep origin along the phylogenetic tree. Moreover, the evolution of host severity is determined by the phylogenetic relationships of parasites, as severity gains did not appear randomly on the evolutionary tree. Such phylogenetic constraints can be mediated by the acquisition of virulence genes. As the impact of a parasite on a host is the result of both the parasite's investment in reproduction and host sensitivity, virulence was also estimated by calculating peak parasitemia after eliminating host effects. A directional random-walk evolutionary model showed that the ancestral primate malarias reproduced at very low parasitemia in their hosts. Consequently, the extreme variation in the outcome of malaria infection in different host species can be better understood in light of the phylogeny of parasites. Copyright © 2010 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.

  16. Evolutionary modes of emergence of short interspersed nuclear element (SINE) families in grasses.

    PubMed

    Kögler, Anja; Schmidt, Thomas; Wenke, Torsten

    2017-11-01

    Short interspersed nuclear elements (SINEs) are non-autonomous transposable elements which are propagated by retrotransposition and constitute an inherent part of the genome of most eukaryotic species. Knowledge of heterogeneous and highly abundant SINEs is crucial for de novo (or improvement of) annotation of whole genome sequences. We scanned Poaceae genome sequences of six important cereals (Oryza sativa, Triticum aestivum, Hordeum vulgare, Panicum virgatum, Sorghum bicolor, Zea mays) and Brachypodium distachyon to examine the diversity and evolution of SINE populations. We comparatively analyzed the structural features, distribution, evolutionary relation and abundance of 32 SINE families and subfamilies within grasses, comprising 11 052 individual copies. The investigation of activity profiles within the Poaceae provides insights into their species-specific diversification and amplification. We found that Poaceae SINEs (PoaS) fall into two length categories: simple SINEs of up to 180 bp and dimeric SINEs larger than 240 bp. Detailed analysis at the nucleotide level revealed that multimerization of related and unrelated SINE copies is an important evolutionary mechanism of SINE formation. We conclude that PoaS families diversify by massive reshuffling between SINE families, likely caused by insertion of truncated copies, and provide a model for this evolutionary scenario. Twenty-eight of 32 PoaS families and subfamilies show significant conservation, in particular either in the 5' or 3' regions, across Poaceae species and share large sequence stretches with one or more other PoaS families. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  17. Short- and Long-term Evolutionary Dynamics of Bacterial Insertion Sequences: Insights from Wolbachia Endosymbionts

    PubMed Central

    Cerveau, Nicolas; Leclercq, Sébastien; Leroy, Elodie; Bouchon, Didier; Cordaux, Richard

    2011-01-01

    Transposable elements (TE) are one of the major driving forces of genome evolution, raising the question of the long-term dynamics underlying their evolutionary success. Long-term TE evolution can readily be reconstructed in eukaryotes, thanks to many degraded copies constituting genomic fossil records of past TE proliferations. By contrast, bacterial genomes usually experience high sequence turnover and short TE retention times, thereby obscuring ancient TE evolutionary patterns. We found that Wolbachia bacterial genomes contain 52–171 insertion sequence (IS) TEs. IS account for 11% of Wolbachia wRi, which is one of the highest IS genomic coverage reported in prokaryotes to date. We show that many IS groups are currently expanding in various Wolbachia genomes and that IS horizontal transfers are frequent among strains, which can explain the apparent synchronicity of these IS proliferations. Remarkably, >70% of Wolbachia IS are nonfunctional. They constitute an unusual bacterial IS genomic fossil record providing direct empirical evidence for a long-term IS evolutionary dynamics following successive periods of intense transpositional activity. Our results show that comprehensive IS annotations have the potential to provide new insights into prokaryote TE evolution and, more generally, prokaryote genome evolution. Indeed, the identification of an important IS genomic fossil record in Wolbachia demonstrates that IS elements are not always of recent origin, contrary to the conventional view of TE evolution in prokaryote genomes. Our results also raise the question whether the abundance of IS fossils is specific to Wolbachia or it may be a general, albeit overlooked, feature of prokaryote genomes. PMID:21940637

  18. Short- and long-term evolutionary dynamics of bacterial insertion sequences: insights from Wolbachia endosymbionts.

    PubMed

    Cerveau, Nicolas; Leclercq, Sébastien; Leroy, Elodie; Bouchon, Didier; Cordaux, Richard

    2011-01-01

    Transposable elements (TE) are one of the major driving forces of genome evolution, raising the question of the long-term dynamics underlying their evolutionary success. Long-term TE evolution can readily be reconstructed in eukaryotes, thanks to many degraded copies constituting genomic fossil records of past TE proliferations. By contrast, bacterial genomes usually experience high sequence turnover and short TE retention times, thereby obscuring ancient TE evolutionary patterns. We found that Wolbachia bacterial genomes contain 52-171 insertion sequence (IS) TEs. IS account for 11% of Wolbachia wRi, which is one of the highest IS genomic coverage reported in prokaryotes to date. We show that many IS groups are currently expanding in various Wolbachia genomes and that IS horizontal transfers are frequent among strains, which can explain the apparent synchronicity of these IS proliferations. Remarkably, >70% of Wolbachia IS are nonfunctional. They constitute an unusual bacterial IS genomic fossil record providing direct empirical evidence for a long-term IS evolutionary dynamics following successive periods of intense transpositional activity. Our results show that comprehensive IS annotations have the potential to provide new insights into prokaryote TE evolution and, more generally, prokaryote genome evolution. Indeed, the identification of an important IS genomic fossil record in Wolbachia demonstrates that IS elements are not always of recent origin, contrary to the conventional view of TE evolution in prokaryote genomes. Our results also raise the question whether the abundance of IS fossils is specific to Wolbachia or it may be a general, albeit overlooked, feature of prokaryote genomes.

  19. Evolutionary divergence of chloroplast FAD synthetase proteins

    PubMed Central

    2010-01-01

    Background Flavin adenine dinucleotide synthetases (FADSs) - a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes - were studied in plants in terms of sequence, structure and evolutionary history. Results Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term as plant-like FADS proteins, are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with a previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may contain a catalytic activity, but different to that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus. Conclusions A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and further structural and functional studies. This work is a contribution to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity. PMID:20955574

  20. A reassessment of the evolutionary timescale of bat rabies viruses based upon glycoprotein gene sequences.

    PubMed

    Kuzmina, Natalia A; Kuzmin, Ivan V; Ellison, James A; Taylor, Steven T; Bergman, David L; Dew, Beverly; Rupprecht, Charles E

    2013-10-01

    Rabies, an acute progressive encephalomyelitis caused by viruses in the genus Lyssavirus, is one of the oldest known infectious diseases. Although dogs and other carnivores represent the greatest threat to public health as rabies reservoirs, it is commonly accepted that bats are the primary evolutionary hosts of lyssaviruses. Despite early historical documentation of rabies, molecular clock analyses indicate a quite young age of lyssaviruses, which is confusing. For example, the results obtained for partial and complete nucleoprotein gene sequences of rabies viruses (RABV), or for a limited number of glycoprotein gene sequences, indicated that the time of the most recent common ancestor (TMRCA) for current bat RABV diversity in the Americas lies in the seventeenth to eighteenth centuries and might be directly or indirectly associated with the European colonization. Conversely, several other reports demonstrated high genetic similarity between lyssavirus isolates, including RABV, obtained within a time interval of 25-50 years. In the present study, we attempted to re-estimate the age of several North American bat RABV lineages based on the largest set of complete and partial glycoprotein gene sequences compiled to date (n = 201) employing a codon substitution model. Although our results overlap with previous estimates in marginal areas of the 95% highest posterior density (HPD), they suggest a longer evolutionary history of American bat RABV lineages (TMRCA at least 732 years, with a 95% HPD 436-1107 years).
