Sample records for evolutionary sequence modeling

  1. Hidden long evolutionary memory in a model biochemical network

    NASA Astrophysics Data System (ADS)

    Ali, Md. Zulfikar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2018-04-01

    We introduce a minimal model for the evolution of functional protein-interaction networks using a sequence-based mutational algorithm, and apply the model to study neutral drift in networks that yield oscillatory dynamics. Starting with a functional core module, random evolutionary drift increases network complexity even in the absence of specific selective pressures. Surprisingly, we uncover a hidden order in sequence space that gives rise to long-term evolutionary memory, implying strong constraints on network evolution due to the topology of accessible sequence space.

  2. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724

  3. Reconstructing evolutionary trees in parallel for massive sequences.

    PubMed

    Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

    2017-12-14

    Building evolutionary trees for massive sets of unaligned DNA sequences is both crucial and challenging, and reconstructing a tree for ultra-large sequence sets is especially hard. Massive multiple sequence alignment is likewise demanding in time and memory. The recently developed Hadoop and Spark platforms offer new prospects for these classical computational biology problems. In this paper, we perform multiple sequence alignment and evolutionary tree reconstruction in parallel. HPTree, the tool developed here, processes large DNA sequence files quickly: it works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could aid population evolution research and metagenomics analysis. It is built on the Hadoop and Spark platforms; clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used for tree building. The software and its source code are available at http://lab.malab.cn/soft/HPtree/ .
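
    The neighbour-joining step named in this abstract can be illustrated with a minimal serial sketch. This is the textbook algorithm on a small distance matrix, not HPTree's Hadoop/Spark implementation; the function name `neighbor_joining`, the `frozenset`-keyed distance table, the internal node labels (`node0`, `node1`, ...) and the five-taxon example distances are all assumptions made for illustration.

```python
def neighbor_joining(names, dist):
    """Textbook neighbour-joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of edges (node, other_node, branch_length).
    """
    d = dict(dist)          # working copy; new internal nodes are added to it
    nodes = list(names)
    edges = []
    next_id = 0             # counter for hypothetical internal node labels

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * get(*p) - r[p[0]] - r[p[1]],
        )
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * get(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        len_b = get(a, b) - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (get(a, k) + get(b, k) - get(a, b))
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Join the last two nodes directly.
    a, b = nodes
    edges.append((a, b, get(a, b)))
    return edges


# Illustrative additive distance matrix on five taxa (made-up values).
names = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
edges = neighbor_joining(names, dist)
```

    The sketch recomputes all pairwise scores on every iteration, which is cubic in the number of taxa; that cost is one reason large inputs call for the clustering and parallelism the abstract describes.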

  4. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules

    PubMed Central

    Ashkenazy, Haim; Abadi, Shiran; Martz, Eric; Chay, Ofer; Mayrose, Itay; Pupko, Tal; Ben-Tal, Nir

    2016-01-01

    The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree. PMID:27166375

  5. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928

  6. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

    PubMed

    Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press.

  7. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.

    PubMed

    Spielman, Stephanie J; Wilke, Claus O

    2015-01-01

    We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.
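
    To make "continuous-time Markov models of sequence evolution" concrete, here is a minimal standalone sketch that evolves a nucleotide sequence down a small tree under the Jukes-Cantor (JC69) model. It does not use Pyvolve or reproduce its API; the tuple-based tree encoding, the function names and the example tree are assumptions chosen for illustration.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_change_probability(branch_length):
    """Probability that a site ends in a different state than it started,
    after `branch_length` expected substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


def evolve_branch(sequence, branch_length, rng):
    """Evolve every site independently along one branch."""
    p_change = jc69_change_probability(branch_length)
    evolved = []
    for base in sequence:
        if rng.random() < p_change:
            # Under JC69 the three alternative bases are equally likely.
            evolved.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            evolved.append(base)
    return "".join(evolved)


def simulate_tree(node, parent_sequence, rng, leaves=None):
    """Recursively evolve a sequence down a tree.

    node: (name, branch_length, children), where children is a list of nodes.
    Returns a dict mapping leaf names to simulated sequences.
    """
    if leaves is None:
        leaves = {}
    name, branch_length, children = node
    sequence = evolve_branch(parent_sequence, branch_length, rng)
    if not children:
        leaves[name] = sequence
    for child in children:
        simulate_tree(child, sequence, rng, leaves)
    return leaves


# Hypothetical three-taxon tree; branch lengths are substitutions per site.
tree = ("root", 0.0, [
    ("A", 0.1, []),
    ("inner", 0.05, [("B", 0.1, []), ("C", 0.2, [])]),
])
rng = random.Random(1)          # fixed seed so runs are repeatable
root_sequence = "ACGT" * 25
leaves = simulate_tree(tree, root_sequence, rng)
```

    JC69 is the simplest case because its transition probabilities have a closed form. More general rate matrices, of the kind the abstract says users can customize, would require exponentiating the matrix for each branch instead.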

  8. Evolution of sparsity and modularity in a model of protein allostery

    NASA Astrophysics Data System (ADS)

    Hemery, Mathieu; Rivoire, Olivier

    2015-04-01

    The sequence of a protein is not only constrained by its physical and biochemical properties under current selection, but also by features of its past evolutionary history. Understanding the extent and the form that these evolutionary constraints may take is important to interpret the information in protein sequences. To study this problem, we introduce a simple but physical model of protein evolution where selection targets allostery, the functional coupling of distal sites on protein surfaces. This model shows how the geometrical organization of couplings between amino acids within a protein structure can depend crucially on its evolutionary history. In particular, two scenarios are found to generate a spatial concentration of functional constraints: high mutation rates and fluctuating selective pressures. This second scenario offers a plausible explanation for the high tolerance of natural proteins to mutations and for the spatial organization of their least tolerant amino acids, as revealed by sequence analysis and mutagenesis experiments. It also implies a faculty to adapt to new selective pressures that is consistent with observations. The model illustrates how several independent functional modules may emerge within the same protein structure, depending on the nature of past environmental fluctuations. Our model thus relates the evolutionary history of proteins to the geometry of their functional constraints, with implications for decoding and engineering protein sequences.

  9. Evolutionary distances in the twilight zone--a rational kernel approach.

    PubMed

    Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian

    2010-12-31

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.

  10. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca²⁺ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  11. LS³: A Method for Improving Phylogenomic Inferences When Evolutionary Rates Are Heterogeneous among Taxa

    PubMed Central

    Rivera-Rivera, Carlos J.; Montoya-Burgos, Juan I.

    2016-01-01

    Phylogenetic inference artifacts can occur when sequence evolution deviates from assumptions made by the models used to analyze them. The combination of strong model assumption violations and highly heterogeneous lineage evolutionary rates can become problematic in phylogenetic inference, and lead to the well-described long-branch attraction (LBA) artifact. Here, we define an objective criterion for assessing lineage evolutionary rate heterogeneity among predefined lineages: the result of a likelihood ratio test between a model in which the lineages evolve at the same rate (homogeneous model) and a model in which different lineage rates are allowed (heterogeneous model). We implement this criterion in the algorithm Locus Specific Sequence Subsampling (LS³), aimed at reducing the effects of LBA in multi-gene datasets. For each gene, LS³ sequentially removes the fastest-evolving taxon of the ingroup and tests for lineage rate homogeneity until all lineages have uniform evolutionary rates. The sequences excluded from the homogeneously evolving taxon subset are flagged as potentially problematic. The software implementation provides the user with the possibility to remove the flagged sequences for generating a new concatenated alignment. We tested LS³ with simulations and two real datasets containing LBA artifacts: a nucleotide dataset regarding the position of Glires within mammals and an amino-acid dataset concerning the position of nematodes within bilaterians. The initially incorrect phylogenies were corrected in all cases upon removing data flagged by LS³. PMID:26912812
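
    The removal loop described in this abstract can be sketched as follows. This is a simplified illustration, not the LS³ software: model fitting is hidden behind a hypothetical `fit` callback that returns the two log-likelihoods and per-taxon rates, the fixed `critical_value` stands in for a proper chi-square threshold (whose degrees of freedom depend on the models compared), and `toy_fit` with its made-up rates exists only so the example runs.

```python
def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def subsample_until_homogeneous(taxa, fit, critical_value):
    """Drop the fastest-evolving taxon until rate homogeneity is not rejected.

    fit(taxa) must return (lnL_homogeneous, lnL_heterogeneous, rates), where
    rates maps each taxon to an estimated evolutionary rate (hypothetical API).
    Returns (kept_taxa, flagged_taxa).
    """
    kept = list(taxa)
    flagged = []
    while len(kept) > 2:
        lnl_hom, lnl_het, rates = fit(kept)
        if lrt_statistic(lnl_hom, lnl_het) <= critical_value:
            break  # the homogeneous-rate model is not rejected; stop removing
        fastest = max(kept, key=lambda taxon: rates[taxon])
        kept.remove(fastest)
        flagged.append(fastest)
    return kept, flagged


# Toy stand-in for phylogenetic model fitting (made-up rates, not real data).
TOY_RATES = {"A": 1.0, "B": 1.1, "C": 0.9, "D": 5.0}


def toy_fit(taxa):
    rates = {t: TOY_RATES[t] for t in taxa}
    mean = sum(rates.values()) / len(rates)
    # Pretend the homogeneous model loses likelihood as rates spread out.
    lnl_hom = -sum((r - mean) ** 2 for r in rates.values())
    lnl_het = 0.0
    return lnl_hom, lnl_het, rates


kept, flagged = subsample_until_homogeneous(
    list(TOY_RATES), toy_fit, critical_value=3.84
)
```

    In this toy run the fast taxon `D` is flagged and the remaining three pass the test. With real data each call to `fit` would involve optimising two phylogenetic models, so the loop is far more expensive than it looks here.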

  12. Simple versus complex models of trait evolution and stasis as a response to environmental change

    NASA Astrophysics Data System (ADS)

    Hunt, Gene; Hopkins, Melanie J.; Lidgard, Scott

    2015-04-01

    Previous analyses of evolutionary patterns, or modes, in fossil lineages have focused overwhelmingly on three simple models: stasis, random walks, and directional evolution. Here we use likelihood methods to fit an expanded set of evolutionary models to a large compilation of ancestor-descendant series of populations from the fossil record. In addition to the standard three models, we assess more complex models with punctuations and shifts from one evolutionary mode to another. As in previous studies, we find that stasis is common in the fossil record, as is a strict version of stasis that entails no real evolutionary changes. Incidence of directional evolution is relatively low (13%), but higher than in previous studies because our analytical approach can more sensitively detect noisy trends. Complex evolutionary models are often favored, overwhelmingly so for sequences comprising many samples. This finding is consistent with evolutionary dynamics that are, in reality, more complex than any of the models we consider. Furthermore, the timing of shifts in evolutionary dynamics varies among traits measured from the same series. Finally, we use our empirical collection of evolutionary sequences and a long and highly resolved proxy for global climate to inform simulations in which traits adaptively track temperature changes over time. When realistically calibrated, we find that this simple model can reproduce important aspects of our paleontological results. We conclude that observed paleontological patterns, including the prevalence of stasis, need not be inconsistent with adaptive evolution, even in the face of unstable physical environments.

  13. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  14. Beyond Reasonable Doubt: Evolution from DNA Sequences

    PubMed Central

    Penny, David

    2013-01-01

    We demonstrate quantitatively that, as predicted by evolutionary theory, sequences of homologous proteins from different species converge as we go further and further back in time. The converse, a non-evolutionary model, can be expressed as probabilities, and the test works for chloroplast, nuclear and mitochondrial sequences, as well as for sequences that diverged at different time depths. Even on our conservative test, the probability that chance could produce the observed levels of ancestral convergence for just one of the eight datasets of 51 proteins is ≈1×10⁻¹⁹ and combined over 8 datasets is ≈1×10⁻¹³². By comparison, there are about 10⁸⁰ protons in the universe, hence the probability that the sequences could have been produced by a process involving unrelated ancestral sequences is about 10⁵⁰ times lower than picking, among all protons, the same proton at random twice in a row. A non-evolutionary control model shows no convergence, and only a small number of parameters are required to account for the observations. It is time that researchers insisted that doubters put up testable alternatives to evolution. PMID:23950906

  15. LS³: A Method for Improving Phylogenomic Inferences When Evolutionary Rates Are Heterogeneous among Taxa.

    PubMed

    Rivera-Rivera, Carlos J; Montoya-Burgos, Juan I

    2016-06-01

    Phylogenetic inference artifacts can occur when sequence evolution deviates from assumptions made by the models used to analyze them. The combination of strong model assumption violations and highly heterogeneous lineage evolutionary rates can become problematic in phylogenetic inference, and lead to the well-described long-branch attraction (LBA) artifact. Here, we define an objective criterion for assessing lineage evolutionary rate heterogeneity among predefined lineages: the result of a likelihood ratio test between a model in which the lineages evolve at the same rate (homogeneous model) and a model in which different lineage rates are allowed (heterogeneous model). We implement this criterion in the algorithm Locus Specific Sequence Subsampling (LS³), aimed at reducing the effects of LBA in multi-gene datasets. For each gene, LS³ sequentially removes the fastest-evolving taxon of the ingroup and tests for lineage rate homogeneity until all lineages have uniform evolutionary rates. The sequences excluded from the homogeneously evolving taxon subset are flagged as potentially problematic. The software implementation provides the user with the possibility to remove the flagged sequences for generating a new concatenated alignment. We tested LS³ with simulations and two real datasets containing LBA artifacts: a nucleotide dataset regarding the position of Glires within mammals and an amino-acid dataset concerning the position of nematodes within bilaterians. The initially incorrect phylogenies were corrected in all cases upon removing data flagged by LS³. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
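
    The "noisy set of observed correlations" that this abstract sets out to disentangle can be illustrated with a simpler, local covariation score: mutual information between alignment columns. The sketch below deliberately swaps in that simpler technique; it is not the maximum entropy model or the EVfold method, and the function names and four-sequence toy alignment are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(alignment):
    """Return ((i, j), score) for all column pairs, highest score first."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment: columns 0 and 1 covary, column 2 is constant,
# and column 3 varies independently of column 0.
toy_alignment = ["AKGL", "AKGV", "DEGL", "DEGV"]
ranking = rank_column_pairs(toy_alignment)
top_pair, top_score = ranking[0]
```

    Pairwise scores like this conflate direct and indirect correlations: if positions 1 and 2 are coupled and 2 and 3 are coupled, 1 and 3 will also appear correlated. A global model of the kind the abstract describes addresses this by explaining all pair statistics jointly.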

  17. A Case-by-Case Evolutionary Analysis of Four Imprinted Retrogenes

    PubMed Central

    McCole, Ruth B; Loughran, Noeleen B; Chahal, Mandeep; Fernandes, Luis P; Roberts, Roland G; Fraternali, Franca; O'Connell, Mary J; Oakey, Rebecca J

    2011-01-01

    Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event: mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follows very varied evolutionary paths. PMID:21166792

  18. The Ancient Evolutionary History of Polyomaviruses

    PubMed Central

    Buck, Christopher B.; Van Doorslaer, Koenraad; Peretti, Alberto; Geoghegan, Eileen M.; Tisza, Michael J.; An, Ping; Katz, Joshua P.; Pipas, James M.; McBride, Alison A.; Camus, Alvin C.; McDermott, Alexa J.; Dill, Jennifer A.; Delwart, Eric; Ng, Terry F. F.; Farkas, Kata; Austin, Charlotte; Kraberger, Simona; Davison, William; Pastrana, Diana V.; Varsani, Arvind

    2016-01-01

    Polyomaviruses are a family of DNA tumor viruses that are known to infect mammals and birds. To investigate the deeper evolutionary history of the family, we used a combination of viral metagenomics, bioinformatics, and structural modeling approaches to identify and characterize polyomavirus sequences associated with fish and arthropods. Analyses drawing upon the divergent new sequences indicate that polyomaviruses have been gradually co-evolving with their animal hosts for at least half a billion years. Phylogenetic analyses of individual polyomavirus genes suggest that some modern polyomavirus species arose after ancient recombination events involving distantly related polyomavirus lineages. The improved evolutionary model provides a useful platform for developing a more accurate taxonomic classification system for the viral family Polyomaviridae. PMID:27093155

  19. Using single cell sequencing data to model the evolutionary history of a tumor.

    PubMed

    Kim, Kyung In; Simon, Richard

    2014-01-24

    The introduction of next-generation sequencing (NGS) technology has made it possible to detect genomic alterations within tumor cells on a large scale. However, most applications of NGS show the genetic content of mixtures of cells. Recently developed single cell sequencing technology can identify variation within a single cell. Characterization of multiple samples from a tumor using single cell sequencing can potentially provide information on the evolutionary history of that tumor. This may facilitate understanding how key mutations accumulate and evolve in lineages to form a heterogeneous tumor. We provide a computational method to infer an evolutionary mutation tree based on single cell sequencing data. Our approach differs from traditional phylogenetic tree approaches in that our mutation tree directly describes temporal order relationships among mutation sites. Our method also accommodates sequencing errors. Furthermore, we provide a method for estimating the proportion of time from the earliest mutation event of the sample to the most recent common ancestor of the sample of cells. Finally, we discuss current limitations on modeling with single cell sequencing data and possible improvements under those limitations. Inferring the temporal ordering of mutational sites using current single cell sequencing data is a challenge. Our proposed method may help elucidate relationships among key mutations and their role in tumor progression.

  20. Adaptive evolutionary walks require neutral intermediates in RNA fitness landscapes.

    PubMed

    Rendel, Mark D

    2011-01-01

    In RNA fitness landscapes with interconnected networks of neutral mutations, neutral precursor mutations can play an important role in facilitating the accessibility of epistatic adaptive mutant combinations. I use an exhaustively surveyed fitness landscape model based on short sequence RNA genotypes (and their secondary structure phenotypes) to calculate the minimum rate at which mutants initially appearing as neutral are incorporated into an adaptive evolutionary walk. I show first, that incorporating neutral mutations significantly increases the number of point mutations in a given evolutionary walk when compared to estimates from previous adaptive walk models. Second, that incorporating neutral mutants into such a walk significantly increases the final fitness encountered on that walk - indeed evolutionary walks including neutral steps often reach the global optimum in this model. Third, and perhaps most importantly, evolutionary paths of this kind are often extremely winding in their nature and have the potential to undergo multiple mutations at a given sequence position within a single walk; the potential of these winding paths to mislead phylogenetic reconstruction is briefly considered. Copyright © 2010 Elsevier Inc. All rights reserved.

  1. 3D RNA and functional interactions from evolutionary couplings

    PubMed Central

    Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.

    2016-01-01

    Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444

  2. Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

    PubMed Central

    Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

    2012-01-01

    High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
    In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309

  3. Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line

    NASA Astrophysics Data System (ADS)

    Li, Zixiang; Janardhanan, Mukund Nilakantan; Tang, Qiuhua; Nielsen, Peter

    2018-05-01

    This article presents the first method to simultaneously balance and sequence robotic mixed-model assembly lines (RMALB/S), which involves three sub-problems: task assignment, model sequencing and robot allocation. A new mixed-integer programming model is developed to minimize makespan and, using CPLEX solver, small-size problems are solved for optimality. Two metaheuristics, the restarted simulated annealing algorithm and co-evolutionary algorithm, are developed and improved to address this NP-hard problem. The restarted simulated annealing method replaces the current temperature with a new temperature to restart the search process. The co-evolutionary method uses a restart mechanism to generate a new population by modifying several vectors simultaneously. The proposed algorithms are tested on a set of benchmark problems and compared with five other high-performing metaheuristics. The proposed algorithms outperform their original editions and the benchmarked methods. The proposed algorithms are able to solve the balancing and sequencing problem of a robotic mixed-model assembly line effectively and efficiently.

  4. TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS

    PubMed Central

    Jones, Matthew R.; Good, Jeffrey M.

    2016-01-01

    The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993

  5. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
    [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  6. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
    [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  7. Determination of Fundamental Properties of an M31 Globular Cluster from Main-Sequence Photometry

    NASA Astrophysics Data System (ADS)

    Ma, Jun; Wu, Zhenyu; Wang, Song; Fan, Zhou; Zhou, Xu; Wu, Jianghua; Jiang, Zhaoji; Chen, Jiansheng

    2010-10-01

    M31 globular cluster B379 is the first extragalactic cluster whose age was determined by main-sequence photometry. In the main-sequence photometric method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models adopt different stellar-evolution parameters, such as the range of stellar masses, opacities, equations of state, and other recipes. It is therefore interesting to check whether different stellar evolutionary models give consistent results for the same cluster. Brown et al. constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using SSP models of Bruzual & Charlot and its multiphotometry, Ma et al. independently determined the age of B379, which is in good agreement with the determination of Brown et al. The models of Bruzual & Charlot are calculated based on the Padova evolutionary tracks. It is necessary to check whether the age of B379 as determined based on the Padova evolutionary tracks is in agreement with the determination of Brown et al. In this article, we redetermine the age of B379 using isochrones of the Padova stellar evolutionary models. In addition, the metal abundance, the distance modulus, and the reddening value for B379 are reported. The results obtained are consistent with the previous determinations, which include the age obtained by Brown et al. This article thus confirms the consistency of the age scale of B379 between the Padova isochrones and the 2006 VandenBerg isochrones; i.e., the comparison between the results of Brown et al. and Ma et al. is meaningful. The values found for B379 in this article are: metallicity [M/H] = log(Z/Z⊙) = -0.325, age τ = 11.0 ± 1.5 Gyr, reddening E(B - V) = 0.08, and distance modulus (m - M)_0 = 24.44 ± 0.10.
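
    As a worked illustration of the last figure quoted above, the standard relation between an extinction-corrected distance modulus and distance, d = 10^((m - M)_0 / 5 + 1) pc, can be sketched as follows. The function name is hypothetical and the snippet is not taken from the article; it only converts the quoted value and its uncertainty.

```python
def distance_from_modulus(mu: float) -> float:
    """Distance in parsecs for an extinction-corrected distance modulus mu = (m - M)_0."""
    return 10.0 ** (mu / 5.0 + 1.0)


# Values quoted in the abstract for B379.
mu, sigma_mu = 24.44, 0.10

d = distance_from_modulus(mu)
d_lo = distance_from_modulus(mu - sigma_mu)
d_hi = distance_from_modulus(mu + sigma_mu)

# Report in kiloparsecs; the asymmetric range reflects the logarithmic scale.
print(f"d = {d / 1e3:.0f} kpc (range {d_lo / 1e3:.0f} to {d_hi / 1e3:.0f} kpc)")
```

    With the quoted modulus this gives a distance of roughly 770 kpc.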

  8. Evolutionary models of interstellar chemistry

    NASA Technical Reports Server (NTRS)

    Prasad, Sheo S.

    1987-01-01

    The goal of evolutionary models of interstellar chemistry is to understand how interstellar clouds came to be the way they are, how they will change with time, and to place them in an evolutionary sequence with other celestial objects such as stars. An improved Mark II version of an earlier model of chemistry in dynamically evolving clouds is presented. The Mark II model suggests that the conventional elemental C/O ratio less than one can explain the observed abundances of CI and the nondetection of O2 in dense clouds. Coupled chemical-dynamical models seem to have the potential to generate many observable discriminators of the evolutionary tracks. This is exciting, because, in general, purely dynamical models do not yield enough verifiable discriminators of the predicted tracks.

  9. Agency, Values, and Well-Being: A Human Development Model

    ERIC Educational Resources Information Center

    Welzel, Christian; Inglehart, Ronald

    2010-01-01

    This paper argues that feelings of agency are linked to human well-being through a sequence of adaptive mechanisms that promote human development, once existential conditions become permissive. In the first part, we elaborate on the evolutionary logic of this model and outline why an evolutionary perspective is helpful to understand changes in…

  10. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    PubMed Central

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353

  11. Grand challenges in evolutionary and population genetics: The importance of integrating epigenetics, genomics, modeling, and experimentation

    Treesearch

    Samuel A. Cushman

    2014-01-01

    This is a time of explosive growth in the fields of evolutionary and population genetics, with whole genome sequencing and bioinformatics driving a transformative paradigm shift (Morozova and Marra, 2008). At the same time, advances in epigenetics are thoroughly transforming our understanding of evolutionary processes and their implications for populations, species and...

  12. Biophysics of protein evolution and evolutionary protein biophysics

    PubMed Central

    Sikosek, Tobias; Chan, Hue Sun

    2014-01-01

    The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599

  13. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    PubMed

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, then the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
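
    The log-likelihood ratio statistic mentioned above has the standard form G = 2 Σ O_i ln(O_i / E_i), where O_i and E_i are observed and expected counts in each category. A minimal sketch follows; the function name and the toy counts are hypothetical and are not data from the paper.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)) over categories; empty observed cells contribute zero."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts for two site-pattern categories.
obs = [10, 20]
exp = [15, 15]
print(g_statistic(obs, exp))
```

    Under the usual large-sample assumptions G is compared against a chi-squared distribution; marginalizing patterns into fewer categories, as the abstract describes, reduces the number of sparse cells.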

  14. A Stochastic Evolutionary Model for Protein Structure Alignment and Phylogeny

    PubMed Central

    Challis, Christopher J.; Schmidler, Scott C.

    2012-01-01

    We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein–Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution. PMID:22723302
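
    To make the diffusion component concrete, the sketch below simulates a single coordinate under an Ornstein–Uhlenbeck process using its exact transition distribution. This is an illustration only, not the authors' implementation: the parameter names (mu, theta, sigma), the one-dimensional setting, and the fixed time step are all assumptions.

```python
import math
import random


def ou_step(x, mu, theta, sigma, dt, rng):
    """One exact OU transition: mean reverts toward mu at rate theta, noise scale sigma."""
    decay = math.exp(-theta * dt)
    # Standard deviation of the transition over an interval of length dt.
    sd = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    return mu + (x - mu) * decay + sd * rng.gauss(0.0, 1.0)


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=0):
    """Return the path [x0, x1, ..., x_n] of a one-dimensional OU process."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(ou_step(path[-1], mu, theta, sigma, dt, rng))
    return path


# Hypothetical parameters: a coordinate starting 3 units from its mean position.
print(simulate_ou(x0=3.0, mu=0.0, theta=0.5, sigma=1.0, dt=0.1, n_steps=5))
```

    Unlike plain Brownian motion, the variance of this process stays bounded (it approaches sigma² / (2 theta)), which is one reason such a process is a natural choice for atoms that drift but remain near a mean position.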

  15. Simultaneously estimating evolutionary history and repeated traits phylogenetic signal: applications to viral and host phenotypic evolution

    PubMed Central

    Vrancken, Bram; Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Longdon, Ben; Günthard, Huldrych F.; Suchard, Marc A.

    2014-01-01

    Phylogenetic signal quantifies the degree to which resemblance in continuously-valued traits reflects phylogenetic relatedness. Measures of phylogenetic signal are widely used in ecological and evolutionary research, and are recently gaining traction in viral evolutionary studies. Standard estimators of phylogenetic signal frequently condition on data summary statistics of the repeated trait observations and fixed phylogenetic trees, resulting in information loss and potential bias. To incorporate the observation process and phylogenetic uncertainty in a model-based approach, we develop a novel Bayesian inference method to simultaneously estimate the evolutionary history and phylogenetic signal from molecular sequence data and repeated multivariate traits. Our approach builds upon a phylogenetic diffusion framework that models continuous trait evolution as a Brownian motion process and incorporates Pagel’s λ transformation parameter to estimate dependence among traits. We provide a computationally efficient inference implementation in the BEAST software package. We evaluate the synthetic performance of the Bayesian estimator of phylogenetic signal against standard estimators, and demonstrate the use of our coherent framework to address several virus-host evolutionary questions, including virulence heritability for HIV, antigenic evolution in influenza and HIV, and Drosophila sensitivity to sigma virus infection. Finally, we discuss model extensions that will make useful contributions to our flexible framework for simultaneously studying sequence and trait evolution. PMID:25780554
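
    In its usual textbook form, Pagel's λ rescales the off-diagonal entries (shared branch lengths) of the phylogenetic variance-covariance matrix while leaving the diagonal unchanged, so that λ = 0 corresponds to no phylogenetic signal and λ = 1 to unmodified Brownian motion. The sketch below shows that transformation only; the function name is hypothetical, the restriction to [0, 1] is an assumption, and the BEAST implementation described in the abstract may differ in detail.

```python
def pagel_lambda_transform(vcv, lam):
    """Scale off-diagonal entries of a phylogenetic covariance matrix by lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("this sketch assumes 0 <= lam <= 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-taxon tree: total depth 1.0, shared ancestry 0.6.
vcv = [[1.0, 0.6],
       [0.6, 1.0]]
print(pagel_lambda_transform(vcv, 0.5))
```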

  16. Observing Clonal Dynamics across Spatiotemporal Axes: A Prelude to Quantitative Fitness Models for Cancer.

    PubMed

    McPherson, Andrew W; Chan, Fong Chun; Shah, Sohrab P

    2018-02-01

    The ability to accurately model evolutionary dynamics in cancer would allow for prediction of progression and response to therapy. As a prelude to quantitative understanding of evolutionary dynamics, researchers must gather observations of in vivo tumor evolution. High-throughput genome sequencing now provides the means to profile the mutational content of evolving tumor clones from patient biopsies. Together with the development of models of tumor evolution, reconstructing evolutionary histories of individual tumors generates hypotheses about the dynamics of evolution that produced the observed clones. In this review, we provide a brief overview of the concepts involved in predicting evolutionary histories, and provide a workflow based on bulk and targeted-genome sequencing. We then describe the application of this workflow to time series data obtained for transformed and progressed follicular lymphomas (FL), and contrast the observed evolutionary dynamics between these two subtypes. We next describe results from a spatial sampling study of high-grade serous (HGS) ovarian cancer, propose mechanisms of disease spread based on the observed clonal mixtures, and provide examples of diversification through subclonal acquisition of driver mutations and convergent evolution. Finally, we state implications of the techniques discussed in this review as a necessary but insufficient step on the path to predictive modelling of disease dynamics. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  17. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  18. Incorporating evolution of transcription factor binding sites into annotated alignments.

    PubMed

    Bais, Abha S; Grossmann, Stefen; Vingron, Martin

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits) are generated. Moreover, the pair-profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution.
    Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. Additionally, prediction of conserved binding sites is carried out in a multi-step approach that segregates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interplay at varying evolutionary distances and for various profile qualities.
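
    For orientation, the standard Smith-Waterman recursion that the abstract names as its starting point can be sketched as below. This is the plain local-alignment score with a linear gap penalty, not SimAnn or eSimAnn: the profile states and evolutionary models are not represented, and the scoring values are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

    Keeping only two rows is enough to obtain the score; recovering the alignment itself would require the full table or traceback pointers.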

  19. Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.

    PubMed

    Sharma, S; Raina, S N

    2005-01-01

    A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes is reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.

  20. Evolutionary inference via the Poisson Indel Process.

    PubMed

    Bouchard-Côté, Alexandre; Jordan, Michael I

    2013-01-22

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.

  1. Evolutionary inference via the Poisson Indel Process

    PubMed Central

    Bouchard-Côté, Alexandre; Jordan, Michael I.

    2013-01-01

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114–124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296

  2. Markov-modulated Markov chains and the covarion process of molecular evolution.

    PubMed

    Galtier, N; Jean-Marie, A

    2004-01-01

    The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.
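    The "discrete approximation to the Gamma distribution" mentioned above can be illustrated with a short sketch. It assumes the common equal-probability, mean-per-category construction for a Gamma distribution with shape alpha and mean 1; the abstract does not say which variant the authors used, and the function names below are hypothetical. Only the Python standard library is used, so the incomplete gamma function is evaluated with its power series and quantiles are found by bisection.

```python
import math

def reg_lower_gamma(a, x, terms=500):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1))

def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k):
    """Mean rate in each of k equal-probability categories (assumed scheme)."""
    # Category boundaries: 0, then the i/k quantiles; the last bound is infinity.
    bounds = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # x * f(x) is the density of Gamma(alpha + 1, rate alpha), so the partial
    # mean up to a boundary b is P(alpha + 1, alpha * b).
    partial = [reg_lower_gamma(alpha + 1, alpha * b) for b in bounds] + [1.0]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]

rates = discrete_gamma_rates(alpha=0.5, k=4)
print(rates)
```

    By construction the category rates average to 1, so increasing k refines the shape of the rate distribution without changing the mean rate.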

  3. Evolutionary profiles from the QR factorization of multiple sequence alignments

    PubMed Central

    Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

    2005-01-01

    We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270

  4. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks

    PubMed Central

    2011-01-01

    Background Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. Results A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Conclusions Domain recombination has played a major part in the evolution of eukaryotes attributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
    Domains with high degree of networking versatility appear to be evolutionary adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086

  5. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks.

    PubMed

    Xie, Xueying; Jin, Jing; Mao, Yongyi

    2011-08-18

    Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes attributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
    Domains with high degree of networking versatility appear to be evolutionary adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced.

  6. The painted turtle, Chrysemys picta: a model system for vertebrate evolution, ecology, and human health.

    PubMed

    Valenzuela, Nicole

    2009-07-01

    Painted turtles (Chrysemys picta) are representatives of a vertebrate clade whose biology and phylogenetic position hold a key to our understanding of fundamental aspects of vertebrate evolution. These features make them an ideal emerging model system. Extensive ecological and physiological research provide the context in which to place new research advances in evolutionary genetics, genomics, evolutionary developmental biology, and ecological developmental biology which are enabled by current resources, such as a bacterial artificial chromosome (BAC) library of C. picta, and the imminent development of additional ones such as genome sequences and cDNA and expressed sequence tag (EST) libraries. This integrative approach will allow the research community to continue making advances to provide functional and evolutionary explanations for the lability of biological traits found not only among reptiles but vertebrates in general. Moreover, because humans and reptiles share a common ancestor, and given the ease of using nonplacental vertebrates in experimental biology compared with mammalian embryos, painted turtles are also an emerging model system for biomedical research. For example, painted turtles have been studied to understand many biological responses to overwintering and anoxia, as potential sentinels for environmental xenobiotics, and as a model to decipher the ecology and evolution of sexual development and reproduction. Thus, painted turtles are an excellent reptilian model system for studies with human health, environmental, ecological, and evolutionary significance.

  7. Insights into the evolution of enzyme substrate promiscuity after the discovery of (βα)₈ isomerase evolutionary intermediates from a diverse metagenome.

    PubMed

    Noda-García, Lianet; Juárez-Vázquez, Ana L; Ávila-Arcos, María C; Verduzco-Castro, Ernesto A; Montero-Morán, Gabriela; Gaytán, Paul; Carrillo-Tripp, Mauricio; Barona-Gómez, Francisco

    2015-06-10

    Current sequence-based approaches to identify enzyme functional shifts, such as enzyme promiscuity, have proven to be highly dependent on a priori functional knowledge, hampering our ability to reconstruct evolutionary history behind these mechanisms. Hidden Markov Model (HMM) profiles, broadly used to classify enzyme families, can be useful to distinguish between closely related enzyme families with different specificities. The (βα)8-isomerase HisA/PriA enzyme family, involved in L-histidine (HisA, mono-substrate) biosynthesis in most bacteria and plants, but also in L-tryptophan (HisA/TrpF or PriA, dual-substrate) biosynthesis in most Actinobacteria, has been used as model system to explore evolutionary hypotheses and therefore has a considerable amount of evolutionary, functional and structural knowledge available. We searched for functional evolutionary intermediates between the HisA and PriA enzyme families in order to understand the functional divergence between these families. We constructed a HMM profile that correctly classifies sequences of unknown function into the HisA and PriA enzyme sub-families. Using this HMM profile, we mined a large metagenome to identify plausible evolutionary intermediate sequences between HisA and PriA. These sequences were used to perform phylogenetic reconstructions and to identify functionally conserved amino acids. Biochemical characterization of one selected enzyme (CAM1) with a mutation within the functionally essential N-terminus phosphate-binding site, namely, an alanine instead of a glycine in HisA or a serine in PriA, showed that this evolutionary intermediate has dual-substrate specificity. Moreover, site-directed mutagenesis of this alanine residue, either backwards into a glycine or forward into a serine, revealed the robustness of this enzyme. None of these mutations, presumably upon functionally essential amino acids, significantly abolished its enzyme activities. 
    A truncated version of this enzyme (CAM2) predicted to adopt a (βα)6-fold, and thus entirely lacking a C-terminus phosphate-binding site, was identified and shown to have HisA activity. As expected, reconstruction of the evolution of PriA from HisA with HMM profiles suggest that functional shifts involve mutations in evolutionarily intermediate enzymes of otherwise functionally essential residues or motifs. These results are in agreement with a link between promiscuous enzymes and intragenic epistasis. HMM provides a convenient approach for gaining insights into these evolutionary processes.

  8. BAYESIAN PROTEIN STRUCTURE ALIGNMENT.

    PubMed

    Rodriguez, Abel; Schmidler, Scott C

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

  9. Exploring Evolutionary Patterns in Genetic Sequence: A Computer Exercise

    ERIC Educational Resources Information Center

    Shumate, Alice M.; Windsor, Aaron J.

    2010-01-01

    The increase in publications presenting molecular evolutionary analyses and the availability of comparative sequence data through resources such as NCBI's GenBank underscore the necessity of providing undergraduates with hands-on sequence analysis skills in an evolutionary context. This need is particularly acute given that students have been…

  10. Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model

    PubMed Central

    Nené, Nuno R.; Dunham, Alistair S.; Illingworth, Christopher J. R.

    2018-01-01

    A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. PMID:29500183
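    As an illustration of what a "regular deterministic model" of selection can look like, the sketch below uses the textbook haploid recursion x' = x(1 + s) / (1 + s x) for the frequency x of a variant with selection coefficient s. This is an assumption made for illustration only: the abstract does not specify the authors' model, and the delay-deterministic correction itself is not reproduced here. The function names are hypothetical.

```python
import math

def deterministic_trajectory(x0, s, generations):
    """Variant frequency over time under haploid deterministic selection."""
    xs = [x0]
    for _ in range(generations):
        x = xs[-1]
        # Selection rescales the variant by (1 + s) relative to the wild type.
        xs.append(x * (1 + s) / (1 + s * x))
    return xs

def infer_selection(x_start, x_end, generations):
    """Invert the recursion: the odds x / (1 - x) grow by (1 + s) per generation."""
    def log_odds(x):
        return math.log(x / (1 - x))
    return math.exp((log_odds(x_end) - log_odds(x_start)) / generations) - 1

traj = deterministic_trajectory(x0=0.01, s=0.1, generations=50)
print(infer_selection(traj[0], traj[-1], 50))
```

    The inversion recovers s exactly only when the data follow the deterministic recursion. In a finite population the variant may first have to arise by mutation and escape loss by drift, which is the kind of situation in which the abstract reports that a purely deterministic fit can mislead.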

  11. Diversity Arrays Technology (DArT) for Pan-Genomic Evolutionary Studies of Non-Model Organisms

    PubMed Central

    James, Karen E.; Schneider, Harald; Ansell, Stephen W.; Evers, Margaret; Robba, Lavinia; Uszynski, Grzegorz; Pedersen, Niklas; Newton, Angela E.; Russell, Stephen J.; Vogel, Johannes C.; Kilian, Andrzej

    2008-01-01

    Background High-throughput tools for pan-genomic study, especially the DNA microarray platform, have sparked a remarkable increase in data production and enabled a shift in the scale at which biological investigation is possible. The use of microarrays to examine evolutionary relationships and processes, however, is predominantly restricted to model or near-model organisms. Methodology/Principal Findings This study explores the utility of Diversity Arrays Technology (DArT) in evolutionary studies of non-model organisms. DArT is a hybridization-based genotyping method that uses microarray technology to identify and type DNA polymorphism. Theoretically applicable to any organism (even one for which no prior genetic data are available), DArT has not yet been explored in exclusively wild sample sets, nor extensively examined in a phylogenetic framework. DArT recovered 1349 markers of largely low copy-number loci in two lineages of seed-free land plants: the diploid fern Asplenium viride and the haploid moss Garovaglia elegans. Direct sequencing of 148 of these DArT markers identified 30 putative loci including four routinely sequenced for evolutionary studies in plants. Phylogenetic analyses of DArT genotypes reveal phylogeographic and substrate specificity patterns in A. viride, a lack of phylogeographic pattern in Australian G. elegans, and additive variation in hybrid or mixed samples. Conclusions/Significance These results enable methodological recommendations including procedures for detecting and analysing DArT markers tailored specifically to evolutionary investigations and practical factors informing the decision to use DArT, and raise evolutionary hypotheses concerning substrate specificity and biogeographic patterns. 
    Thus DArT is a demonstrably valuable addition to the set of existing molecular approaches used to infer biological phenomena such as adaptive radiations, population dynamics, hybridization, introgression, ecological differentiation and phylogeography. PMID:18301759

  12. Advances in understanding tumour evolution through single-cell sequencing.

    PubMed

    Kuipers, Jack; Jahn, Katharina; Beerenwinkel, Niko

    2017-04-01

    The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state of the art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  13. Biophysical models of protein evolution: Understanding the patterns of evolutionary sequence divergence

    PubMed Central

    Echave, Julian; Wilke, Claus O.

    2018-01-01

    For decades, rates of protein evolution have been interpreted in terms of the vague concept of “functional importance”. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating them has large impacts on protein structure and stability. Here, we review the studies of the emergent field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field. PMID:28301766

  14. Microsatellite loci discovery from next-generation sequencing data and loci characterization in the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758)

    PubMed Central

    Zardus, John D.; Wares, John P.

    2016-01-01

    Microsatellite markers remain an important tool for ecological and evolutionary research, but are unavailable for many non-model organisms. One such organism with rare ecological and evolutionary features is the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758). Chelonibia testudinaria appears to be a host generalist, and has an unusual sexual system, androdioecy. Genetic studies on host specificity and mating behavior are impeded by the lack of fine-scale, highly variable markers, such as microsatellite markers. In the present study, we discovered thousands of new microsatellite loci from next-generation sequencing data, and characterized 12 loci thoroughly. We conclude that 11 of these loci will be useful markers in future ecological and evolutionary studies on C. testudinaria. PMID:27231653

  15. A mechanistic stress model of protein evolution accounts for site-specific evolutionary rates and their relationship with packing density and flexibility

    PubMed Central

    2014-01-01

    Background Protein sites evolve at different rates due to functional and biophysical constraints. It is usually considered that the main structural determinant of a site’s rate of evolution is its Relative Solvent Accessibility (RSA). However, a recent comparative study has shown that the main structural determinant is the site’s Local Packing Density (LPD). LPD is related with dynamical flexibility, which has also been shown to correlate with sequence variability. Our purpose is to investigate the mechanism that connects a site’s LPD with its rate of evolution. Results We consider two models: an empirical Flexibility Model and a mechanistic Stress Model. The Flexibility Model postulates a linear increase of site-specific rate of evolution with dynamical flexibility. The Stress Model, introduced here, models mutations as random perturbations of the protein’s potential energy landscape, for which we use simple Elastic Network Models (ENMs). To account for natural selection we assume a single active conformation and use basic statistical physics to derive a linear relationship between site-specific evolutionary rates and the local stress of the mutant’s active conformation. We compare both models on a large and diverse dataset of enzymes. In a protein-by-protein study we found that the Stress Model outperforms the Flexibility Model for most proteins. Pooling all proteins together we show that the Stress Model is strongly supported by the total weight of evidence. Moreover, it accounts for the observed nonlinear dependence of sequence variability on flexibility. Finally, when mutational stress is controlled for, there is very little remaining correlation between sequence variability and dynamical flexibility. Conclusions We developed a mechanistic Stress Model of evolution according to which the rate of evolution of a site is predicted to depend linearly on the local mutational stress of the active conformation. 
    Such local stress is proportional to LPD, so that this model explains the relationship between LPD and evolutionary rate. Moreover, the model also accounts for the nonlinear dependence between evolutionary rate and dynamical flexibility. PMID:24716445

  16. Metal-poor stars. IV - The evolution of red giants.

    NASA Technical Reports Server (NTRS)

    Rood, R. T.

    1972-01-01

    Detailed evolutionary calculations for six Population-II red giants are presented. The first five of these models are followed from the zero age main sequence to the onset of the helium flash. The sixth model allows the effect of direct electron-neutrino interactions to be estimated. The updated input physics and evolutionary code are described briefly. The results of the calculations are presented in a manner pertinent to later stages of evolution and suitable for comparison with observations.

  17. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High-throughput technology has driven progress in omics studies, including genomics and transcriptomics. We review the improvements in comparative omics studies that are attributable to high-throughput measurement with next-generation sequencing technology. Comparative genomics has been successfully applied to evolutionary analysis, while comparative transcriptomics is used to compare expression profiles from two subjects by differential expression or differential coexpression, which enables its application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting morphogenesis during development, and previous work has sought to identify the most conserved stages of embryonic development. Older measurements in these studies were based on morphological similarity at the macroscopic level, whereas new technology enables the detection of similarity in molecular mechanisms. Evolutionary models of embryonic development, including the "funnel-like" model and the "hourglass" model, have been evaluated by combining these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has brought EVO-DEVO studies into a new era, technological and material limitations still exist, and further investigations require more careful study design and procedures.

  18. The current status of REH theory. [Random Evolutionary Hits in biological molecular evolution]

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1981-01-01

    A response is made to the evaluation of Fitch (1980) of REH (random evolutionary hits) theory for the evolutionary divergence of proteins and nucleic acids. Correct calculations for the beta hemoglobin mRNAs of the human, mouse and rabbit in the absence and presence of selective constraints are summarized, and it is shown that the alternative evolutionary analysis of Fitch underestimates the total fixed mutations. It is further shown that the model used by Fitch to test for the completeness of the count of total base substitutions is in fact a variant of REH theory. Considerations of the variance inherent in evolutionary estimations are also presented which show the REH model to produce no more variance than other evolutionary models. In the reply, it is argued that, despite the objections raised, REH theory applied to proteins gives inaccurate estimates of total gene substitutions. It is further contended that REH theory developed for nucleic sequences suffers from problems relating to the frequency of nucleotide substitutions, the identity of the codons accepting silent and amino acid-changing substitutions, and estimate uncertainties.
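    The dispute above turns on how many substitutions actually occurred when only the observed differences can be counted. The sketch below does not implement REH theory. It uses the simpler Jukes-Cantor correction, swapped in here purely to illustrate why repeated hits at the same site make a raw count underestimate the total. The function name is hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Expected substitutions per site given the observed fraction p of differing sites.

    Assumes the Jukes-Cantor model: four nucleotides, equal base frequencies,
    and equal rates for every substitution type.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for p in (0.05, 0.30, 0.60):
    print(p, jukes_cantor_distance(p))
```

    The corrected distance is always at least as large as the observed fraction, and the gap widens as sequences diverge, because an increasing share of substitutions falls on sites that have already changed.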

  19. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitation, application and future prospects of rice resequencing. PMID:25006357

  20. Inquiry-Based Learning of Molecular Phylogenetics

    ERIC Educational Resources Information Center

    Campo, Daniel; Garcia-Vazquez, Eva

    2008-01-01

    Reconstructing phylogenies from nucleotide sequences is a challenge for students because it strongly depends on evolutionary models and computer tools that are frequently updated. We present here an inquiry-based course aimed at learning how to trace a phylogeny based on sequences existing in public databases. Computer tools are freely available…

  1. Novel non-parametric models to estimate evolutionary rates and divergence times from heterochronous sequence data.

    PubMed

    Fourment, Mathieu; Holmes, Edward C

    2014-07-24

    Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so-called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method, Physher, that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.

  2. Field Guide to Plant Model Systems

    PubMed Central

    Chang, Caren; Bowman, John L.; Meyerowitz, Elliot M.

    2016-01-01

    For the past several decades, advances in plant development, physiology, cell biology, and genetics have relied heavily on the model (or reference) plant Arabidopsis thaliana. Arabidopsis resembles other plants, including crop plants, in many but by no means all respects. Study of Arabidopsis alone provides little information on the evolutionary history of plants, evolutionary differences between species, plants that survive in different environments, or plants that access nutrients and photosynthesize differently. Empowered by the availability of large-scale sequencing and new technologies for investigating gene function, many new plant models are being proposed and studied. PMID:27716506

  3. Evolutionary Dynamics and Diversity in Microbial Populations

    NASA Astrophysics Data System (ADS)

    Thompson, Joel; Fisher, Daniel

    2013-03-01

    Diseases such as flu and cancer adapt at an astonishing rate. In large part, viruses and cancers are so difficult to prevent because they are continually evolving. Controlling such "evolutionary diseases" requires a better understanding of the underlying evolutionary dynamics. It is conventionally assumed that adaptive mutations are rare and therefore will occur and sweep through the population in succession. Recent experiments using modern sequencing technologies have illuminated the many ways in which real population sequence data does not conform to the predictions of conventional theory. We consider a very simple model of asexual evolution and perform simulations in a range of parameters thought to be relevant for microbes and cancer. Simulation results reveal complex evolutionary dynamics typified by competition between lineages with different sets of adaptive mutations. This dynamical process leads to a distribution of mutant gene frequencies different than expected under the conventional assumption that adaptive mutations are rare. Simulated gene frequencies share several conspicuous features with data collected from laboratory-evolved yeast and the worldwide population of influenza.
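
    The abstract does not specify the model, so the following is only a minimal sketch of one common choice: a Wright-Fisher population of asexual individuals in which each beneficial mutation multiplies fitness by (1 + s). It tracks how many individuals carry k mutations rather than the identity of each mutation, which is a simplification. All names and parameter values are assumptions.

```python
import random
from collections import Counter


def simulate_asexual(pop_size, mut_rate, sel_coeff, generations, seed=0):
    """Return a list of {mutation_count: individuals}, one dict per generation."""
    rng = random.Random(seed)
    counts = Counter({0: pop_size})  # everyone starts with zero mutations
    history = []
    for _ in range(generations):
        classes = list(counts)
        # Selection: parents are drawn in proportion to class size times fitness.
        weights = [counts[k] * (1.0 + sel_coeff) ** k for k in classes]
        parents = rng.choices(classes, weights=weights, k=pop_size)
        counts = Counter()
        for k in parents:
            # Mutation: each offspring gains one beneficial mutation with
            # probability mut_rate.
            if rng.random() < mut_rate:
                k += 1
            counts[k] += 1
        history.append(dict(counts))
    return history


history = simulate_asexual(pop_size=200, mut_rate=0.01, sel_coeff=0.05,
                           generations=50, seed=1)
```

    When pop_size * mut_rate is not small, several fitness classes coexist in a typical generation, which is the regime of competing lineages the abstract contrasts with successive sweeps.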

  4. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  5. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

    PubMed Central

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

    2015-01-01

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166
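
    The read depth approach can be sketched as follows, under the assumption that a Y-linked scaffold shows close to zero normalized depth in the female and roughly half the autosomal depth in the male. The function name, the threshold, and the example depths are hypothetical and are not taken from the study.

```python
def y_linked_candidates(male_depth, female_depth,
                        male_autosomal_depth, female_autosomal_depth,
                        max_ratio=0.3):
    """Return scaffold names whose female/male normalized depth ratio is low.

    male_depth, female_depth: dicts mapping scaffold name to mean read depth.
    *_autosomal_depth: typical autosomal depth in each sample, used to
    normalize for differences in sequencing effort.
    max_ratio: illustrative cutoff, an assumption rather than a published value.
    """
    candidates = []
    for name, m in male_depth.items():
        m_norm = m / male_autosomal_depth
        f_norm = female_depth.get(name, 0.0) / female_autosomal_depth
        if m_norm <= 0:
            continue  # no male coverage, so the scaffold cannot be called
        if f_norm / m_norm <= max_ratio:
            candidates.append(name)
    return sorted(candidates)


# Made-up depths: scaf1 looks autosomal, scaf2 X-like, scaf3 Y-like.
male = {"scaf1": 30.0, "scaf2": 15.0, "scaf3": 14.0}
female = {"scaf1": 31.0, "scaf2": 30.0, "scaf3": 0.5}
result = y_linked_candidates(male, female, 30.0, 30.0)
```

    Candidates found this way would still need confirmation, for example by the male-specific in vitro amplification described in the abstract.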

  6. Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model.

    PubMed

    Nené, Nuno R; Dunham, Alistair S; Illingworth, Christopher J R

    2018-05-01

    A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model. Copyright © 2018 Nené et al.
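
    For orientation, a regular deterministic model of haploid selection can be sketched as below; this is the kind of baseline the abstract refers to, not the delay-deterministic correction the authors propose. Under this model the odds x / (1 - x) of the variant grow by a factor (1 + s) each generation, which gives a closed-form estimate of s from two frequencies. Function names are hypothetical.

```python
import math


def next_frequency(x, s):
    """One generation of deterministic haploid selection with advantage s."""
    return x * (1.0 + s) / (1.0 + s * x)


def estimate_selection(x_start, x_end, generations):
    """Infer s from two frequencies, assuming purely deterministic change."""
    if not (0.0 < x_start < 1.0 and 0.0 < x_end < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")

    def log_odds(x):
        return math.log(x / (1.0 - x))

    return math.exp((log_odds(x_end) - log_odds(x_start)) / generations) - 1.0


# Forward-simulate 20 generations with s = 0.1, then recover s.
x = 0.01
for _ in range(20):
    x = next_frequency(x, 0.1)
s_hat = estimate_selection(0.01, x, 20)
```

    The estimate is exact here only because the trajectory was generated by the same deterministic rule; drift and the stochastic timing of new mutations in a finite population break that assumption.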

  7. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  8. Historian: accurate reconstruction of ancestral sequences and evolutionary rates.

    PubMed

    Holmes, Ian H

    2017-04-15

    Reconstruction of ancestral sequence histories, and estimation of parameters like indel rates, are improved by using explicit evolutionary models and summing over uncertain alignments. The previous best tool for this purpose (according to simulation benchmarks) was ProtPal, but this tool was too slow for practical use. Historian combines an efficient reimplementation of the ProtPal algorithm with performance-improving heuristics from other alignment tools. Simulation results on fidelity of rate estimation via ancestral reconstruction, along with evaluations on the structurally informed alignment dataset BAliBase 3.0, recommend Historian over other alignment tools for evolutionary applications. Historian is available at https://github.com/evoldoers/historian under the Creative Commons Attribution 3.0 US license. ihholmes+historian@gmail.com. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  9. The Liverwort Contains a Lectin That Is Structurally and Evolutionary Related to the Monocot Mannose-Binding Lectins

    PubMed Central

    Peumans, Willy J.; Barre, Annick; Bras, Julien; Rougé, Pierre; Proost, Paul; Van Damme, Els J.M.

    2002-01-01

    A mannose (Man)-binding lectin has been isolated and characterized from the thallus of the liverwort Marchantia polymorpha. N-terminal sequencing indicated that the M. polymorpha agglutinin (Marpola) shares sequence similarity with the superfamily of monocot Man-binding lectins. Searches in the databases yielded expressed sequence tags encoding Marpola. Sequence analysis, molecular modeling, and docking experiments revealed striking structural similarities between Marpola and the monocot Man-binding lectins. Activity and specificity studies further indicated that Marpola is a much stronger agglutinin than the Galanthus nivalis agglutinin and exhibits a preference for methylated Man and glucose, which is unprecedented within the family of monocot Man-binding lectins. The discovery of Marpola allows us, for the first time, to corroborate the evolutionary relationship between a lectin from a lower plant and a well-established lectin family from flowering plants. In addition, the identification of Marpola sheds a new light on the molecular evolution of the superfamily of monocot Man-binding lectins. Beside evolutionary considerations, the occurrence of a G. nivalis agglutinin homolog in a lower plant necessitates the rethinking of the physiological role of the whole family of monocot Man-binding lectins. PMID:12114560

  10. Using hidden Markov models and observed evolution to annotate viral genomes.

    PubMed

    McCauley, Stephen; Hein, Jotun

    2006-06-01

    ssRNA (single stranded) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64 codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single sequence annotation and applies an HMM framework to ssRNA viral annotation. A description of how the HMM is parameterized, whilst annotating within a missing data framework is given. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation. The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging however there is still room for improvement, and since there is overwhelming evidence to indicate that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. 
    The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.
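
    The remark about 6-fold degenerate amino acids can be checked directly against the standard genetic code. The sketch below builds the 64-codon table and counts codons per amino acid; it illustrates the degeneracy claim only and is not part of the authors' HMM. Variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid, ignoring stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
six_fold = sorted(aa for aa, n in degeneracy.items() if n == 6)
```

    Here six_fold contains leucine (L), arginine (R), and serine (S), the three amino acids encoded by six codons each.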

  11. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Treesearch

    Shannon C.K. Straub; Mark Fishbein; Tatyana Livshult; Zachary Foster; Matthew Parks; Kevin Weitemier; Richard C. Cronn; Aaron Liston

    2011-01-01

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in...

  12. In-silico studies of neutral drift for functional protein interaction networks

    NASA Astrophysics Data System (ADS)

    Ali, Md Zulfikar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    We have developed a minimal physically-motivated model of protein-protein interaction networks. Our system consists of two classes of enzymes, activators (e.g. kinases) and deactivators (e.g. phosphatases), and the enzyme-mediated activation/deactivation rates are determined by sequence-dependent binding strengths between enzymes and their targets. The network is evolved by introducing random point mutations in the binding sequences where we assume that each new mutation is either fixed or entirely lost. We apply this model to studies of neutral drift in networks that yield oscillatory dynamics, where we start, for example, with a relatively simple network and allow it to evolve by adding nodes and connections while requiring that dynamics be conserved. Our studies demonstrate both the importance of employing a sequence-based evolutionary scheme and the relative rapidity (in evolutionary time) for the redistribution of function over new nodes via neutral drift. Surprisingly, in addition to this redistribution time we discovered another much slower timescale for network evolution, reflecting hidden order in sequence space that we interpret in terms of sparsely connected domains.

  13. Systematic Error in Seed Plant Phylogenomics

    PubMed Central

    Zhong, Bojian; Deusch, Oliver; Goremykin, Vadim V.; Penny, David; Biggs, Patrick J.; Atherton, Robin A.; Nikiforova, Svetlana V.; Lockhart, Peter James

    2011-01-01

    Resolving the closest relatives of Gnetales has been an enigmatic problem in seed plant phylogeny. The problem is known to be difficult because of the extent of divergence between this diverse group of gymnosperms and their closest phylogenetic relatives. Here, we investigate the evolutionary properties of conifer chloroplast DNA sequences. To improve taxon sampling of Cupressophyta (non-Pinaceae conifers), we report sequences from three new chloroplast (cp) genomes of Southern Hemisphere conifers. We have applied a site pattern sorting criterion to study compositional heterogeneity, heterotachy, and the fit of conifer chloroplast genome sequences to a general time reversible + G substitution model. We show that non-time reversible properties of aligned sequence positions in the chloroplast genomes of Gnetales mislead phylogenetic reconstruction of these seed plants. When 2,250 of the most varied sites in our concatenated alignment are excluded, phylogenetic analyses favor a close evolutionary relationship between the Gnetales and Pinaceae—the Gnepine hypothesis. Our analytical protocol provides a useful approach for evaluating the robustness of phylogenomic inferences. Our findings highlight the importance of goodness of fit between substitution model and data for understanding seed plant phylogeny. PMID:22016337

  14. Maximizing ecological and evolutionary insight in bisulfite sequencing data sets

    PubMed Central

    Lea, Amanda J.; Vilgalys, Tauras P.; Durst, Paul A.P.; Tung, Jenny

    2017-01-01

    Preface Genome-scale bisulfite sequencing approaches have opened the door to ecological and evolutionary studies of DNA methylation in many organisms. These approaches can be powerful. However, they introduce new methodological and statistical considerations, some of which are particularly relevant to non-model systems. Here, we highlight how these considerations influence a study’s power to link methylation variation with a predictor variable of interest. Relative to current practice, we argue that sample sizes will need to increase to provide robust insights. We also provide recommendations for overcoming common challenges and an R Shiny app to aid in study design. PMID:29046582

  15. Evolutionary Dynamics on Protein Bi-stability Landscapes can Potentially Resolve Adaptive Conflicts

    PubMed Central

    Sikosek, Tobias; Bornberg-Bauer, Erich; Chan, Hue Sun

    2012-01-01

    Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. PMID:23028272

  16. Toward a method for tracking virus evolutionary trajectory applied to the pandemic H1N1 2009 influenza virus.

    PubMed

    Squires, R Burke; Pickett, Brett E; Das, Sajal; Scheuermann, Richard H

    2014-12-01

    In 2009 a novel pandemic H1N1 influenza virus (H1N1pdm09) emerged as the first official influenza pandemic of the 21st century. Early genomic sequence analysis pointed to the swine origin of the virus. Here we report a novel computational approach to determine the evolutionary trajectory of viral sequences that uses data-driven estimations of nucleotide substitution rates to track the gradual accumulation of observed sequence alterations over time. Phylogenetic analysis and multiple sequence alignments show that sequences belonging to the resulting evolutionary trajectory of the H1N1pdm09 lineage exhibit a gradual accumulation of sequence variations and tight temporal correlations in the topological structure of the phylogenetic trees. These results suggest that our evolutionary trajectory analysis (ETA) can more effectively pinpoint the evolutionary history of viruses, including the host and geographical location traversed by each segment, when compared against either BLAST or traditional phylogenetic analysis alone. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Probabilistic modeling of the evolution of gene synteny within reconciled phylogenies

    PubMed Central

    2015-01-01

    Background Most models of genome evolution concern either genetic sequences, gene content or gene order. They sometimes integrate two of the three levels, but rarely the three of them. Probabilistic models of gene order evolution usually have to assume constant gene content or adopt a presence/absence coding of gene neighborhoods which is blind to complex events modifying gene content. Results We propose a probabilistic evolutionary model for gene neighborhoods, allowing genes to be inserted, duplicated or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. We are then able to optimize parameters such as phylogeny branch lengths, or probabilistic laws depicting the diversity of susceptibility of syntenic regions to rearrangements. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Ancestral syntenies are associated with a probability of presence. We implemented the model with the restriction that at most one gene duplication separates two gene speciations in reconciled gene trees. We reconstruct ancestral syntenies on a set of 12 drosophila genomes, and compare the evolutionary rates along the branches and along the sites. We compare with a parsimony method and find a significant number of results not supported by the posterior probability. The model is implemented in the Bio++ library. It thus benefits from and enriches the classical models and methods for molecular evolution. PMID:26452018

  18. MultiSeq: unifying sequence and structure data for evolutionary analysis

    PubMed Central

    Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

    2006-01-01

    Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. 
    Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: PMID:16914055

  19. Phenotype–genotype correlation in Hirschsprung disease is illuminated by comparative analysis of the RET protein sequence

    PubMed Central

    Kashuk, Carl S.; Stone, Eric A.; Grice, Elizabeth A.; Portnoy, Matthew E.; Green, Eric D.; Sidow, Arend; Chakravarti, Aravinda; McCallion, Andrew S.

    2005-01-01

    The ability to discriminate between deleterious and neutral amino acid substitutions in the genes of patients remains a significant challenge in human genetics. The increasing availability of genomic sequence data from multiple vertebrate species allows inclusion of sequence conservation and physicochemical properties of residues to be used for functional prediction. In this study, the RET receptor tyrosine kinase serves as a model disease gene in which a broad spectrum (≥116) of disease-associated mutations has been identified among patients with Hirschsprung disease and multiple endocrine neoplasia type 2. We report the alignment of the human RET protein sequence with the orthologous sequences of 12 non-human vertebrates (eight mammalian, one avian, and three teleost species), their comparative analysis, the evolutionary topology of the RET protein, and predicted tolerance for all published missense mutations. We show that, although evolutionary conservation alone provides significant information to predict the effect of a RET mutation, a model that combines comparative sequence data with analysis of physiochemical properties in a quantitative framework provides far greater accuracy. Although the ability to discern the impact of a mutation is imperfect, our analyses permit substantial discrimination between predicted functional classes of RET mutations and disease severity even for a multigenic disease such as Hirschsprung disease. PMID:15956201

  20. Radiation transfer of models of massive star formation. III. The evolutionary sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Yichen; Tan, Jonathan C.; Hosokawa, Takashi

    2014-06-20

    We present radiation transfer simulations of evolutionary sequences of massive protostars forming from massive dense cores in environments of high mass surface densities, based on the Turbulent Core Model. The protostellar evolution is calculated with a multi-zone numerical model, with the accretion rate regulated by feedback from an evolving disk wind outflow cavity. The disk evolution is calculated assuming a fixed ratio of disk to protostellar mass, while the core envelope evolution assumes an inside-out collapse of the core with a fixed outer radius. In this framework, an evolutionary track is determined by three environmental initial conditions: the core mass M_c, the mass surface density of the ambient clump Σ_cl, and the ratio of the core's initial rotational to gravitational energy β_c. Evolutionary sequences with various M_c, Σ_cl, and β_c are constructed. We find that in a fiducial model with M_c = 60 M_☉, Σ_cl = 1 g cm^-2, and β_c = 0.02, the final mass of the protostar reaches at least ∼26 M_☉, making the final star formation efficiency ≳ 0.43. For each of the evolutionary tracks, radiation transfer simulations are performed at selected stages, with temperature profiles, spectral energy distributions (SEDs), and multiwavelength images produced. At a given stage, the envelope temperature depends strongly on Σ_cl, with higher temperatures in a higher Σ_cl core, but only weakly on M_c. The SED and MIR images depend sensitively on the evolving outflow cavity, which gradually widens as the protostar grows. The fluxes at ≲ 100 μm increase dramatically, and the far-IR peaks move to shorter wavelengths. The influence of Σ_cl and β_c (which determines disk size) are discussed. We find that, despite scatter caused by different M_c, Σ_cl, β_c, and inclinations, sources at a given evolutionary stage appear in similar regions of color-color diagrams, especially when using colors with fluxes at ≳ 70 μm, where scatter due to inclination is minimized, implying that such diagrams can be useful diagnostic tools for identifying the evolutionary stages of massive protostars. We discuss how intensity profiles along or perpendicular to the outflow axis are affected by environmental conditions and source evolution and can thus act as additional diagnostics of the massive star formation process.

  1. Field Guide to Plant Model Systems.

    PubMed

    Chang, Caren; Bowman, John L; Meyerowitz, Elliot M

    2016-10-06

    For the past several decades, advances in plant development, physiology, cell biology, and genetics have relied heavily on the model (or reference) plant Arabidopsis thaliana. Arabidopsis resembles other plants, including crop plants, in many but by no means all respects. Study of Arabidopsis alone provides little information on the evolutionary history of plants, evolutionary differences between species, plants that survive in different environments, or plants that access nutrients and photosynthesize differently. Empowered by the availability of large-scale sequencing and new technologies for investigating gene function, many new plant models are being proposed and studied. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

    PubMed

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

    2015-05-27

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. Evolutionary biology through the lens of budding yeast comparative genomics.

    PubMed

    Marsit, Souhir; Leducq, Jean-Baptiste; Durand, Éléonore; Marchant, Axelle; Filteau, Marie; Landry, Christian R

    2017-10-01

    The budding yeast Saccharomyces cerevisiae is a highly advanced model system for studying genetics, cell biology and systems biology. Over the past decade, the application of high-throughput sequencing technologies to this species has contributed to this yeast also becoming an important model for evolutionary genomics. Indeed, comparative genomic analyses of laboratory, wild and domesticated yeast populations are providing unprecedented detail about many of the processes that govern evolution, including long-term processes, such as reproductive isolation and speciation, and short-term processes, such as adaptation to natural and domestication-related environments.

  4. Comparison of the theoretical and real-world evolutionary potential of a genetic circuit

    NASA Astrophysics Data System (ADS)

    Razo-Mejia, M.; Boedicker, J. Q.; Jones, D.; DeLuna, A.; Kinney, J. B.; Phillips, R.

    2014-04-01

    With the development of next-generation sequencing technologies, many large scale experimental efforts aim to map genotypic variability among individuals. This natural variability in populations fuels many fundamental biological processes, ranging from evolutionary adaptation and speciation to the spread of genetic diseases and drug resistance. An interesting and important component of this variability is present within the regulatory regions of genes. As these regions evolve, accumulated mutations lead to modulation of gene expression, which may have consequences for the phenotype. A simple model system where the link between genetic variability, gene regulation and function can be studied in detail is missing. In this article we develop a model to explore how the sequence of the wild-type lac promoter dictates the fold-change in gene expression. The model combines single-base pair resolution maps of transcription factor and RNA polymerase binding energies with a comprehensive thermodynamic model of gene regulation. The model was validated by predicting and then measuring the variability of lac operon regulation in a collection of natural isolates. We then implement the model to analyze the sensitivity of the promoter sequence to the regulatory output, and predict the potential for regulation to evolve due to point mutations in the promoter region.
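
    The abstract does not give the model's equations. As a rough illustration of the kind of thermodynamic expression such models build on, the sketch below uses the textbook simple-repression fold-change in the weak-promoter limit, fold-change = 1 / (1 + (R / N_NS) exp(-Δε / kT)). The lac promoter also involves RNA polymerase and activator binding, which are omitted here. The function name, parameter names, and the default number of non-specific sites are assumptions.

```python
import math

def fold_change(repressors, delta_eps_kT, n_nonspecific=4.6e6):
    """Simple-repression fold-change in the weak-promoter limit.

    repressors:     repressor copies per cell (R)
    delta_eps_kT:   repressor binding energy relative to non-specific DNA,
                    in units of kT (more negative = tighter binding)
    n_nonspecific:  number of non-specific sites (N_NS); the default is an
                    assumed value of roughly the E. coli genome length in bp
    """
    return 1.0 / (1.0 + (repressors / n_nonspecific) * math.exp(-delta_eps_kT))

def mutation_effect(repressors, delta_eps_kT, delta_delta_eps_kT):
    """Ratio of fold-change after / before a point mutation that shifts
    the operator binding energy by delta_delta_eps_kT (hypothetical helper)."""
    before = fold_change(repressors, delta_eps_kT)
    after = fold_change(repressors, delta_eps_kT + delta_delta_eps_kT)
    return after / before

# Illustrative values only: a mutation that weakens binding by 2 kT.
effect = mutation_effect(repressors=10, delta_eps_kT=-15.0, delta_delta_eps_kT=2.0)
```

    In this form a mutation that weakens repressor binding always raises the fold-change, which is the sense in which point mutations can tune the regulatory output.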

  5. Phylogenetic tree and community structure from a Tangled Nature model.

    PubMed

    Canko, Osman; Taşkın, Ferhat; Argın, Kamil

    2015-10-07

    In evolutionary biology, the taxonomy and origination of species are widely studied subjects. An estimation of the evolutionary tree can be done via available DNA sequence data. The calculation of the tree is made by well-known and frequently used methods such as maximum likelihood and neighbor-joining. In order to examine the results of these methods, an evolutionary tree is pursued computationally by a mathematical model, called Tangled Nature. A relatively small genome space is investigated due to computational burden and it is found that the actual and predicted trees are in reasonably good agreement in terms of shape. Moreover, the speciation and the resulting community structure of the food-web are investigated by modularity. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Are there laws of genome evolution?

    PubMed

    Koonin, Eugene V

    2011-08-01

    Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.

  7. Detailed phylogenetic analysis of primate T-lymphotropic virus type 1 (PTLV-1) sequences from orangutans (Pongo pygmaeus) reveals new insights into the evolutionary history of PTLV-1 in Asia.

    PubMed

    Reid, Michael J C; Switzer, William M; Schillaci, Michael A; Ragonnet-Cronin, Manon; Joanisse, Isabelle; Caminiti, Kyna; Lowenberger, Carl A; Galdikas, Birute Mary F; Sandstrom, Paul A; Brooks, James I

    2016-09-01

    While human T-lymphotropic virus type 1 (HTLV-1) originates from ancient cross-species transmission of simian T-lymphotropic virus type 1 (STLV-1) from infected nonhuman primates, much debate exists on whether the first HTLV-1 occurred in Africa, or in Asia during early human evolution and migration. This topic is complicated by a lack of representative Asian STLV-1 to infer PTLV-1 evolutionary histories. In this study we obtained new STLV-1 LTR and tax sequences from a wild-born Bornean orangutan (Pongo pygmaeus) and performed detailed phylogenetic analyses using both maximum likelihood and Bayesian inference of available Asian PTLV-1 and African STLV-1 sequences. Phylogenies, divergence dates and nucleotide substitution rates were co-inferred and compared using six different molecular clock calibrations in a Bayesian framework, including both archaeological and/or nucleotide substitution rate calibrations. We then combined our molecular results with paleobiogeographical and ecological data to infer the most likely evolutionary history of PTLV-1. Based on the preferred models our analyses robustly inferred an Asian source for PTLV-1 with cross-species transmission of STLV-1 likely from a macaque (Macaca sp.) to an orangutan about 37.9-48.9kya, and to humans between 20.3-25.5kya. An orangutan diversification of STLV-1 commenced approximately 6.4-7.3kya. Our analyses also inferred that HTLV-1 was first introduced into Australia ~3.1-3.7kya, corresponding to both genetic and archaeological changes occurring in Australia at that time. Finally, HTLV-1 appears in Melanesia at ~2.3-2.7kya corresponding to the migration of the Lapita peoples into the region. Our results also provide an important future reference for calibrating information essential for PTLV evolutionary timescale inference. 
Longer sequence data, or full genomes from a greater representation of Asian primates, including gibbons, leaf monkeys, and Sumatran orangutans are needed to fully elucidate these evolutionary dates and relationships using the model criteria suggested herein. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. 
These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways. PMID:15799777

  9. Rapidly rotating polytropes in general relativity

    NASA Technical Reports Server (NTRS)

    Cook, Gregory B.; Shapiro, Stuart L.; Teukolsky, Saul A.

    1994-01-01

    We construct an extensive set of equilibrium sequences of rotating polytropes in general relativity. We determine a number of important physical parameters of such stars, including maximum mass and maximum spin rate. The stability of the configurations against quasi-radial perturbations is diagnosed. Two classes of evolutionary sequences of fixed rest mass and entropy are explored: normal sequences which behave very much like Newtonian evolutionary sequences, and supramassive sequences which exist solely because of relativistic effects. Dissipation leading to loss of angular momentum causes a star to evolve in a quasi-stationary fashion along an evolutionary sequence. Supramassive sequences evolve towards eventual catastrophic collapse to a black hole. Prior to collapse, the star must spin up as it loses angular momentum, an effect which may provide an observational precursor to gravitational collapse to a black hole.

  10. Impact of tree priors in species delimitation and phylogenetics of the genus Oligoryzomys (Rodentia: Cricetidae).

    PubMed

    da Cruz, Marcos de O R; Weksler, Marcelo

    2018-02-01

    The use of genetic data and tree-based algorithms to delimit evolutionary lineages is becoming an important practice in taxonomic identification, especially in morphologically cryptic groups. The effects of different phylogenetic and/or coalescent models in the analyses of species delimitation, however, are not clear. In this paper, we assess the impact of different evolutionary priors in phylogenetic estimation, species delimitation, and molecular dating of the genus Oligoryzomys (Mammalia: Rodentia), a group with complex taxonomy and morphologically cryptic species. Phylogenetic and coalescent analyses included 20 of the 24 recognized species of the genus, comprising 416 Cytochrome b sequences, 26 Cytochrome c oxidase I sequences, and 27 Beta-Fibrinogen Intron 7 sequences. For species delimitation, we employed the General Mixed Yule Coalescent (GMYC) and Bayesian Poisson tree processes (bPTP) analyses, and contrasted 4 genealogical and phylogenetic models: Pure-birth (Yule), Constant Population Size Coalescent, Multiple Species Coalescent, and a mixed Yule-Coalescent model. GMYC analyses of trees from different genealogical models resulted in similar species delimitation and phylogenetic relationships, with incongruence restricted to areas of poor nodal support. bPTP results, however, significantly differed from GMYC for 5 taxa. Oligoryzomys early diversification was estimated to have occurred in the Early Pleistocene, between 0.7 and 2.6 MYA. The mixed Yule-Coalescent model, however, recovered younger dating estimates for Oligoryzomys diversification, and for the threshold for the speciation-coalescent horizon in GMYC. Eight of the 20 included Oligoryzomys species were identified as having two or more independent evolutionary units, indicating that current taxonomy of Oligoryzomys is still unsettled. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Evolutionary sequences of very hot, low-mass, accreting white dwarfs with application to symbiotic variables and ultrasoft/supersoft low-luminosity x-ray sources

    NASA Technical Reports Server (NTRS)

    Sion, Edward M.; Starrfield, Sumner G.

    1994-01-01

    We present the first detailed model results of quasi-static evolutionary sequences of very hot low-mass white dwarfs accreting hydrogen-rich material at rates between 1 x 10^-7 and 1 x 10^-9 solar mass/yr. Most of the sequences were generated from starting models whose core thermal structures were not thermally relaxed in the thermal pulse cycle-averaged sense of an asymptotic giant branch stellar core. Hence, the evolution at constant accretion rate was not invariably characterized by series of identical shell flashes. Sequences exhibiting stable steady state nuclear burning at the accretion supply rate as well as sequences exhibiting recurrent thermonuclear shell flashes are presented and discussed. In some cases, the white dwarf accretors remain small (less than 10^11 cm) and very hot even during the shell flash episode. They then experience continued but reduced hydrogen shell burning during the longer quiescent intervals while their surface temperatures increase both because of compressional heating and envelope structure readjustment in response to accretion over thousands of years. Both accretion and continued hydrogen burning power these models with luminosities of a few times 10^37 ergs/s. We suggest that the physical properties of these model sequences are of considerable relevance to the observed outburst and quiescent behavior of those symbiotic variables and symbiotic novae containing low-mass white dwarfs. We also suggest that our models are relevant to the observational characteristics of the growing class of low-luminosity, supersoft/ultrasoft X-ray sources in globular clusters and the Magellanic Clouds.

  12. PAQ: Partition Analysis of Quasispecies.

    PubMed

    Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

    2001-01-01

    The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.

  13. Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes.

    PubMed

    Kahlau, Sabine; Aspinall, Sue; Gray, John C; Bock, Ralph

    2006-08-01

    Tomato, Solanum lycopersicum (formerly Lycopersicon esculentum), has long been one of the classical model species of plant genetics. More recently, solanaceous species have become a model of evolutionary genomics, with several EST projects and a tomato genome project having been initiated. As a first contribution toward deciphering the genetic information of tomato, we present here the complete sequence of the tomato chloroplast genome (plastome). The size of this circular genome is 155,461 base pairs (bp), with an average AT content of 62.14%. It contains 114 genes and conserved open reading frames (ycfs). Comparison with the previously sequenced plastid DNAs of Nicotiana tabacum and Atropa belladonna reveals patterns of plastid genome evolution in the Solanaceae family and identifies varying degrees of conservation of individual plastid genes. In addition, we discovered several new sites of RNA editing by cytidine-to-uridine conversion. A detailed comparison of editing patterns in the three solanaceous species highlights the dynamics of RNA editing site evolution in chloroplasts. To assess the level of intraspecific plastome variation in tomato, the plastome of a second tomato cultivar was sequenced. Comparison of the two genotypes (IPA-6, bred in South America, and Ailsa Craig, bred in Europe) revealed no nucleotide differences, suggesting that the plastomes of modern tomato cultivars display very little, if any, sequence variation.

  14. Neutral Evolution of Duplicated DNA: An Evolutionary Stick-Breaking Process Causes Scale-Invariant Behavior

    NASA Astrophysics Data System (ADS)

    Massip, Florian; Arndt, Peter F.

    2013-04-01

    Recently, an enrichment of identical matching sequences has been found in many eukaryotic genomes. Their length distribution exhibits a power law tail raising the question of what evolutionary mechanism or functional constraints would be able to shape this distribution. Here we introduce a simple and evolutionarily neutral model, which involves only point mutations and segmental duplications, and produces the same statistical features as observed for genomic data. Further, we extend a mathematical model for random stick breaking to analytically show that the exponent of the power law tail is -3 and universal as it does not depend on the microscopic details of the model.

  15. An Automated Pipeline for Engineering Many-Enzyme Pathways: Computational Sequence Design, Pathway Expression-Flux Mapping, and Scalable Pathway Optimization.

    PubMed

    Halper, Sean M; Cetnar, Daniel P; Salis, Howard M

    2018-01-01

    Engineering many-enzyme metabolic pathways suffers from the design curse of dimensionality. There are an astronomical number of synonymous DNA sequence choices, though relatively few will express an evolutionary robust, maximally productive pathway without metabolic bottlenecks. To solve this challenge, we have developed an integrated, automated computational-experimental pipeline that identifies a pathway's optimal DNA sequence without high-throughput screening or many cycles of design-build-test. The first step applies our Operon Calculator algorithm to design a host-specific evolutionary robust bacterial operon sequence with maximally tunable enzyme expression levels. The second step applies our RBS Library Calculator algorithm to systematically vary enzyme expression levels with the smallest-sized library. After characterizing a small number of constructed pathway variants, measurements are supplied to our Pathway Map Calculator algorithm, which then parameterizes a kinetic metabolic model that ultimately predicts the pathway's optimal enzyme expression levels and DNA sequences. Altogether, our algorithms provide the ability to efficiently map the pathway's sequence-expression-activity space and predict DNA sequences with desired metabolic fluxes. Here, we provide a step-by-step guide to applying the Pathway Optimization Pipeline on a desired multi-enzyme pathway in a bacterial host.

  16. Rapidly rotating neutron stars in general relativity: Realistic equations of state

    NASA Technical Reports Server (NTRS)

    Cook, Gregory B.; Shapiro, Stuart L.; Teukolsky, Saul A.

    1994-01-01

    We construct equilibrium sequences of rotating neutron stars in general relativity. We compare results for 14 nuclear matter equations of state. We determine a number of important physical parameters for such stars, including the maximum mass and maximum spin rate. The stability of the configurations to quasi-radial perturbations is assessed. We employ a numerical scheme particularly well suited to handle rapid rotation and large departures from spherical symmetry. We provide an extensive tabulation of models for future reference. Two classes of evolutionary sequences of fixed baryon rest mass and entropy are explored: normal sequences, which behave very much like Newtonian sequences, and supramassive sequences, which exist for neutron stars solely because of general relativistic effects. Adiabatic dissipation of energy and angular momentum causes a star to evolve in quasi-stationary fashion along an evolutionary sequence. Supramassive sequences have masses exceeding the maximum mass of a nonrotating neutron star. A supramassive star evolves toward eventual catastrophic collapse to a black hole. Prior to collapse, the star actually spins up as it loses angular momentum, an effect that may provide an observable precursor to gravitational collapse to a black hole.

  17. Modelling and strategy optimisation for a kind of networked evolutionary games with memories under the bankruptcy mechanism

    NASA Astrophysics Data System (ADS)

    Fu, Shihua; Li, Haitao; Zhao, Guodong

    2018-05-01

    This paper investigates the evolutionary dynamics and strategy optimisation of a class of networked evolutionary games whose strategy updating rules incorporate a 'bankruptcy' mechanism, in which a player goes bankrupt after gaining continuously low profits from the game in previous rounds. First, using the semi-tensor product of matrices, the evolutionary dynamics of this class of games are expressed as a higher-order logical dynamic system and then converted into algebraic form, on the basis of which the dynamics of the given games can be analysed. Second, the strategy optimisation problem is investigated, and some free-type control sequences are designed to maximise the total payoff of the whole game. Finally, an illustrative example is given to show the effectiveness of the new results.

  18. Evolution of X-ray activity of 1-3 Msun late-type stars in early post-main-sequence phases

    NASA Astrophysics Data System (ADS)

    Pizzolato, N.; Maggio, A.; Sciortino, S.

    2000-09-01

    We have investigated the variation of coronal X-ray emission during early post-main-sequence phases for a sample of 120 late-type stars within 100 pc, and with estimated masses in the range 1-3 Msun, based on Hipparcos parallaxes and recent evolutionary models. These stars were observed with the ROSAT/PSPC, and the data processed with the Palermo-CfA pipeline, including detection and evaluation of X-ray fluxes (or upper limits) by means of a wavelet transform algorithm. We have studied the evolutionary history of X-ray luminosity and surface flux for stars in selected mass ranges, including stars with inactive A-type progenitors on the main sequence and lower mass solar-type stars. Our stellar sample suggests a trend of increasing X-ray emission level with age for stars with masses M > 1.5 Msun, and a decline for lower-mass stars. A similar behavior holds for the average coronal temperature, which follows a power-law correlation with the X-ray luminosity, independently of their mass and evolutionary state. We have also studied the relationship between X-ray luminosity and surface rotation rate for stars in the same mass ranges, and how this relationship departs from the Lx ~ vrot^2 law followed by main-sequence stars. Our results are interpreted in terms of a magnetic dynamo whose efficiency depends on the stellar evolutionary state through the mass-dependent changes of the stellar internal structure, including the properties of envelope convection and the internal rotation profile.
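
    One simple way to quantify a departure from the main-sequence Lx ~ vrot^2 law is to fit a power law in log-log space and compare the fitted exponent with 2. The sketch below is a plain least-squares fit; it ignores upper limits and measurement errors, which the abstract indicates the actual analysis had to handle, and the function name and toy data are assumptions.

```python
import math

def fit_power_law(v_rot, l_x):
    """Least-squares fit of log10(Lx) = slope * log10(v_rot) + intercept.

    Returns (slope, intercept); slope is the power-law exponent.
    """
    xs = [math.log10(v) for v in v_rot]
    ys = [math.log10(l) for l in l_x]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Ordinary least squares on the log-transformed data
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Synthetic data that follow Lx = C * vrot^2 exactly (C is arbitrary),
# so the fitted exponent should come out at 2.
v = [2.0, 5.0, 10.0, 20.0]
lx = [3.0e27 * vi ** 2 for vi in v]
slope, intercept = fit_power_law(v, lx)
departure = slope - 2.0  # zero for stars obeying the main-sequence law
```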

  19. The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project

    DOE PAGES

    Weston, David J.; Turetsky, Merritt R.; Johnson, Matthew G.; ...

    2017-10-27

    Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even ‘extend’ to influence community structure and ecosystem level processes. The progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Therefore, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. We introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses.

  20. The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weston, David J.; Turetsky, Merritt R.; Johnson, Matthew G.

    Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even ‘extend’ to influence community structure and ecosystem level processes. The progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Therefore, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. We introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses.

  1. The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project.

    PubMed

    Weston, David J; Turetsky, Merritt R; Johnson, Matthew G; Granath, Gustaf; Lindo, Zoë; Belyea, Lisa R; Rice, Steven K; Hanson, David T; Engelhardt, Katharina A M; Schmutz, Jeremy; Dorrepaal, Ellen; Euskirchen, Eugénie S; Stenøien, Hans K; Szövényi, Péter; Jackson, Michelle; Piatkowski, Bryan T; Muchero, Wellington; Norby, Richard J; Kostka, Joel E; Glass, Jennifer B; Rydin, Håkan; Limpens, Juul; Tuittila, Eeva-Stiina; Ullrich, Kristian K; Carrell, Alyssa; Benscoter, Brian W; Chen, Jin-Gui; Oke, Tobi A; Nilsson, Mats B; Ranjan, Priya; Jacobson, Daniel; Lilleskov, Erik A; Clymo, R S; Shaw, A Jonathan

    2018-01-01

    Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even 'extend' to influence community structure and ecosystem level processes. Progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Thus, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. Here we introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses. © 2017 UT-Battelle New Phytologist © 2017 New Phytologist Trust.

  2. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
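
    For readers unfamiliar with the three reported figures, the sketch below shows how sensitivity, specificity, and accuracy are conventionally computed from confusion-matrix counts. The counts used are invented for illustration and are not the study's data; the function name is an assumption.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: cancerous cases detected / missed
    tn/fp: healthy cases correctly cleared / wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),                # true-positive rate
        "specificity": tn / (tn + fp),                # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # overall fraction correct
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=90, fp=20, tn=80, fn=10)
```

    Accuracy depends on the balance of cancerous and healthy cases in the test set, so it need not lie midway between sensitivity and specificity.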

  3. BEYOND THE MAIN SEQUENCE: TESTING THE ACCURACY OF STELLAR MASSES PREDICTED BY THE PARSEC EVOLUTIONARY TRACKS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghezzi, Luan; Johnson, John Asher

    2015-10-20

    Characterizing the physical properties of exoplanets and understanding their formation and orbital evolution require precise and accurate knowledge of their host stars. Accurately measuring stellar masses is particularly important because they likely influence planet occurrence and the architectures of planetary systems. Single main-sequence stars typically have masses estimated from evolutionary tracks, which generally provide accurate results due to their extensive empirical calibration. However, the validity of this method for subgiants and giants has been called into question by recent studies, with suggestions that the masses of these evolved stars could have been overestimated. We investigate these concerns using a sample of 59 benchmark evolved stars with model-independent masses (from binary systems or asteroseismology) obtained from the literature. We find very good agreement between these benchmark masses and the ones estimated using evolutionary tracks. The average fractional difference in the mass interval ∼0.7–4.5 M⊙ is consistent with zero (−1.30 ± 2.42%), with no significant trends in the residuals relative to the input parameters. A good agreement between model-dependent and -independent radii (−4.81 ± 1.32%) and surface gravities (0.71 ± 0.51%) is also found. The consistency between independently determined ages for members of binary systems adds further support for the accuracy of the method employed to derive the stellar masses. Taken together, our results indicate that determination of masses of evolved stars using grids of evolutionary tracks is not significantly affected by systematic errors, and is thus valid for estimating the masses of isolated stars beyond the main sequence.

  4. An improved approximate-Bayesian model-choice method for estimating shared evolutionary history

    PubMed Central

    2014-01-01

    Background: To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results: By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions: The results demonstrate that the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937
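
    A Dirichlet-process prior over divergence models, as described in this record, can be illustrated through its Chinese restaurant process representation: each taxon pair joins an existing divergence event with probability proportional to the number of pairs already assigned to it, or founds a new event with probability proportional to a concentration parameter. The sketch below is only an illustration of that prior, not the method's actual implementation; the names `sample_divergence_partition`, `expected_num_events`, and `alpha` are hypothetical.

```python
import random


def sample_divergence_partition(n_pairs, alpha, rng):
    """Draw one assignment of taxon pairs to divergence events from a
    Chinese restaurant process with concentration parameter alpha (> 0)."""
    assignments = []  # assignments[i] = index of the event for taxon pair i
    counts = []       # counts[k] = number of pairs already in event k
    for _ in range(n_pairs):
        # Existing event k has weight counts[k]; a new event has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # found a new divergence event
        else:
            counts[k] += 1     # join an existing divergence event
        assignments.append(k)
    return assignments


def expected_num_events(n_pairs, alpha):
    """Prior expectation of the number of distinct divergence events."""
    return sum(alpha / (alpha + i) for i in range(n_pairs))


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_divergence_partition(8, 1.5, rng))
    print(expected_num_events(8, 1.5))
```

    Small values of `alpha` favor few shared divergence events and large values favor many independent ones, so a prior on `alpha` controls how strongly intermediate numbers of events are favored or disfavored.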

  5. Speciation and Neutral Molecular Evolution in One-Dimensional Closed Population

    NASA Astrophysics Data System (ADS)

    Semovski, Sergei V.; Bukin, Yuri S.; Sherbakov, Dmitry Yu.

    Models are presented that are suitable for describing speciation processes arising from reproductive isolation that depends on genetic distance. The main attention is paid to the model of a one-dimensional closed population, which describes the evolution of littoral benthic organisms. To make the modeling results comparable to those obtained in experimental phylogenetic studies, all individual-based models described here involve a neutrally evolving and maternally inherited DNA sequence. Sub-samples of the resulting sequences were used for a posteriori phylogenetic inferences, which were then compared to the "true" evolutionary histories.

  6. Evolutionary advantage via common action of recombination and neutrality

    NASA Astrophysics Data System (ADS)

    Saakian, David B.; Hu, Chin-Kun

    2013-11-01

    We investigate evolution models with recombination and neutrality. We consider the Crow-Kimura (parallel) mutation-selection model with the neutral fitness landscape, in which there is a central peak with high fitness A, and some of the 1-point mutants have the same high fitness A, while the fitness of other sequences is 0. We find that the effect of recombination and neutrality depends on the concrete version of both neutrality and recombination. We consider three versions of neutrality: (a) all the nearest neighbor sequences of the peak sequence have the same high fitness A; (b) all the l-point mutations in a piece of genome of length l≥1 are neutral; (c) the neutral sequences are randomly distributed among the nearest neighbors of the peak sequences. We also consider three versions of recombination: (I) the simple horizontal gene transfer (HGT) of one nucleotide; (II) the exchange of a piece of genome of length l, HGT-l; (III) two-point crossover recombination (2CR). For the case of (a), the 2CR gives a rather strong contribution to the mean fitness, much stronger than that of HGT for a large genome length L. For the random distribution of neutral sequences there is a critical degree of neutrality νc; for ν<νc with (νc-ν) not large, the 2CR suppresses the mean fitness while HGT increases it, whereas for ν much larger than νc, the 2CR and HGT-l increase the mean fitness more than HGT does. We also consider the recombination in the case of smooth fitness landscapes. The recombination gives some advantage in the evolutionary dynamics, where recombination distinguishes clearly the mean-field-like evolutionary factors from the fluctuation-like ones. By contrast, mutations affect the mean-field-like and fluctuation-like factors similarly. Consequently, recombination can accelerate the non-mean-field (fluctuation) type dynamics without considerably affecting the mean-field-like factors.
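
    The effect of neutrality version (a) on the mean fitness can be explored numerically. The sketch below integrates the Crow-Kimura (parallel) model without recombination, grouping sequences into classes by Hamming distance from the peak, which is valid here because the fitness depends only on that distance. It is a minimal illustration under stated assumptions, not the authors' calculation: `mu` is taken as a per-site mutation rate, plain Euler stepping is used, and the function name and default step sizes are hypothetical.

```python
def crow_kimura_mean_fitness(L, A, mu, neutral_neighbors, dt=0.01, steps=20000):
    """Integrate the Crow-Kimura model in Hamming-distance classes.

    L: genome length; A: fitness of the peak; mu: mutation rate per site.
    neutral_neighbors: if True, all 1-point mutants also have fitness A
    (neutrality version (a)); every other sequence has fitness 0.
    Returns (mean_fitness, class_probabilities).
    """
    fitness = [0.0] * (L + 1)
    fitness[0] = A
    if neutral_neighbors:
        fitness[1] = A
    p = [0.0] * (L + 1)
    p[0] = 1.0  # start with the whole population on the peak sequence
    for _ in range(steps):
        mean_fit = sum(r * q for r, q in zip(fitness, p))
        new_p = []
        for l in range(L + 1):
            # Class l gains from l-1 (L-l+1 sites can mutate away from the
            # peak) and from l+1 (l+1 sites can revert); it loses at rate L.
            gain_below = (L - l + 1) * p[l - 1] if l > 0 else 0.0
            gain_above = (l + 1) * p[l + 1] if l < L else 0.0
            mutation = mu * (gain_below + gain_above - L * p[l])
            selection = p[l] * (fitness[l] - mean_fit)
            new_p.append(p[l] + dt * (selection + mutation))
        p = new_p
    mean_fit = sum(r * q for r, q in zip(fitness, p))
    return mean_fit, p


if __name__ == "__main__":
    for flag in (False, True):
        r, _ = crow_kimura_mean_fitness(20, 3.0, 0.05, flag)
        print("neutral neighbors:", flag, "mean fitness:", round(r, 4))
```

    Comparing the two runs shows how much the neutral neighbors alone raise the mean fitness; the recombination schemes (HGT, HGT-l, 2CR) discussed in the record are not modeled in this sketch.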

  7. Tests of two convection theories for red giant and red supergiant envelopes

    NASA Technical Reports Server (NTRS)

    Stothers, Richard B.; Chin, Chao-Wen

    1995-01-01

    Two theories of stellar envelope convection are considered here in the context of red giants and red supergiants of intermediate to high mass: Boehm-Vitense's standard mixing-length theory (MLT) and Canuto & Mazzitelli's new theory incorporating the full spectrum of turbulence (FST). Both theories assume incompressible convection. Two formulations of the convective mixing length are also evaluated: l proportional to the local pressure scale height (H_P) and l proportional to the distance from the upper boundary of the convection zone (z). Applications to test both theories are made by calculating stellar evolutionary sequences into the red phase of core helium burning. Since the theoretically predicted effective temperatures for cool stars are known to be sensitive to the assigned value of the mixing length, this quantity has been individually calibrated for each evolutionary sequence. The calibration is done in a composite Hertzsprung-Russell diagram for the red giant and red supergiant members of well-observed Galactic open clusters. The MLT model requires the constant of proportionality for the convective mixing length to vary by a small but statistically significant amount with stellar mass, whereas the FST model succeeds in all cases with the mixing length simply set equal to z. The structure of the deep stellar interior, however, remains very nearly unaffected by the choices of convection theory and mixing length. Inside the convective envelope itself, a density inversion always occurs, but is somewhat smaller for the convectively more efficient MLT model. On physical grounds the FST model is preferable, and seems to alleviate the problem of finding the proper mixing length.

  8. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
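
    The template-modeling (TM) score quoted in this record can be illustrated with the standard formula, in which each aligned residue pair at distance d contributes 1/(1 + (d/d0)^2) and the sum is normalized by the length of the reference structure. The sketch below evaluates that formula for one fixed superposition only; the real TM-score is the maximum over superpositions, which is not attempted here. The function name is hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption following common implementations.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs of the
    model and the reference structure.
    target_length: number of residues in the reference structure.
    """
    # Length-dependent distance scale d0 from the standard definition.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed floor for short chains
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 80 of 100 residues aligned, each 2 angstroms off.
    print(tm_score([2.0] * 80, 100))
```

    Because unaligned residues contribute nothing to the sum while still counting in the normalization, a model that covers only part of the reference structure is penalized even when the covered part is accurate.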

  9. On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.

    PubMed

    Catania, Francesco; Schmitz, Jürgen

    2015-01-01

    Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.

  10. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis.

    PubMed

    Kumar, Sudhir; Stecher, Glen; Peterson, Daniel; Tamura, Koichiro

    2012-10-15

    There is a growing need in the research community to apply the molecular evolutionary genetics analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. Therefore, we now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an analysis prototyper (MEGA-Proto). MEGA-CC provides users with access to all the computational analyses available through MEGA's graphical user interface version. This includes methods for multiple sequence alignment, substitution model selection, evolutionary distance estimation, phylogeny inference, substitution rate and pattern estimation, tests of natural selection and ancestral sequence inference. Additionally, we have upgraded the source code for phylogenetic analysis using the maximum likelihood methods for parallel execution on multiple processors and cores. Here, we describe MEGA-CC and outline the steps for using MEGA-CC in tandem with MEGA-Proto for iterative and automated data analysis. http://www.megasoftware.net/.

  11. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

    PubMed

    Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

    2016-08-30

    Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.

  12. Evolution of high-mass star-forming regions.

    NASA Astrophysics Data System (ADS)

    Giannetti, A.; Leurini, S.; Wyrowski, F.; Urquhart, J.; König, C.; Csengeri, T.; Güsten, R.; Menten, K. M.

    Observational identification of a coherent evolutionary sequence for high-mass star-forming regions is still missing. We use the progressive heating of the gas caused by the feedback of high-mass young stellar objects to prove the statistical validity of the most common schemes used to observationally define an evolutionary sequence for high-mass clumps, and identify which physical process dominates in the different phases. From the spectroscopic follow-ups carried out towards the TOP100 sample between 84 and 365 GHz, we selected several multiplets of CH3CN, CH3CCH, and CH3OH lines to derive the physical properties of the gas in the clumps along the evolutionary sequence. We demonstrate that the evolutionary sequence is statistically valid, and we define intervals in L/M separating the compression, collapse and accretion, and disruption phases. The first hot cores and ZAMS stars appear at L/M ≈ 10 L⊙ M⊙^-1.

  13. Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations

    PubMed Central

    Beleva Guthrie, Violeta; Masica, David L; Fraser, Andrew; Federico, Joseph; Fan, Yunfan; Camps, Manel; Karchin, Rachel

    2018-01-01

    The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences (Network Analysis of Protein Adaptation, NAPA). The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which are frequently from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure. PMID:29522102

  14. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective.

    PubMed

    Inoue, Jun G; Miya, Masaki; Lam, Kevin; Tay, Boon-Hui; Danks, Janine A; Bell, Justin; Walker, Terrence I; Venkatesh, Byrappa

    2010-11-01

    With our increasing ability for generating whole-genome sequences, comparative analysis of whole genomes has become a powerful tool for understanding the structure, function, and evolutionary history of human and other vertebrate genomes. By virtue of their position basal to bony vertebrates, cartilaginous fishes (class Chondrichthyes) are a valuable outgroup in comparative studies of vertebrates. Recently, a holocephalan cartilaginous fish, the elephant shark, Callorhinchus milii (Subclass Holocephali: Order Chimaeriformes), has been proposed as a model genome, and low-coverage sequence of its genome has been generated. Despite such an increasing interest, the evolutionary history of the modern holocephalans (a previously successful and diverse group, now represented by only 39 extant species) and their relationship with elasmobranchs and other jawed vertebrates has been poorly documented, largely owing to a lack of well-preserved fossil materials after the end-Permian about 250 Ma. In this study, we assembled the whole mitogenome sequences for eight representatives from all the three families of the modern holocephalans and investigated their phylogenetic relationships and evolutionary history. Unambiguously aligned sequences from these holocephalans together with 17 other vertebrates (9,409 nt positions excluding entire third codon positions) were subjected to partitioned maximum likelihood analysis. The resulting tree strongly supported a single origin of the modern holocephalans and their sister-group relationship with elasmobranchs. The mitogenomic tree recovered the callorhinchids as the most basal lineage within the chimaeriforms, sister to a clade comprising the remaining two families (rhinochimaerids and chimaerids). The timetree derived from a relaxed molecular clock Bayesian method suggests that the holocephalans originated in the Silurian about 420 Ma, survived the end-Permian (250 Ma) mass extinction, and underwent familial diversifications during the late Jurassic to early Cretaceous (170-120 Ma). This postulated evolutionary scenario agrees well with that based on the paleontological observations.

  15. Functional versus non-functional intratumor heterogeneity in cancer

    PubMed Central

    Williams, Marc J.; Werner, Benjamin; Graham, Trevor A.; Sottoriva, Andrea

    2016-01-01

    Next-generation sequencing data from human cancers are often difficult to interpret within the context of tumor evolution. We developed a mathematical model describing the accumulation of mutations under neutral evolutionary dynamics and showed that 323/904 cancers (∼30%) from multiple types were consistent with the neutral model of tumor evolution. PMID:27652316
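
    One way such consistency with neutral evolution could be checked is sketched below, assuming the commonly cited form of the neutral model in which the cumulative number of subclonal mutations with allele frequency at least f grows linearly with 1/f. The record itself does not spell out the test, so the frequency window, the function name `neutral_fit_r2`, and the use of R^2 as the goodness-of-fit measure are all assumptions made for illustration.

```python
def neutral_fit_r2(frequencies, f_min=0.12, f_max=0.24):
    """R^2 of a least-squares line through cumulative mutation counts
    plotted against 1/f, for allele frequencies in [f_min, f_max]."""
    fs = sorted((f for f in frequencies if f_min <= f <= f_max), reverse=True)
    n = len(fs)
    if n < 3:
        raise ValueError("need at least three mutations in the window")
    # x = 1/f - 1/f_max; y = number of mutations with frequency >= f.
    xs = [1.0 / f - 1.0 / f_max for f in fs]
    ys = [float(i + 1) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0:
        raise ValueError("all frequencies in the window are identical")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Synthetic frequencies whose cumulative count is linear in 1/f.
    synthetic = [1.0 / (1.0 / 0.24 + 0.1 * i) for i in range(30)]
    print(neutral_fit_r2(synthetic))
```

    A value of R^2 close to 1 indicates that the cumulative mutation count is well described by a straight line in 1/f; the cutoff for calling a tumor "consistent with neutrality" is a separate choice that this sketch leaves open.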

  16. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of a unified evolutionary sequence is inferred. Such unification is not visible in a context of random matrix and random mutation. The sequence proceeds from the Big Bang through prebiotic matter and protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  17. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  18. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  19. Evidence of the evolved nature of the B[e] star MWC 137

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muratore, M. F.; Arias, M. L.; Cidale, L.

    2015-01-01

    The evolutionary phase of B[e] stars is difficult to establish due to the uncertainties in their fundamental parameters. For instance, possible classifications for the Galactic B[e] star MWC 137 include pre-main-sequence and post-main-sequence phases, with a large range in luminosity. Our goal is to clarify the evolutionary stage of this peculiar object, and to study the CO molecular component of its circumstellar medium. To this purpose, we modeled the CO molecular bands using high-resolution K-band spectra. We find that MWC 137 is surrounded by a detached cool (T=1900±100 K) and dense (N=(3±1)×10^21 cm^−2) ring of CO gas orbiting the star with a rotational velocity, projected to the line of sight, of 84 ± 2 km s^−1. We also find that the molecular gas is enriched in the isotope ^13C, excluding the classification of the star as a Herbig Be. The observed isotopic abundance ratio (^12C/^13C = 25 ± 2) derived from our modeling is compatible with a proto-planetary nebula, main-sequence, or supergiant evolutionary phase. However, based on some observable characteristics of MWC 137, we propose that the supergiant scenario seems to be the most plausible. Hence, we suggest that MWC 137 could be in an extremely short-lived phase, evolving from a B[e] supergiant to a blue supergiant with a bipolar ring nebula.

  20. Effects of Mitochondrial DNA Rate Variation on Reconstruction of Pleistocene Demographic History in a Social Avian Species, Pomatostomus superciliosus

    PubMed Central

    Norman, Janette A.; Blackmore, Caroline J.; Rourke, Meaghan; Christidis, Les

    2014-01-01

    Mitochondrial sequence data is often used to reconstruct the demographic history of Pleistocene populations in an effort to understand how species have responded to past climate change events. However, departures from neutral equilibrium conditions can confound evolutionary inference in species with structured populations or those that have experienced periods of population expansion or decline. Selection can affect patterns of mitochondrial DNA variation and variable mutation rates among mitochondrial genes can compromise inferences drawn from single markers. We investigated the contribution of these factors to patterns of mitochondrial variation and estimates of time to most recent common ancestor (TMRCA) for two clades in a co-operatively breeding avian species, the white-browed babbler Pomatostomus superciliosus. Both the protein-coding ND3 gene and hypervariable domain I control region sequences showed departures from neutral expectations within the superciliosus clade, and a two-fold difference in TMRCA estimates. Bayesian phylogenetic analysis provided evidence of departure from a strict clock model of molecular evolution in domain I, leading to an over-estimation of TMRCA for the superciliosus clade at this marker. Our results suggest mitochondrial studies that attempt to reconstruct Pleistocene demographic histories should rigorously evaluate data for departures from neutral equilibrium expectations, including variation in evolutionary rates across multiple markers. Failure to do so can lead to serious errors in the estimation of evolutionary parameters and subsequent demographic inferences concerning the role of climate as a driver of evolutionary change. These effects may be especially pronounced in species with complex social structures occupying heterogeneous environments. We propose that environmentally driven differences in social structure may explain observed differences in evolutionary rate of domain I sequences, resulting from longer than expected retention times for matriarchal lineages in the superciliosus clade. PMID:25181547

  1. The Awesome Power of Yeast Evolutionary Genetics: New Genome Sequences and Strain Resources for the Saccharomyces sensu stricto Genus

    PubMed Central

    Scannell, Devin R.; Zill, Oliver A.; Rokas, Antonis; Payen, Celia; Dunham, Maitreya J.; Eisen, Michael B.; Rine, Jasper; Johnston, Mark; Hittinger, Chris Todd

    2011-01-01

    High-quality, well-annotated genome sequences and standardized laboratory strains fuel experimental and evolutionary research. We present improved genome sequences of three species of Saccharomyces sensu stricto yeasts: S. bayanus var. uvarum (CBS 7001), S. kudriavzevii (IFO 1802T and ZP 591), and S. mikatae (IFO 1815T), and describe their comparison to the genomes of S. cerevisiae and S. paradoxus. The new sequences, derived by assembling millions of short DNA sequence reads together with previously published Sanger shotgun reads, have vastly greater long-range continuity and far fewer gaps than the previously available genome sequences. New gene predictions defined a set of 5261 protein-coding orthologs across the five most commonly studied Saccharomyces yeasts, enabling a re-examination of the tempo and mode of yeast gene evolution and improved inferences of species-specific gains and losses. To facilitate experimental investigations, we generated genetically marked, stable haploid strains for all three of these Saccharomyces species. These nearly complete genome sequences and the collection of genetically marked strains provide a valuable toolset for comparative studies of gene function, metabolism, and evolution, and render Saccharomyces sensu stricto the most experimentally tractable model genus. These resources are freely available and accessible through www.SaccharomycesSensuStricto.org. PMID:22384314

  2. Evolutionary and Functional Relationships in the Truncated Hemoglobin Family.

    PubMed

    Bustamante, Juan P; Radusky, Leandro; Boechi, Leonardo; Estrin, Darío A; Ten Have, Arjen; Martí, Marcelo A

    2016-01-01

    Predicting function from sequence is an important goal in current biological research, and although broad functional assignment is possible when a protein is assigned to a family, predicting functional specificity with accuracy is not straightforward. If function is provided by key structural properties and the relevant properties can be computed using the sequence as the starting point, it should in principle be possible to predict function in detail. The truncated hemoglobin family presents an interesting benchmark study due to its ubiquity, sequence diversity in the context of a conserved fold, and the number of characterized members. Their functions are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in-silico based tools. In the present work we have applied a strategy, which combines homology modeling with molecular-based energy calculations, to predict and analyze function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow the switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths, which, besides some minor trends, show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our comprehension of the structural basis governing ligand affinity, but also highlight some interesting functional evolutionary trends.

  3. Evolutionary and Functional Relationships in the Truncated Hemoglobin Family

    PubMed Central

    Bustamante, Juan P.; Radusky, Leandro; Boechi, Leonardo; Estrin, Darío A.; ten Have, Arjen; Martí, Marcelo A.

    2016-01-01

    Predicting function from sequence is an important goal in current biological research, and although broad functional assignment is possible when a protein is assigned to a family, predicting functional specificity with accuracy is not straightforward. If function is provided by key structural properties and the relevant properties can be computed using the sequence as the starting point, it should in principle be possible to predict function in detail. The truncated hemoglobin family presents an interesting benchmark study due to its ubiquity, sequence diversity in the context of a conserved fold and the number of characterized members. Their functions are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in-silico based tools. In the present work we have applied a strategy, which combines homology modeling with molecular based energy calculations, to predict and analyze function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow the switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths, which, besides some minor trends, show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our comprehension of the structural basis governing ligand affinity, but also highlight some interesting functional evolutionary trends. PMID:26788940

  4. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data

    PubMed Central

    Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2015-01-01

    Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. 
    Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow. PMID:26187295
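
    The windowed analysis described in this abstract can be illustrated with a minimal sketch. It assumes half-open coordinates, drops a trailing partial window, and takes the per-window topology labels as already inferred by some external tree-building step. The function names and the Newick-style labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def nonoverlapping_windows(chrom_length, size=10_000):
    """Yield (start, end) half-open windows; a trailing partial window is dropped."""
    for start in range(0, chrom_length - size + 1, size):
        yield start, start + size


def topology_frequencies(window_topologies):
    """Return the fraction of windows supporting each gene tree topology label."""
    counts = Counter(window_topologies)
    total = sum(counts.values())
    return {topology: n / total for topology, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical 25-kb scaffold: only two complete 10-kb windows fit.
    print(list(nonoverlapping_windows(25_000)))
    # Hypothetical labels, one per window, for four taxa A-D.
    labels = ["((A,B),(C,D))", "((A,C),(B,D))", "((A,B),(C,D))"]
    print(topology_frequencies(labels))
```

    Tallying topologies per window in this way is one simple route to summarizing genome-wide gene tree incongruence; the coalescent-based species tree methods mentioned in the abstract are not reproduced here.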

  5. Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses.

    PubMed

    Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

    2016-11-26

    Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics.

  6. Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses

    PubMed Central

    Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

    2016-01-01

    Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics. PMID:27898037

  7. A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

    PubMed

    Karaboga, D; Aslan, S

    2016-04-27

    The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.

  8. Phylogenetics.

    PubMed

    Sleator, Roy D

    2011-04-01

    The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects.

  9. Evolution of Enzyme Superfamilies: Comprehensive Exploration of Sequence-Function Relationships.

    PubMed

    Baier, F; Copp, J N; Tokuriki, N

    2016-11-22

    The sequence and functional diversity of enzyme superfamilies have expanded through billions of years of evolution from a common ancestor. Understanding how protein sequence and functional "space" have expanded, at both the evolutionary and molecular level, is central to biochemistry, molecular biology, and evolutionary biology. Integrative approaches that examine protein sequence, structure, and function have begun to provide comprehensive views of the functional diversity and evolutionary relationships within enzyme superfamilies. In this review, we outline the recent advances in our understanding of enzyme evolution and superfamily functional diversity. We describe the tools that have been used to comprehensively analyze sequence relationships and to characterize sequence and function relationships. We also highlight recent large-scale experimental approaches that systematically determine the activity profiles across enzyme superfamilies. We identify several intriguing insights from this recent body of work. First, promiscuous activities are prevalent among extant enzymes. Second, many divergent proteins retain "function connectivity" via enzyme promiscuity, which can be used to probe the evolutionary potential and history of enzyme superfamilies. Finally, we discuss open questions regarding the intricacies of enzyme divergence, as well as potential research directions that will deepen our understanding of enzyme superfamily evolution.

  10. Evolutionary dynamics of retrotransposons assessed by high-throughput sequencing in wild relatives of wheat.

    PubMed

    Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian

    2013-01-01

    Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshot with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, allowing to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.

  11. A priori and a posteriori approaches for finding genes of evolutionary interest in non-model species: osmoregulatory genes in the kidney transcriptome of the desert rodent Dipodomys spectabilis (banner-tailed kangaroo rat).

    PubMed

    Marra, Nicholas J; Eo, Soo Hyung; Hale, Matthew C; Waser, Peter M; DeWoody, J Andrew

    2012-12-01

    One common goal in evolutionary biology is the identification of genes underlying adaptive traits of evolutionary interest. Recently next-generation sequencing techniques have greatly facilitated such evolutionary studies in species otherwise depauperate of genomic resources. Kangaroo rats (Dipodomys sp.) serve as exemplars of adaptation in that they inhabit extremely arid environments, yet require no drinking water because of ultra-efficient kidney function and osmoregulation. As a basis for identifying water conservation genes in kangaroo rats, we conducted a priori bioinformatics searches in model rodents (Mus musculus and Rattus norvegicus) to identify candidate genes with known or suspected osmoregulatory function. We then obtained 446,758 reads via 454 pyrosequencing to characterize genes expressed in the kidney of banner-tailed kangaroo rats (Dipodomys spectabilis). We also determined candidates a posteriori by identifying genes that were overexpressed in the kidney. The kangaroo rat sequences revealed nine different a priori candidate genes predicted from our Mus and Rattus searches, as well as 32 a posteriori candidate genes that were overexpressed in kidney. Mutations in two of these genes, Slc12a1 and Slc12a3, cause human renal diseases that result in the inability to concentrate urine. These genes are likely key determinants of physiological water conservation in desert rodents. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
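
    To make the notion of an open reading frame concrete, here is a minimal sketch of a forward-strand ORF scan. It assumes the standard stop codons (TAA, TAG, TGA) and an ATG start, ignores the reverse strand, and uses a hypothetical `min_codons` cutoff. It is an illustration only and is not the method used by the cloud service described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open; each ORF runs from an ATG
    through the first in-frame stop codon (stop codon included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning this frame
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"  # toy sequence, not real viral data
    for start, end, frame in find_orfs(sequence):
        print(frame, start, end, sequence[start:end])
```

    The extracted ORF sequences would then be aligned and passed to a tree-building step, which this sketch does not attempt.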

  13. Draft genome of the medaka fish: a comprehensive resource for medaka developmental genetics and vertebrate evolutionary biology.

    PubMed

    Takeda, Hiroyuki

    2008-06-01

    The medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system for developmental genetics and evolutionary biology. The medaka genome is relatively small in size, approximately 800 Mb, and the genome sequencing project was recently completed by Japanese research groups, providing a high-quality draft genome sequence of the inbred Hd-rR strain of medaka. In this review, I present an overview of the medaka genome project including genome resources, followed by specific findings obtained with the medaka draft genome. In particular, I focus on the analysis that was done by taking advantage of the medaka system, such as the sex chromosome differentiation and the regional history of medaka species using single nucleotide polymorphisms as genomic markers.

  14. Revising the Evolutionary Stage of HD 163899: The Effects of Convective Overshooting and Rotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ostrowski, Jakub; Daszyńska-Daszkiewicz, Jadwiga; Cugier, Henryk, E-mail: ostrowski@astro.uni.wroc.pl

    We revise the evolutionary status of the B-type supergiant HD 163899 based on the new determinations of the mass–luminosity ratio, effective temperature, and rotational velocity, as well as on the interpretation of the oscillation spectrum of the star. The observed value of the nitrogen-to-carbon abundance fixes the value of the rotation rate of the star. Now, more massive models are strongly preferred over those previously considered, and it is very likely that the star is still in the main-sequence stage. The rotationally induced mixing manifests as the nitrogen overabundance in the atmosphere, which agrees with our analysis of the HARPS spectra. Thus, HD 163899 probably belongs to a group of evolved nitrogen-rich main-sequence stars.

  15. Beryllium abundances along the evolutionary sequence of the open cluster IC 4651 - A new test for hydrodynamical stellar models

    NASA Astrophysics Data System (ADS)

    Smiljanic, R.; Pasquini, L.; Charbonnel, C.; Lagarde, N.

    2010-02-01

    Context. Previous analyses of lithium abundances in main sequence and red giant stars have revealed the action of mixing mechanisms other than convection in stellar interiors. Beryllium abundances in stars with Li abundance determinations can offer valuable complementary information on the nature of these mechanisms. Aims: Our aim is to derive Be abundances along the whole evolutionary sequence of an open cluster. We focus on the well-studied open cluster IC 4651. These Be abundances are used with previously determined Li abundances, in the same sample stars, to investigate the mixing mechanisms in a range of stellar masses and evolutionary stages. Methods: Atmospheric parameters were adopted from a previous abundance analysis by the same authors. New Be abundances have been determined from high-resolution, high signal-to-noise UVES spectra using spectrum synthesis and model atmospheres. The careful synthetic modeling of the Be lines region is used to calculate reliable abundances in rapidly rotating stars. The observed behavior of Be and Li is compared to theoretical predictions from stellar models including rotation-induced mixing, internal gravity waves, atomic diffusion, and thermohaline mixing. Results: Beryllium is detected in all the main sequence and turn-off sample stars, both slow- and fast-rotating stars, including the Li-dip stars, but is not detected in the red giants. Confirming previous results, we find that the Li dip is also a Be dip, although the depletion of Be is more modest than for Li in the corresponding effective temperature range. For post-main-sequence stars, the Be dilution starts earlier within the Hertzsprung gap than expected from classical predictions, as does the Li dilution. A clear dispersion in the Be abundances is also observed. Theoretical stellar models including the hydrodynamical transport processes mentioned above are able to reproduce all the observed features well. 
    These results show a good theoretical understanding of the Li and Be behavior along the color-magnitude diagram of this intermediate-age cluster for stars more massive than 1.2 M⊙. Based on observations made with the ESO VLT, at Paranal Observatory, under programs 065.L-0427 and 067.D-0126. Current address: European Southern Observatory, Karl-Schwarzschild-Str. 2, 85748 Garching bei München, Germany.

  16. Sequencing and comparing whole mitochondrial genomes of animals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  17. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
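
    The point about expectation values and database size can be made concrete with the standard Karlin-Altschul relations: a raw score S is converted to a bit score S' = (λS − ln K) / ln 2, and the expected number of chance hits is E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below is a simplified illustration under those formulas; it ignores the effective-length corrections that real search programs apply, and the function names and example numbers are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance alignments with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Same 50-bit alignment, same 300-residue query, two hypothetical database sizes.
    large = expect_value(50.0, 300, 10**9)
    small = expect_value(50.0, 300, 10**7)
    print(f"large database E = {large:.3g}")
    print(f"small database E = {small:.3g}")
    print(f"ratio = {large / small:.0f}")
```

    Because E scales linearly with database length, the same alignment is more significant in a smaller database, which is consistent with the abstract's advice to search smaller comprehensive databases.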

  18. Detecting and Analyzing Genetic Recombination Using RDP4.

    PubMed

    Martin, Darren P; Murrell, Ben; Khoosal, Arjun; Muhire, Brejnev

    2017-01-01

    Recombination between nucleotide sequences is a major process influencing the evolution of most species on Earth. The evolutionary value of recombination has been widely debated and so too has its influence on evolutionary analysis methods that assume nucleotide sequences replicate without recombining. When nucleic acids recombine, the evolution of the daughter or recombinant molecule cannot be accurately described by a single phylogeny. This simple fact can seriously undermine the accuracy of any phylogenetics-based analytical approach which assumes that the evolutionary history of a set of recombining sequences can be adequately described by a single phylogenetic tree. There are presently a large number of available methods and associated computer programs for analyzing and characterizing recombination in various classes of nucleotide sequence datasets. Here we examine the use of some of these methods to derive and test recombination hypotheses using multiple sequence alignments.

  19. Repertoire comparison of the B-cell receptor encoding loci in humans and rhesus macaques by next generation sequencing

    USDA-ARS's Scientific Manuscript database

    Rhesus macaques are a widely used model system for the study of vaccines, infectious diseases, and microbial pathogenesis. Their value as a model lies in their close evolutionary relationship to humans, which, in theory, allows them to serve as a close approximation of the human immune system. Howev...

  20. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. 
    Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  1. Mesa Isochrones and Stellar Tracks (MIST). I. Solar-scaled Models

    NASA Astrophysics Data System (ADS)

    Choi, Jieun; Dotter, Aaron; Conroy, Charlie; Cantiello, Matteo; Paxton, Bill; Johnson, Benjamin D.

    2016-06-01

    This is the first of a series of papers presenting the Modules for Experiments in Stellar Astrophysics (MESA) Isochrones and Stellar Tracks (MIST) project, a new comprehensive set of stellar evolutionary tracks and isochrones computed using MESA, a state-of-the-art open-source 1D stellar evolution package. In this work, we present models with solar-scaled abundance ratios covering a wide range of ages (5 ≤ log(Age) [year] ≤ 10.3), masses (0.1 ≤ M/M⊙ ≤ 300), and metallicities (-2.0 ≤ [Z/H] ≤ 0.5). The models are self-consistently and continuously evolved from the pre-main sequence (PMS) to the end of hydrogen burning, the white dwarf cooling sequence, or the end of carbon burning, depending on the initial mass. We also provide a grid of models evolved from the PMS to the end of core helium burning for -4.0 ≤ [Z/H] < -2.0. We showcase extensive comparisons with observational constraints as well as with some of the most widely used existing models in the literature. The evolutionary tracks and isochrones can be downloaded from the project website at http://waps.cfa.harvard.edu/MIST/.

  2. Genetic distances and phylogenetic trees of different Awassi sheep populations based on DNA sequencing.

    PubMed

    Al-Atiyat, R M; Aljumaah, R S

    2014-08-27

    This study aimed to estimate evolutionary distances and to reconstruct phylogenetic trees among different Awassi sheep populations. Thirty-two sheep individuals from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best general time-reversible distance model. Three methods of distance estimation were then used. The maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was shown between the South sheep and North sheep. On the other hand, the KSA sheep as an outgroup showed shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results support geographical information and ecological types of the sheep populations studied. Summing up, the resulting phylogenetic trees may contribute to the limited information about the genetic relatedness and phylogeny of Awassi sheep in nearby Arab countries.
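
    As a rough illustration of what an evolutionary distance between two aligned DNA sequences is, the sketch below computes the uncorrected p-distance and the Jukes-Cantor correction d = −(3/4) ln(1 − 4p/3). This is a deliberately simpler model swapped in for illustration; it is not the general time-reversible or maximum composite likelihood procedure the study ran in MEGA. The function names and the toy sequences are hypothetical, and gaps or ambiguous bases are not handled.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a = "ACGTACGTAC"  # toy sequences, not sheep data
    b = "ACGTACGTAT"
    print(p_distance(a, b), jukes_cantor_distance(a, b))
```

    A matrix of such pairwise distances is the usual input to distance-based tree builders such as neighbor-joining and UPGMA.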

  3. Evolutionary tree reconstruction

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Kanefsky, Bob

    1990-01-01

    This paper describes how Minimum Description Length (MDL) can be applied to the problem of DNA and protein evolutionary tree reconstruction. If there is a set of mutations that transforms a common ancestor into the set of known sequences, and this description is shorter than the information needed to encode the known sequences directly, then strong evidence for an evolutionary relationship has been found. A heuristic algorithm is described that searches for the simplest tree (smallest MDL) and finds close-to-optimal trees on the test data. Various ways of extending the MDL theory to more complex evolutionary relationships are discussed.
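
    The comparison of description lengths can be sketched for the simplest possible case. The code below assumes equal-length DNA sequences, substitutions only, and a star-shaped tree with a single ancestor taken as the column-wise majority consensus. The coding costs (2 bits per base, log2(L) + log2(3) bits per substitution, log2(L + 1) bits per sequence for the mutation count) are illustrative assumptions, and the heuristic tree search described in the abstract is not reproduced.

```python
import math
from collections import Counter

BITS_PER_BASE = 2.0  # log2(4) for the alphabet A, C, G, T


def direct_cost(seqs):
    """Bits needed to write every sequence out base by base."""
    return BITS_PER_BASE * sum(len(s) for s in seqs)


def consensus(seqs):
    """Column-wise majority base, used here as a stand-in ancestor."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def star_tree_cost(seqs):
    """Bits for the ancestor plus, per sequence, its list of substitutions."""
    ancestor = consensus(seqs)
    length = len(ancestor)
    per_mutation = math.log2(length) + math.log2(3)  # position + new base
    per_count = math.log2(length + 1)                # number of substitutions
    n_mutations = sum(a != b for s in seqs for a, b in zip(ancestor, s))
    return BITS_PER_BASE * length + len(seqs) * per_count + n_mutations * per_mutation


if __name__ == "__main__":
    related = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
    print(direct_cost(related), star_tree_cost(related))
```

    When the ancestor-plus-mutations description is shorter than the direct encoding, as it is for the toy `related` set, the MDL criterion counts that as evidence of shared ancestry; for unrelated sequences the mutation lists become longer than writing the sequences out directly.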

  4. The genome of the sea urchin Strongylocentrotus purpuratus.

    PubMed

    Sodergren, Erica; Weinstock, George M; Davidson, Eric H; Cameron, R Andrew; Gibbs, Richard A; Angerer, Robert C; Angerer, Lynne M; Arnone, Maria Ina; Burgess, David R; Burke, Robert D; Coffman, James A; Dean, Michael; Elphick, Maurice R; Ettensohn, Charles A; Foltz, Kathy R; Hamdoun, Amro; Hynes, Richard O; Klein, William H; Marzluff, William; McClay, David R; Morris, Robert L; Mushegian, Arcady; Rast, Jonathan P; Smith, L Courtney; Thorndyke, Michael C; Vacquier, Victor D; Wessel, Gary M; Wray, Greg; Zhang, Lan; Elsik, Christine G; Ermolaeva, Olga; Hlavina, Wratko; Hofmann, Gretchen; Kitts, Paul; Landrum, Melissa J; Mackey, Aaron J; Maglott, Donna; Panopoulou, Georgia; Poustka, Albert J; Pruitt, Kim; Sapojnikov, Victor; Song, Xingzhi; Souvorov, Alexandre; Solovyev, Victor; Wei, Zheng; Whittaker, Charles A; Worley, Kim; Durbin, K James; Shen, Yufeng; Fedrigo, Olivier; Garfield, David; Haygood, Ralph; Primus, Alexander; Satija, Rahul; Severson, Tonya; Gonzalez-Garay, Manuel L; Jackson, Andrew R; Milosavljevic, Aleksandar; Tong, Mark; Killian, Christopher E; Livingston, Brian T; Wilt, Fred H; Adams, Nikki; Bellé, Robert; Carbonneau, Seth; Cheung, Rocky; Cormier, Patrick; Cosson, Bertrand; Croce, Jenifer; Fernandez-Guerra, Antonio; Genevière, Anne-Marie; Goel, Manisha; Kelkar, Hemant; Morales, Julia; Mulner-Lorillon, Odile; Robertson, Anthony J; Goldstone, Jared V; Cole, Bryan; Epel, David; Gold, Bert; Hahn, Mark E; Howard-Ashby, Meredith; Scally, Mark; Stegeman, John J; Allgood, Erin L; Cool, Jonah; Judkins, Kyle M; McCafferty, Shawn S; Musante, Ashlan M; Obar, Robert A; Rawson, Amanda P; Rossetti, Blair J; Gibbons, Ian R; Hoffman, Matthew P; Leone, Andrew; Istrail, Sorin; Materna, Stefan C; Samanta, Manoj P; Stolc, Viktor; Tongprasit, Waraporn; Tu, Qiang; Bergeron, Karl-Frederik; Brandhorst, Bruce P; Whittle, James; Berney, Kevin; Bottjer, David J; Calestani, Cristina; Peterson, Kevin; Chow, Elly; Yuan, Qiu Autumn; Elhaik, Eran; Graur, Dan; Reese, Justin T; 
Bosdet, Ian; Heesun, Shin; Marra, Marco A; Schein, Jacqueline; Anderson, Michele K; Brockton, Virginia; Buckley, Katherine M; Cohen, Avis H; Fugmann, Sebastian D; Hibino, Taku; Loza-Coll, Mariano; Majeske, Audrey J; Messier, Cynthia; Nair, Sham V; Pancer, Zeev; Terwilliger, David P; Agca, Cavit; Arboleda, Enrique; Chen, Nansheng; Churcher, Allison M; Hallböök, F; Humphrey, Glen W; Idris, Mohammed M; Kiyama, Takae; Liang, Shuguang; Mellott, Dan; Mu, Xiuqian; Murray, Greg; Olinski, Robert P; Raible, Florian; Rowe, Matthew; Taylor, John S; Tessmar-Raible, Kristin; Wang, D; Wilson, Karen H; Yaguchi, Shunsuke; Gaasterland, Terry; Galindo, Blanca E; Gunaratne, Herath J; Juliano, Celina; Kinukawa, Masashi; Moy, Gary W; Neill, Anna T; Nomura, Mamoru; Raisch, Michael; Reade, Anna; Roux, Michelle M; Song, Jia L; Su, Yi-Hsien; Townley, Ian K; Voronina, Ekaterina; Wong, Julian L; Amore, Gabriele; Branno, Margherita; Brown, Euan R; Cavalieri, Vincenzo; Duboc, Véronique; Duloquin, Louise; Flytzanis, Constantin; Gache, Christian; Lapraz, François; Lepage, Thierry; Locascio, Annamaria; Martinez, Pedro; Matassi, Giorgio; Matranga, Valeria; Range, Ryan; Rizzo, Francesca; Röttinger, Eric; Beane, Wendy; Bradham, Cynthia; Byrum, Christine; Glenn, Tom; Hussain, Sofia; Manning, Gerard; Miranda, Esther; Thomason, Rebecca; Walton, Katherine; Wikramanayke, Athula; Wu, Shu-Yu; Xu, Ronghui; Brown, C Titus; Chen, Lili; Gray, Rachel F; Lee, Pei Yun; Nam, Jongmin; Oliveri, Paola; Smith, Joel; Muzny, Donna; Bell, Stephanie; Chacko, Joseph; Cree, Andrew; Curry, Stacey; Davis, Clay; Dinh, Huyen; Dugan-Rocha, Shannon; Fowler, Jerry; Gill, Rachel; Hamilton, Cerrissa; Hernandez, Judith; Hines, Sandra; Hume, Jennifer; Jackson, Laronda; Jolivet, Angela; Kovar, Christie; Lee, Sandra; Lewis, Lora; Miner, George; Morgan, Margaret; Nazareth, Lynne V; Okwuonu, Geoffrey; Parker, David; Pu, Ling-Ling; Thorn, Rachel; Wright, Rita

    2006-11-10

    We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.

  5. Evolutionary history of versatile-lipases from Agaricales through reconstruction of ancestral structures.

    PubMed

    Barriuso, Jorge; Martínez, María Jesús

    2017-01-03

    Fungal "Versatile carboxylic ester hydrolases" are enzymes of great biotechnological interest. Here we carried out a bioinformatic screening to find these proteins in genomes from Agaricales, by means of searching for conserved motifs, sequence and phylogenetic analysis, and three-dimensional modeling. Moreover, we reconstructed the molecular evolution of these enzymes over time by inferring and analyzing the sequences of ancestral intermediate forms. The properties of the ancestral candidates are discussed on the basis of their three-dimensional structural models, the hydrophobicity of the lid, and the substrate-binding intramolecular tunnel, revealing that all of them showed properties characteristic of these enzymes. The evolutionary history of the putative lipases revealed an increase in the length and hydrophobicity of the lid region, as well as in the size of the substrate-binding pocket, over evolutionary time. These facts suggest the enzymes' specialization towards certain substrates and their subsequent loss of promiscuity. These results bring to light the presence of different pools of lipases in fungi with different habitats and lifestyles. Despite the consistency of the data gathered from reconstruction of ancestral sequences, the heterologous expression of some of these candidates would be essential to corroborate the enzymes' activities.

  6. Reconstructing Networks from Profit Sequences in Evolutionary Games via a Multiobjective Optimization Approach with Lasso Initialization

    PubMed Central

    Wu, Kai; Liu, Jing; Wang, Shuai

    2016-01-01

    Evolutionary games (EG) model a common type of interactions in various complex, networked, natural and social systems. Given such a system with only profit sequences being available, reconstructing the interacting structure of EG networks is fundamental to understand and control its collective dynamics. Existing approaches used to handle this problem, such as the lasso, a convex optimization method, need a user-defined constant to control the tradeoff between the natural sparsity of networks and measurement error (the difference between observed data and simulated data). However, a shortcoming of these approaches is that it is not easy to determine these key parameters which can maximize the performance. In contrast to these approaches, we first model the EG network reconstruction problem as a multiobjective optimization problem (MOP), and then develop a framework which involves multiobjective evolutionary algorithm (MOEA), followed by solution selection based on knee regions, termed as MOEANet, to solve this MOP. We also design an effective initialization operator based on the lasso for MOEA. We apply the proposed method to reconstruct various types of synthetic and real-world networks, and the results show that our approach is effective to avoid the above parameter selecting problem and can reconstruct EG networks with high accuracy. PMID:27886244

  7. Reconstructing Networks from Profit Sequences in Evolutionary Games via a Multiobjective Optimization Approach with Lasso Initialization

    NASA Astrophysics Data System (ADS)

    Wu, Kai; Liu, Jing; Wang, Shuai

    2016-11-01

    Evolutionary games (EG) model a common type of interactions in various complex, networked, natural and social systems. Given such a system with only profit sequences being available, reconstructing the interacting structure of EG networks is fundamental to understand and control its collective dynamics. Existing approaches used to handle this problem, such as the lasso, a convex optimization method, need a user-defined constant to control the tradeoff between the natural sparsity of networks and measurement error (the difference between observed data and simulated data). However, a shortcoming of these approaches is that it is not easy to determine these key parameters which can maximize the performance. In contrast to these approaches, we first model the EG network reconstruction problem as a multiobjective optimization problem (MOP), and then develop a framework which involves multiobjective evolutionary algorithm (MOEA), followed by solution selection based on knee regions, termed as MOEANet, to solve this MOP. We also design an effective initialization operator based on the lasso for MOEA. We apply the proposed method to reconstruct various types of synthetic and real-world networks, and the results show that our approach is effective to avoid the above parameter selecting problem and can reconstruct EG networks with high accuracy.

  8. Targeted sequencing for high-resolution evolutionary analyses following genome duplication in salmonid fish: Proof of concept for key components of the insulin-like growth factor axis.

    PubMed

    Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J

    2016-12-01

    High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth factor (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95 Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Characterization of the rainbow trout transcriptome using Sanger and 454-Pyrosequencing approaches

    USDA-ARS's Scientific Manuscript database

    BACKGROUND: Rainbow trout is an important fish species for aquaculture and a model species for research investigations associated with carcinogenesis, comparative immunology, toxicology and evolutionary biology. However, to date there is no genome reference sequence to facilitate the development...

  10. Characterization of the rainbow trout transcriptome using Sanger and 454-pyrosequencing approaches

    USDA-ARS's Scientific Manuscript database

    Background: Rainbow trout is an important fish for aquaculture and recreational fisheries and serves as a model species for research investigations associated with carcinogenesis, comparative immunology, toxicology and evolutionary biology. However, to date there is no genome reference sequence...

  11. Genome sequencing reveals loci under artificial selection that underlie disease phenotypes in the laboratory rat.

    PubMed

    Atanur, Santosh S; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R; Kaisaki, Pamela J; Otto, Georg W; Ma, Man Chun John; Keane, Thomas M; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J

    2013-08-01

    Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  12. Genome Sequencing Reveals Loci under Artificial Selection that Underlie Disease Phenotypes in the Laboratory Rat

    PubMed Central

    Atanur, Santosh S.; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R.; Kaisaki, Pamela J.; Otto, Georg W.; Ma, Man Chun John; Keane, Thomas M.; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R.; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J.; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J.

    2013-01-01

    Summary Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. PMID:23890820

  13. Bacteriophage P23-77 Capsid Protein Structures Reveal the Archetype of an Ancient Branch from a Major Virus Lineage

    PubMed Central

    Rissanen, Ilona; Grimes, Jonathan M.; Pawlowski, Alice; Mäntynen, Sari; Harlos, Karl; Bamford, Jaana K.H.; Stuart, David I.

    2013-01-01

    Summary It has proved difficult to classify viruses unless they are closely related since their rapid evolution hinders detection of remote evolutionary relationships in their genetic sequences. However, structure varies more slowly than sequence, allowing deeper evolutionary relationships to be detected. Bacteriophage P23-77 is an example of a newly identified viral lineage, with members inhabiting extreme environments. We have solved multiple crystal structures of the major capsid proteins VP16 and VP17 of bacteriophage P23-77. They fit the 14 Å resolution cryo-electron microscopy reconstruction of the entire virus exquisitely well, allowing us to propose a model for both the capsid architecture and viral assembly, quite different from previously published models. The structures of the capsid proteins and their mode of association to form the viral capsid suggest that the P23-77-like and adeno-PRD1 lineages of viruses share an extremely ancient common ancestor. PMID:23623731

  14. Quantitative analysis of RNA-protein interactions on a massively parallel array for mapping biophysical and evolutionary landscapes

    PubMed Central

    Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.

    2015-01-01

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10^7 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714

  15. Phylogenetic estimates of diversification rate are affected by molecular rate variation.

    PubMed

    Duchêne, D A; Hua, X; Bromham, L

    2017-10-01

    Molecular phylogenies are increasingly being used to investigate the patterns and mechanisms of macroevolution. In particular, node heights in a phylogeny can be used to detect changes in rates of diversification over time. Such analyses rest on the assumption that node heights in a phylogeny represent the timing of diversification events, which in turn rests on the assumption that evolutionary time can be accurately predicted from DNA sequence divergence. But there are many influences on the rate of molecular evolution, which might also influence node heights in molecular phylogenies, and thus affect estimates of diversification rate. In particular, a growing number of studies have revealed an association between the net diversification rate estimated from phylogenies and the rate of molecular evolution. Such an association might, by influencing the relative position of node heights, systematically bias estimates of diversification time. We simulated the evolution of DNA sequences under several scenarios where rates of diversification and molecular evolution vary through time, including models where diversification and molecular evolutionary rates are linked. We show that commonly used methods, including metric-based, likelihood and Bayesian approaches, can have a low power to identify changes in diversification rate when molecular substitution rates vary. Furthermore, the association between the rates of speciation and molecular evolution rate can cause the signature of a slowdown or speedup in speciation rates to be lost or misidentified. These results suggest that the multiple sources of variation in molecular evolutionary rates need to be considered when inferring macroevolutionary processes from phylogenies. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  16. Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

    PubMed Central

    Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.

    2015-01-01

    Non-model plants, i.e., species with one or more characteristics such as a long life cycle, difficulty of growth in the laboratory, or poor fecundity, were previously left out of sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the advent of fast and cost-effective next generation sequencing (NGS) platforms in the recent past has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis and marker-assisted breeding even without prior availability of genomic sequence information. In this review we explore how different NGS platforms, as well as recent advances in NGS-based high-throughput genotyping technologies, are rewarding efforts on de novo whole genome/transcriptome sequencing and the development of genome-wide sequence-based marker resources for improvement of non-model crops that are less costly than phenotyping. PMID:26734016

  17. Modeling Evolution on Nearly Neutral Network Fitness Landscapes

    NASA Astrophysics Data System (ADS)

    Yakushkina, Tatiana; Saakian, David B.

    2017-08-01

    To describe virus evolution, it is necessary to define a fitness landscape. In this article, we consider the microscopic models with the advanced version of neutral network fitness landscapes. In this problem setting, we suppose a fitness difference between one-point mutation neighbors to be small. We construct a modification of the Wright-Fisher model, which is related to ordinary infinite population models with nearly neutral network fitness landscape at the large population limit. From the microscopic models in the realistic sequence space, we derive two versions of nearly neutral network models: with sinks and without sinks. We claim that the suggested model describes the evolutionary dynamics of RNA viruses better than the traditional Wright-Fisher model with few sequences.
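
    For orientation, the traditional Wright-Fisher model that this abstract takes as its baseline can be sketched in a few lines. This is only an illustrative sketch of the standard haploid model with selection and mutation, not the authors' modified model: the function name `wright_fisher`, the uniform mutation scheme between types, and the example fitness values are all assumptions made here for illustration.

```python
import random


def wright_fisher(pop_size, fitness, mutation_rate, generations, seed=0):
    """Simulate a haploid Wright-Fisher population over a few sequence types.

    fitness[i] is the relative fitness of type i. Mutation moves an
    individual to a uniformly chosen other type (an illustrative
    simplification, not taken from the abstract).
    Returns the list of per-type counts for every generation.
    """
    rng = random.Random(seed)
    n_types = len(fitness)
    counts = [0] * n_types
    counts[0] = pop_size  # start with a monomorphic population of type 0
    history = [counts[:]]
    for _ in range(generations):
        # Selection: parents are sampled with probability proportional
        # to (current count) x (relative fitness).
        weights = [c * f for c, f in zip(counts, fitness)]
        parents = rng.choices(range(n_types), weights=weights, k=pop_size)
        new_counts = [0] * n_types
        for p in parents:
            # Mutation: with small probability the offspring changes type.
            if n_types > 1 and rng.random() < mutation_rate:
                p = rng.choice([t for t in range(n_types) if t != p])
            new_counts[p] += 1
        counts = new_counts
        history.append(counts[:])
    return history


# Nearly neutral example: fitness differences between types are small.
trajectory = wright_fisher(50, [1.0, 1.001, 0.999], 0.01, 20, seed=1)
```

    With fitness values this close to one another, drift rather than selection dominates the trajectory at small population sizes, which is the regime the abstract calls nearly neutral.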

  18. Discovery of magnetic A supergiants: the descendants of magnetic main-sequence B stars

    NASA Astrophysics Data System (ADS)

    Neiner, Coralie; Oksala, Mary E.; Georgy, Cyril; Przybilla, Norbert; Mathis, Stéphane; Wade, Gregg; Kondrak, Matthias; Fossati, Luca; Blazère, Aurore; Buysschaert, Bram; Grunhut, Jason

    2017-10-01

    In the context of the high resolution, high signal-to-noise ratio, high sensitivity, spectropolarimetric survey BritePol, which complements observations by the BRITE constellation of nanosatellites for asteroseismology, we are looking for and measuring the magnetic field of all stars brighter than V = 4. In this paper, we present circularly polarized spectra obtained with HarpsPol at ESO in La Silla (Chile) and ESPaDOnS at CFHT (Hawaii) for three hot evolved stars: ι Car, HR 3890 and ɛ CMa. We detected a magnetic field in all three stars. Each star has been observed several times to confirm the magnetic detections and check for variability. The stellar parameters of the three objects were determined and their evolutionary status was ascertained employing evolution models computed with the Geneva code. ɛ CMa was already known and is confirmed to be magnetic, but our modelling indicates that it is located near the end of the main sequence, i.e. it is still in a core hydrogen burning phase. ι Car and HR 3890 are the first discoveries of magnetic hot supergiants located well after the end of the main sequence on the Hertzsprung-Russell diagram. These stars are probably the descendants of main-sequence magnetic massive stars. Their current field strength (a few G) is compatible with magnetic flux conservation during stellar evolution. These results provide observational constraints for the development of future evolutionary models of hot stars including a fossil magnetic field.

  19. Accurate quantification of within- and between-host HBV evolutionary rates requires explicit transmission chain modelling.

    PubMed

    Vrancken, Bram; Suchard, Marc A; Lemey, Philippe

    2017-07-01

    Analyses of virus evolution in known transmission chains have the potential to elucidate the impact of transmission dynamics on the viral evolutionary rate and its difference within and between hosts. Lin et al. (2015, Journal of Virology , 89/7: 3512-22) recently investigated the evolutionary history of hepatitis B virus in a transmission chain and postulated that the 'colonization-adaptation-transmission' model can explain the differential impact of transmission on synonymous and non-synonymous substitution rates. Here, we revisit this dataset using a full probabilistic Bayesian phylogenetic framework that adequately accounts for the non-independence of sequence data when estimating evolutionary parameters. Examination of the transmission chain data under a flexible coalescent prior reveals a general inconsistency between the estimated timings and clustering patterns and the known transmission history, highlighting the need to incorporate host transmission information in the analysis. Using an explicit genealogical transmission chain model, we find strong support for a transmission-associated decrease of the overall evolutionary rate. However, in contrast to the initially reported larger transmission effect on non-synonymous substitution rate, we find a similar decrease in both non-synonymous and synonymous substitution rates that cannot be adequately explained by the colonization-adaptation-transmission model. An alternative explanation may involve a transmission/establishment advantage of hepatitis B virus variants that have accumulated fewer within-host substitutions, perhaps by spending more time in the covalently closed circular DNA state between each round of viral replication. More generally, this study illustrates that ignoring phylogenetic relationships can lead to misleading evolutionary estimates.

  20. Evolution of heliobacteria: implications for photosynthetic reaction center complexes

    NASA Technical Reports Server (NTRS)

    Vermaas, W. F.; Blankenship, R. E. (Principal Investigator)

    1994-01-01

    The evolutionary position of the heliobacteria, a group of green photosynthetic bacteria with a photosynthetic apparatus functionally resembling Photosystem I of plants and cyanobacteria, has been investigated with respect to the evolutionary relationship to Gram-positive bacteria and cyanobacteria. On the basis of 16S rRNA sequence analysis, the heliobacteria appear to be most closely related to Gram-positive bacteria, but an evolutionary link to cyanobacteria is also evident. Interestingly, a 46-residue domain including the putative sixth membrane-spanning region of the heliobacterial reaction center protein shows rather strong similarity (33% identity and 72% similarity) to a region including the sixth membrane-spanning region of the CP47 protein, a chlorophyll-binding core antenna polypeptide of Photosystem II. The N-terminal half of the heliobacterial reaction center polypeptide shows a moderate sequence similarity (22% identity over 232 residues) with the CP47 protein, which is significantly more than the similarity with the Photosystem I core polypeptides in this region. An evolutionary model for photosynthetic reaction center complexes is discussed, in which an ancestral homodimeric reaction center protein (possibly resembling the heliobacterial reaction center protein) with 11 membrane-spanning regions per polypeptide has diverged to give rise to the core of Photosystem I, Photosystem II, and of the photosynthetic apparatus in green, purple, and heliobacteria.

  1. A reassessment of the evolutionary timescale of bat rabies viruses based upon glycoprotein gene sequences.

    PubMed

    Kuzmina, Natalia A; Kuzmin, Ivan V; Ellison, James A; Taylor, Steven T; Bergman, David L; Dew, Beverly; Rupprecht, Charles E

    2013-10-01

    Rabies, an acute progressive encephalomyelitis caused by viruses in the genus Lyssavirus, is one of the oldest known infectious diseases. Although dogs and other carnivores represent the greatest threat to public health as rabies reservoirs, it is commonly accepted that bats are the primary evolutionary hosts of lyssaviruses. Despite early historical documentation of rabies, molecular clock analyses indicate a quite young age of lyssaviruses, which is confusing. For example, the results obtained for partial and complete nucleoprotein gene sequences of rabies viruses (RABV), or for a limited number of glycoprotein gene sequences, indicated that the time of the most recent common ancestor (TMRCA) for current bat RABV diversity in the Americas lies in the seventeenth to eighteenth centuries and might be directly or indirectly associated with the European colonization. Conversely, several other reports demonstrated high genetic similarity between lyssavirus isolates, including RABV, obtained within a time interval of 25-50 years. In the present study, we attempted to re-estimate the age of several North American bat RABV lineages based on the largest set of complete and partial glycoprotein gene sequences compiled to date (n = 201) employing a codon substitution model. Although our results overlap with previous estimates in marginal areas of the 95 % high probability density (HPD), they suggest a longer evolutionary history of American bat RABV lineages (TMRCA at least 732 years, with a 95 % HPD 436-1107 years).

  2. Purifying Selection Maintains Dosage-Sensitive Genes during Degeneration of the Threespine Stickleback Y Chromosome

    PubMed Central

    White, Michael A.; Kitano, Jun; Peichel, Catherine L.

    2015-01-01

    Sex chromosomes are subject to unique evolutionary forces that cause suppression of recombination, leading to sequence degeneration and the formation of heteromorphic chromosome pairs (i.e., XY or ZW). Although progress has been made in characterizing the outcomes of these evolutionary processes on vertebrate sex chromosomes, it is still unclear how recombination suppression and sequence divergence typically occur and how gene dosage imbalances are resolved in the heterogametic sex. The threespine stickleback fish (Gasterosteus aculeatus) is a powerful model system to explore vertebrate sex chromosome evolution, as it possesses an XY sex chromosome pair at relatively early stages of differentiation. Using a combination of whole-genome and transcriptome sequencing, we characterized sequence evolution and gene expression across the sex chromosomes. We uncovered two distinct evolutionary strata that correspond with known structural rearrangements on the Y chromosome. In the oldest stratum, only a handful of genes remain, and these genes are under strong purifying selection. By comparing sex-linked gene expression with expression of autosomal orthologs in an outgroup, we show that dosage compensation has not evolved in threespine sticklebacks through upregulation of the X chromosome in males. Instead, in the oldest stratum, the genes that still possess a Y chromosome allele are enriched for genes predicted to be dosage sensitive in mammals and yeast. Our results suggest that dosage imbalances may have been avoided at haploinsufficient genes by retaining function of the Y chromosome allele through strong purifying selection. PMID:25818858

  3. Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily

    PubMed Central

    Akiva, Eyal; Copp, Janine N.; Tokuriki, Nobuhiko; Babbitt, Patricia C.

    2017-01-01

    Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold. PMID:29078300

  4. Ecological and evolutionary genomics of marine photosynthetic organisms.

    PubMed

    Coelho, Susana M; Simon, Nathalie; Ahmed, Sophia; Cock, J Mark; Partensky, Frédéric

    2013-02-01

    Environmental (ecological) genomics aims to understand the genetic basis of relationships between organisms and their abiotic and biotic environments. It is a rapidly progressing field of research largely due to recent advances in the speed and volume of genomic data being produced by next generation sequencing (NGS) technologies. Building on information generated by NGS-based approaches, functional genomic methodologies are being applied to identify and characterize genes and gene systems of both environmental and evolutionary relevance. Marine photosynthetic organisms (MPOs) were poorly represented amongst the early genomic models, but this situation is changing rapidly. Here we provide an overview of the recent advances in the application of ecological genomic approaches to both prokaryotic and eukaryotic MPOs. We describe how these approaches are being used to explore the biology and ecology of marine cyanobacteria and algae, particularly with regard to their functions in a broad range of marine ecosystems. Specifically, we review the ecological and evolutionary insights gained from whole genome and transcriptome sequencing projects applied to MPOs and illustrate how their genomes are yielding information on the specific features of these organisms. © 2012 Blackwell Publishing Ltd.

  5. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, a Markov model, or both. Motivated by the idea of adding k-word distributions directly to a Markov model, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into a Markov model, are more efficient.
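
    The combination of k-word counts with a Markov model described in this abstract can be illustrated with a minimal sketch. It does not implement wre.k.r or S2.k.r, whose definitions are not given here; it only shows the two ingredients, assuming the common estimator in which the expected count of a k-word under an order-(k-2) Markov model is derived from the sequence's own (k-1)- and (k-2)-word counts. The function names `kword_counts` and `markov_expected` are hypothetical.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq, word):
    """Expected count of `word` under an order-(k-2) Markov model.

    Assumed estimator (not taken from the abstract):
    N(w1..wk-1) * N(w2..wk) / N(w2..wk-1), with all counts
    taken from `seq` itself.
    """
    k = len(word)
    if k < 3:
        raise ValueError("word length must be at least 3")
    shorter = kword_counts(seq, k - 1)
    prefix = shorter[word[:-1]]   # count of the first k-1 letters
    suffix = shorter[word[1:]]    # count of the last k-1 letters
    middle = kword_counts(seq, k - 2)[word[1:-1]]  # shared core
    return prefix * suffix / middle if middle else 0.0
```

    Comparing the observed count `kword_counts(seq, 3)["ACG"]` with `markov_expected(seq, "ACG")` shows whether a word is over- or under-represented relative to the Markov background, which is the kind of signal such measures build on.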

  6. Rapid Multi-Locus Sequence Typing Using Microfluidic Biochips

    DTIC Science & Technology

    2010-05-12

    Sequence Types. The evolutionary history of all the B. cereus MLST concatenated Sequence Types (545 taxa, 2,394 nucleotide positions) was inferred using...the Neighbor-Joining method [28]. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the... Chlamydia (manuscript in preparation) and performed pilot studies on Staphylococcus aureus and Streptococcus pneumoniae (Data S4 and Text S2). Another potential

  7. Comparative modeling without implicit sequence alignments.

    PubMed

    Kolinski, Andrzej; Gront, Dominik

    2007-10-01

    The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine CABS to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.

  8. Evolutionary modes of emergence of short interspersed nuclear element (SINE) families in grasses.

    PubMed

    Kögler, Anja; Schmidt, Thomas; Wenke, Torsten

    2017-11-01

    Short interspersed nuclear elements (SINEs) are non-autonomous transposable elements which are propagated by retrotransposition and constitute an inherent part of the genome of most eukaryotic species. Knowledge of heterogeneous and highly abundant SINEs is crucial for de novo (or improvement of) annotation of whole genome sequences. We scanned Poaceae genome sequences of six important cereals (Oryza sativa, Triticum aestivum, Hordeum vulgare, Panicum virgatum, Sorghum bicolor, Zea mays) and Brachypodium distachyon to examine the diversity and evolution of SINE populations. We comparatively analyzed the structural features, distribution, evolutionary relation and abundance of 32 SINE families and subfamilies within grasses, comprising 11 052 individual copies. The investigation of activity profiles within the Poaceae provides insights into their species-specific diversification and amplification. We found that Poaceae SINEs (PoaS) fall into two length categories: simple SINEs of up to 180 bp and dimeric SINEs larger than 240 bp. Detailed analysis at the nucleotide level revealed that multimerization of related and unrelated SINE copies is an important evolutionary mechanism of SINE formation. We conclude that PoaS families diversify by massive reshuffling between SINE families, likely caused by insertion of truncated copies, and provide a model for this evolutionary scenario. Twenty-eight of 32 PoaS families and subfamilies show significant conservation, in particular either in the 5' or 3' regions, across Poaceae species and share large sequence stretches with one or more other PoaS families. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  9. Evolutionary divergence of chloroplast FAD synthetase proteins

    PubMed Central

    2010-01-01

    Background Flavin adenine dinucleotide synthetases (FADSs) - a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes - were studied in plants in terms of sequence, structure and evolutionary history. Results Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term as plant-like FADS proteins, are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with a previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may contain a catalytic activity, but different to that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus. Conclusions A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and further structural and functional studies. This work is a contribution to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity. PMID:20955574

  10. A continuum model for damage evolution in laminated composites

    NASA Technical Reports Server (NTRS)

    Lo, D. C.; Allen, D. H.; Harris, C. E.

    1991-01-01

    The accumulation of matrix cracking is examined using continuum damage mechanics lamination theory. A phenomenologically based damage evolutionary relationship is proposed for matrix cracking in continuous fiber reinforced laminated composites. The use of material dependent properties and damage dependent laminate averaged ply stresses in this evolutionary relationship permits its application independently of the laminate stacking sequence. Several load histories are applied to crossply laminates using this model, and the results are compared to published experimental data. The stress redistribution among the plies during the accumulation of matrix damage is also examined. It is concluded that characteristics of the stress redistribution process could assist in the analysis of the progressive failure process in laminated composites.

  11. The tangled bank of amino acids

    PubMed Central

    Pollock, David D.

    2016-01-01

    Abstract The use of amino acid substitution matrices to model protein evolution has yielded important insights into both the evolutionary process and the properties of specific protein families. In order to make these models tractable, standard substitution matrices represent the average results of the evolutionary process rather than the underlying molecular biophysics and population genetics, treating proteins as a set of independently evolving sites rather than as an integrated biomolecular entity. With advances in computing and the increasing availability of sequence data, we now have an opportunity to move beyond current substitution matrices to more interpretable mechanistic models with greater fidelity to the evolutionary process of mutation and selection and the holistic nature of the selective constraints. As part of this endeavour, we consider how epistatic interactions induce spatial and temporal rate heterogeneity, and demonstrate how these generally ignored factors can reconcile standard substitution rate matrices and the underlying biology, allowing us to better understand the meaning of these substitution rates. Using computational simulations of protein evolution, we can demonstrate the importance of both spatial and temporal heterogeneity in modelling protein evolution. PMID:27028523

  12. SCARF: maximizing next-generation EST assemblies for evolutionary and population genomic analyses.

    PubMed

    Barker, Michael S; Dlugosch, Katrina M; Reddy, A Chaitanya C; Amyotte, Sarah N; Rieseberg, Loren H

    2009-02-15

    Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. The program was created to knit together 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is freely available at http://msbarker.com/software.htm, and is released under the open source GPLv3 license (http://www.opensource.org/licenses/gpl-3.0.html).

  13. How Good Are Statistical Models at Approximating Complex Fitness Landscapes?

    PubMed Central

    du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian

    2016-01-01

    Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564

  14. Evolutionary interrogation of human biology in well-annotated genomic framework of rhesus macaque.

    PubMed

    Zhang, Shi-Jian; Liu, Chu-Jun; Yu, Peng; Zhong, Xiaoming; Chen, Jia-Yu; Yang, Xinzhuang; Peng, Jiguang; Yan, Shouyu; Wang, Chenqu; Zhu, Xiaotong; Xiong, Jingwei; Zhang, Yong E; Tan, Bertrand Chin-Ming; Li, Chuan-Yun

    2014-05-01

    With genome sequence and composition highly analogous to human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, the RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical link of human-biased regulations to human-specific traits, by using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research in general and serves as an important resource for in-depth evolutionary studies of human biology.

  15. Evolutionary plasticity of insect immunity.

    PubMed

    Vilcinskas, Andreas

    2013-02-01

    Many insect genomes have been sequenced and the innate immune responses of several species have been studied by transcriptomics, inviting the comparative analysis of immunity-related genes. Such studies have demonstrated significant evolutionary plasticity, with the emergence of novel proteins and protein domains correlated with insects adapting to both abiotic and biotic environmental stresses. This review article focuses on effector molecules such as antimicrobial peptides (AMPs) and proteinase inhibitors, which display greater evolutionary dynamism than conserved components such as immunity-related signaling molecules. There is increasing evidence to support an extended role for insect AMPs beyond defense against pathogens, including the management of beneficial endosymbionts. The total number of AMPs varies among insects with completed genome sequences, providing intriguing examples of immunity gene expansion and loss. This plasticity is discussed in the context of recent developments in evolutionary ecology suggesting that the maintenance and deployment of immune responses reallocates resources from other fitness-related traits thus requiring fitness trade-offs. Based on our recent studies using both model and non-model insects, I propose that insect immunity genes can be lost when alternative defense strategies with a lower fitness penalty have evolved, such as the so-called social immunity in bees, the chemical sanitation of the microenvironment by some beetles, and the release of antimicrobial secondary metabolites in the hemolymph. Conversely, recent studies provide evidence for the expansion and functional diversification of insect AMPs and proteinase inhibitors to reflect coevolution with a changing pathosphere and/or adaptations to habitats or food associated with microbial contamination. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. InterEvDock: a docking server to predict the structure of protein–protein interactions using evolutionary information

    PubMed Central

    Yu, Jinchao; Vavrusa, Marek; Andreani, Jessica; Rey, Julien; Tufféry, Pierre; Guerois, Raphaël

    2016-01-01

    The structural modeling of protein–protein interactions is key in understanding how cell machineries cross-talk with each other. Molecular docking simulations provide efficient means to explore how two unbound protein structures interact. InterEvDock is a server for protein docking based on a free rigid-body docking strategy. A systematic rigid-body docking search is performed using the FRODOCK program and the resulting models are re-scored with InterEvScore and SOAP-PP statistical potentials. The InterEvScore potential was specifically designed to integrate co-evolutionary information in the docking process. InterEvDock server is thus particularly well suited in case homologous sequences are available for both binding partners. The server returns 10 structures of the most likely consensus models together with 10 predicted residues most likely involved in the interface. In 91% of all complexes tested in the benchmark, at least one residue out of the 10 predicted is involved in the interface, providing useful guidelines for mutagenesis. InterEvDock is able to identify a correct model among the top10 models for 49% of the rigid-body cases with evolutionary information, making it a unique and efficient tool to explore structural interactomes under an evolutionary perspective. The InterEvDock web interface is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock/. PMID:27131368

  17. Testing Models of Stellar Structure and Evolution I. Comparison with Detached Eclipsing Binaries

    NASA Astrophysics Data System (ADS)

    del Burgo, C.; Allende Prieto, C.

    2018-05-01

    We present the results of an analysis aimed at testing the accuracy and precision of the PARSEC v1.2S library of stellar evolution models, combined with a Bayesian approach, to infer stellar parameters. We mainly employ the online DEBCat catalogue by Southworth, a compilation of detached eclipsing binary systems with published measurements of masses and radii to ~2 per cent precision. We select a sample of 318 binary components, with masses between 0.10 and 14.5 solar units, and distances between 1.3 pc and ~8 kpc for Galactic objects and ~44-68 kpc for the extragalactic ones. The Bayesian analysis applied takes as input effective temperature, radius, and [Fe/H], and their uncertainties, returning theoretical predictions for other stellar parameters. From the comparison with dynamical masses, we conclude inferred masses are precisely derived for stars on the main-sequence and in the core-helium-burning phase, with respective uncertainties of 4 per cent and 7 per cent, on average. Subgiant and red giant masses are predicted within 14 per cent, and early asymptotic giant branch stars within 24 per cent. These results are helpful to further improve the models, in particular for advanced evolutionary stages for which our understanding is limited. We obtain distances and ages for the binary systems and compare them, whenever possible, with precise literature estimates, finding excellent agreement. We discuss evolutionary effects and the challenges associated with the inference of stellar ages from evolutionary models. We also provide useful polynomial fittings to theoretical zero-age main-sequence relations.

  18. The Genome of the Sea Urchin Strongylocentrotus purpuratus

    PubMed Central

    2011-01-01

    We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes. PMID:17095691

  19. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

    PubMed Central

    Kolaczkowski, Bryan; Thornton, Joseph W.

    2009-01-01

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis. PMID:20011052

  20. Long-branch attraction bias and inconsistency in Bayesian phylogenetics.

    PubMed

    Kolaczkowski, Bryan; Thornton, Joseph W

    2009-12-09

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.

  1. Protein interface classification by evolutionary analysis

    PubMed Central

    2012-01-01

    Background Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between interface core and rim or rest of surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctively better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly we demonstrate that a more conservative selection of homolog sequences - with relatively high sequence identities to the query - is able to produce a clearer signal than previous attempts. Conclusions An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833
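
    The sequence-entropy indicators mentioned in this abstract can be illustrated with a minimal sketch. It is not the classifier's actual implementation: it assumes a plain Shannon entropy per alignment column (gaps ignored) and a simple core-to-rim ratio of mean entropies, and the names `column_entropy`, `alignment_entropies` and `core_rim_ratio` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(residues).values())


def alignment_entropies(aligned_seqs):
    """Per-column entropies for equal-length aligned homolog sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


def core_rim_ratio(entropies, core_idx, rim_idx):
    """Mean entropy of core columns divided by mean entropy of rim columns.

    Assumed indicator: values below 1 mean the core varies less
    (is more conserved) than the rim.
    """
    core = sum(entropies[i] for i in core_idx) / len(core_idx)
    rim = sum(entropies[i] for i in rim_idx) / len(rim_idx)
    return core / rim if rim else float("inf")
```

    A lower entropy in the core than in the rim is the kind of differential selection pressure the abstract describes; which residues count as core or rim would come from the structure, which this sketch takes as given.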

  2. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes make it possible to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either Aα or Aβ clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.

  3. Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.

    PubMed

    Liu, Bernard A

    2017-01-01

    Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remain a challenge. This chapter describes several methodologies for analyzing the evolutionary trajectory of SH2 domains, including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains, from structural changes to alterations in the protein domain content and genome duplication, will be discussed. A better understanding of SH2 domain evolution may therefore enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.

  4. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    PubMed Central

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is the largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783

  5. Deciphering mRNA Sequence Determinants of Protein Production Rate

    NASA Astrophysics Data System (ADS)

    Szavits-Nossan, Juraj; Ciandrini, Luca; Romano, M. Carmen

    2018-03-01

    One of the greatest challenges in biophysical models of translation is to identify coding sequence features that affect the rate of translation and therefore the overall protein production in the cell. We propose an analytic method to solve a translation model based on the inhomogeneous totally asymmetric simple exclusion process, which allows us to unveil simple design principles of nucleotide sequences determining protein production rates. Our solution shows an excellent agreement when compared to numerical genome-wide simulations of S. cerevisiae transcript sequences and predicts that the first 10 codons, which is the ribosome footprint length on the mRNA, together with the value of the initiation rate, are the main determinants of protein production rate under physiological conditions. Finally, we interpret the obtained analytic results based on the evolutionary role of the codons' choice for regulating translation rates and ribosome densities.
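    The exclusion-process picture in this abstract can be illustrated with a small stochastic simulation. The sketch below is not the authors' analytic method: it is a plain Gillespie simulation of an inhomogeneous TASEP in which each ribosome covers `footprint` codons (10, as stated in the abstract). The function name, the rate values and the run length are hypothetical choices made only for illustration.

```python
import random


def simulate_tasep(rates, alpha, footprint=10, t_max=2000.0, seed=0):
    """Estimate the protein production rate (completed ribosomes per unit time).

    rates     -- per-codon hopping rates (hypothetical values, one per codon)
    alpha     -- initiation rate
    footprint -- number of codons covered by one ribosome
    """
    rng = random.Random(seed)
    n_codons = len(rates)
    ribosomes = []  # leftmost covered codon of each ribosome, front ribosome first
    t = 0.0
    finished = 0
    while True:
        # Collect every move that exclusion currently allows.
        events = []
        if not ribosomes or ribosomes[-1] >= footprint:
            events.append((alpha, -1))  # initiation: the first `footprint` codons are free
        for i, x in enumerate(ribosomes):
            if i == 0 or ribosomes[i - 1] - x > footprint:
                events.append((rates[x], i))  # hop (or terminate at the last codon)
        total = sum(rate for rate, _ in events)
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick one event with probability proportional to its rate.
        pick = rng.uniform(0.0, total)
        for rate, idx in events:
            pick -= rate
            if pick <= 0.0:
                break
        if idx == -1:
            ribosomes.append(0)
        elif ribosomes[idx] == n_codons - 1:
            ribosomes.pop(idx)  # termination releases one protein
            finished += 1
        else:
            ribosomes[idx] += 1
    return finished / t_max


# Hypothetical transcript: 50 codons with uniform elongation rate 10, initiation rate 0.1.
production_rate = simulate_tasep([10.0] * 50, alpha=0.1)
print(production_rate)
```

    With initiation much slower than elongation, the measured rate stays close to, and on average below, the initiation rate, which is the regime the abstract describes as physiological.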

  6. Evolutionary paths of streptococcal and staphylococcal superantigens

    PubMed Central

    2012-01-01

    Background Streptococcus pyogenes (GAS) harbors several superantigens (SAgs) in the prophage region of its genome, although speG and smez are not located in this region. The diversity of SAgs is thought to arise during horizontal transfer, but their evolutionary pathways have not yet been determined. We recently completed sequencing the entire genome of S. dysgalactiae subsp. equisimilis (SDSE), the closest relative of GAS. Although speG is the only SAg gene of SDSE, speG was present in only 50% of clinical SDSE strains and smez in none. In this study, we analyzed the evolutionary paths of streptococcal and staphylococcal SAgs. Results We compared the sequences of the 12–60 kb speG regions of nine SDSE strains, five speG+ and four speG–. We found that the synteny of this region was highly conserved, whether or not the speG gene was present. Synteny analyses based on genome-wide comparisons of GAS and SDSE indicated that speG is the direct descendant of a common ancestor of streptococcal SAgs, whereas smez was deleted from SDSE after SDSE and GAS split from a common ancestor. Cumulative nucleotide skew analysis of SDSE genomes suggested that speG was located outside segments of steeper slopes than the stable region in the genome, whereas the region flanking smez was unstable, as expected from the results of GAS. We also detected a previously undescribed staphylococcal SAg gene, selW, and a staphylococcal SAg-like gene, ssl, in the core genomes of all Staphylococcus aureus strains sequenced. Amino acid substitution analyses, based on dN/dS window analysis of the products encoded by speG, selW and ssl suggested that all three genes have been subjected to strong positive selection. Evolutionary analysis based on the Bayesian Markov chain Monte Carlo method showed that each clade included at least one direct descendant. Conclusions Our findings reveal a plausible model for the comprehensive evolutionary pathway of streptococcal and staphylococcal SAgs. PMID:22900646

  7. Protein Structure Prediction with Evolutionary Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
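    As a concrete reference point for the HP model named in this abstract, the sketch below scores a conformation on a 2D square lattice with the commonly used HP energy: each pair of H residues that are lattice neighbours but not adjacent in the chain contributes -1. The abstract does not state which energy formulation or infeasibility penalty the paper recommends, so the function name `hp_energy` and the choice to return `None` for self-overlapping conformations are assumptions made for illustration only.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D-lattice conformation under the common HP contact rule.

    sequence -- string of 'H' (hydrophobic) and 'P' (polar) residues
    coords   -- list of (x, y) lattice positions, one per residue, in chain order
    Returns None for an infeasible (self-overlapping) conformation; a genetic
    algorithm could instead apply a penalty here (assumption, not from the source).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        return None
    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


# A four-residue chain folded into a unit square brings its two end residues into contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1
```

    Returning `None` keeps feasibility separate from energy, so different penalty schemes can be compared without changing the scoring function.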

  8. Modeling evolution of crosstalk in noisy signal transduction networks

    NASA Astrophysics Data System (ADS)

    Tareen, Ammar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2018-02-01

    Signal transduction networks can form highly interconnected systems within cells due to crosstalk between constituent pathways. To better understand the evolutionary design principles underlying such networks, we study the evolution of crosstalk for two parallel signaling pathways that arise via gene duplication. We use a sequence-based evolutionary algorithm and evolve the network based on two physically motivated fitness functions related to information transmission. We find that one fitness function leads to a high degree of crosstalk while the other leads to pathway specificity. Our results offer insights on the relationship between network architecture and information transmission for noisy biomolecular networks.

  9. The impact of calibration and clock-model choice on molecular estimates of divergence times.

    PubMed

    Duchêne, Sebastián; Lanfear, Robert; Ho, Simon Y W

    2014-09-01

    Phylogenetic estimates of evolutionary timescales can be obtained from nucleotide sequence data using the molecular clock. These estimates are important for our understanding of evolutionary processes across all taxonomic levels. The molecular clock needs to be calibrated with an independent source of information, such as fossil evidence, to allow absolute ages to be inferred. Calibration typically involves fixing or constraining the age of at least one node in the phylogeny, enabling the ages of the remaining nodes to be estimated. We conducted an extensive simulation study to investigate the effects of the position and number of calibrations on the resulting estimate of the timescale. Our analyses focused on Bayesian estimates obtained using relaxed molecular clocks. Our findings suggest that an effective strategy is to include multiple calibrations and to prefer those that are close to the root of the phylogeny. Under these conditions, we found that evolutionary timescales could be estimated accurately even when the relaxed-clock model was misspecified and when the sequence data were relatively uninformative. We tested these findings in a case study of simian foamy virus, where we found that shallow calibrations caused the overall timescale to be underestimated by up to three orders of magnitude. Finally, we provide some recommendations for improving the practice of molecular-clock calibration. Copyright © 2014 Elsevier Inc. All rights reserved.
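    The role of a calibration can be seen in the simplest possible setting. The sketch below assumes a strict molecular clock, which is a deliberate simplification of the relaxed-clock Bayesian analyses studied in the paper: one node of known age fixes the substitution rate, and that rate converts other genetic distances into ages. The function names and all numbers are hypothetical.

```python
def calibrate_rate(pairwise_distance, calibration_age):
    """Substitution rate (per site per unit time) implied by one calibrated node.

    Two lineages each accumulate changes since the split, hence the factor of 2.
    """
    return pairwise_distance / (2.0 * calibration_age)


def node_age(pairwise_distance, rate):
    """Age of an uncalibrated node, given the distance between its two descendants."""
    return pairwise_distance / (2.0 * rate)


# Hypothetical numbers: a split dated at 10 Myr with 0.2 substitutions per site between tips.
rate = calibrate_rate(0.2, 10.0)
print(rate)                  # substitutions per site per Myr
print(node_age(0.05, rate))  # estimated age, in Myr, of a shallower split
```

    Because every estimated age scales inversely with the calibrated rate, an error in a single calibration propagates to all other nodes, which is one reason the number and placement of calibrations matter.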

  10. The evolutionary time machine: forecasting how populations can adapt to changing environments using dormant propagules

    PubMed Central

    Orsini, Luisa; Schwenk, Klaus; De Meester, Luc; Colbourne, John K.; Pfrender, Michael E.; Weider, Lawrence J.

    2013-01-01

    Evolutionary changes are determined by a complex assortment of ecological, demographic and adaptive histories. Predicting how evolution will shape the genetic structures of populations coping with current (and future) environmental challenges has principally relied on investigations through space, in lieu of time, because long-term phenotypic and molecular data are scarce. Yet, dormant propagules in sediments, soils and permafrost are convenient natural archives of population-histories from which to trace adaptive trajectories along extended time periods. DNA sequence data obtained from these natural archives, combined with pioneering methods for analyzing both ecological and population genomic time-series data, are likely to provide predictive models to forecast evolutionary responses of natural populations to environmental changes resulting from natural and anthropogenic stressors, including climate change. PMID:23395434

  11. Evolutionary Origins and Dynamics of Octoploid Strawberry Subgenomes Revealed by Dense Targeted Capture Linkage Maps

    PubMed Central

    Tennessen, Jacob A.; Govindarajulu, Rajanikanth; Ashman, Tia-Lynn; Liston, Aaron

    2014-01-01

    Whole-genome duplications are radical evolutionary events that have driven speciation and adaptation in many taxa. Higher-order polyploids have complex histories often including interspecific hybridization and dynamic genomic changes. This chromosomal reshuffling is poorly understood for most polyploid species, despite their evolutionary and agricultural importance, due to the challenge of distinguishing homologous sequences from each other. Here, we use dense linkage maps generated with targeted sequence capture to improve the diploid strawberry (Fragaria vesca) reference genome and to disentangle the subgenomes of the wild octoploid progenitors of cultivated strawberry, Fragaria virginiana and Fragaria chiloensis. Our novel approach, POLiMAPS (Phylogenetics Of Linkage-Map-Anchored Polyploid Subgenomes), leverages sequence reads to associate informative interhomeolog phylogenetic markers with linkage groups and reference genome positions. In contrast to a widely accepted model, we find that one of the four subgenomes originates with the diploid cytoplasm donor F. vesca, one with the diploid Fragaria iinumae, and two with an unknown ancestor close to F. iinumae. Extensive unidirectional introgression has converted F. iinumae-like subgenomes to be more F. vesca-like, but never the reverse, due either to homoploid hybridization in the F. iinumae-like diploid ancestors or else strong selection spreading F. vesca-like sequence among subgenomes through homeologous exchange. In addition, divergence between homeologous chromosomes has been substantially augmented by interchromosomal rearrangements. Our phylogenetic approach reveals novel aspects of the complicated web of genetic exchanges that occur during polyploid evolution and suggests a path forward for unraveling other agriculturally and ecologically important polyploid genomes. PMID:25477420

  12. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  13. Tree-Structured Digital Organisms Model

    NASA Astrophysics Data System (ADS)

    Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo

    Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model is very simple for a computer-based model, but it may not be the only way to describe a digital organism. Thus we propose a new digital organism model based on a tree structure, which is rather similar to genetic programming. With our model, a life process is a combination of various functions, as life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified by simulation that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.

  14. The molecular biology and evolution of feline immunodeficiency viruses of cougars

    PubMed Central

    Poss, Mary; Ross, Howard; Rodrigo, Allen; Terwee, Julie; VandeWoude, Sue; Biek, Roman

    2008-01-01

    Feline immunodeficiency virus (FIV) is a lentivirus that has been identified in many members of the family Felidae but domestic cats are the only FIV host in which infection results in disease. We studied FIVpco infection of cougars (Puma concolor) as a model for asymptomatic lentivirus infections to understand the mechanisms of host-virus coexistence. Several natural cougar populations were evaluated to determine if there are any consequences of FIVpco infection on cougar fecundity, survival, or susceptibility to other infections. We have sequenced full length viral genomes and conducted a detailed analysis of viral molecular evolution on these sequences and on genome fragments of serially sampled animals to determine the evolutionary forces experienced by this virus in cougars. In addition, we have evaluated the molecular genetics of FIVpco in a new host, domestic cats, to determine the evolutionary consequences to a host-adapted virus associated with cross-species infection. Our results indicate that there are no significant differences in survival, fecundity or susceptibility to other infections between FIVpco-infected and uninfected cougars. The molecular evolution of FIVpco is characterized by a slower evolutionary rate and an absence of positive selection, but also by proviral and plasma viral loads comparable to those of epidemic lentiviruses such as HIV-1 or FIVfca. Evolutionary and recombination rates and selection profiles change significantly when FIVpco replicates in a new host. PMID:18295904

  15. Understanding the Origin of Species with Genome-Scale Data: the Role of Gene Flow

    PubMed Central

    Sousa, Vitor; Hey, Jody

    2017-01-01

    As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population-genomic data sets. Such data hold the potential to resolve evolutionary biology’s long-standing questions about the role of gene exchange in species formation. In principle the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here. PMID:23657479

  16. Phylodynamics of the HIV-1 CRF02_AG clade in Cameroon

    PubMed Central

    Faria, Nuno Rodrigues; Suchard, Marc A; Abecasis, Ana; Sousa, J. D.; Ndembi, Nicaise; Camacho, R.J.; Vandamme, Anne-Mieke; Peeters, Martine; Lemey, Philippe

    2015-01-01

    Evolutionary analyses have revealed an origin of pandemic HIV-1 group M in the Congo River basin in the first part of the XXth century, but the patterns of historical viral spread in or around its epicentre remain largely unexplored. Here, we combine epidemiologic and molecular sequence data to investigate the spatiotemporal patterns of the CRF02_AG clade. By explicitly integrating prevalence counts and genetic population size estimates we date the epidemic emergence of CRF02_AG at 1973.1 (1972.1, 1975.3 95% CI). To infer their phylogeographic signature at a regional scale, we analyze pol and env time-stamped sequence data from 8 countries using a Bayesian phylogeographic approach based on a discrete asymmetric model. Our data confirms a spatial origin of this clade in the Democratic Republic of Congo (DRC) and suggests that viral dissemination to Cameroon occurred at an early stage of the evolutionary history of CRF02_AG. We find considerable support for epidemiological linkage between neighbour countries. Compilation of ethnographic data suggests that well-supported viral migration was related with chance exportation events rather than by sustained human migratory flows. Finally, using sequence data from 15 locations in Cameroon, we use relaxed random walk models to explore the spatiotemporal dynamics of CRF02_AG at a finer geographical detail. Phylogeographic dispersal in continuous space reveals that at least two distinct CRF02_AG lineages are circulating in overlapping regions that are evolving at different evolutionary and diffusion rates. Altogether, by combining molecular and epidemiological data, our results provide a time scale for CRF02_AG, place its spatial root within the putative root of group-M diversity and propose a scenario for the spatiotemporal patterns of a successful HIV-1 lineage both at a regional and country-scale. PMID:21565285

  17. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.

    PubMed

    Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N

    2018-04-15

    Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.

  18. Ancient DNA analysis reveals woolly rhino evolutionary relationships.

    PubMed

    Orlando, Ludovic; Leonard, Jennifer A; Thenot, Aurélie; Laudet, Vincent; Guerin, Claude; Hänni, Catherine

    2003-09-01

    With ancient DNA technology, DNA sequences have been added to the list of characters available to infer the phyletic position of extinct species in evolutionary trees. We have sequenced the entire 12S rRNA and partial cytochrome b (cyt b) genes of one 60-70,000-year-old sample, and partial 12S rRNA and cyt b sequences of two 40-45,000-year-old samples of the extinct woolly rhinoceros (Coelodonta antiquitatis). Based on these two mitochondrial markers, phylogenetic analyses show that C. antiquitatis is most closely related to one of the three extant Asian rhinoceros species, Dicerorhinus sumatrensis. Calculations based on a molecular clock suggest that the lineage leading to C. antiquitatis and D. sumatrensis diverged in the Oligocene, 21-26 MYA. Both results agree with morphological models deduced from palaeontological data. Nuclear inserts of mitochondrial DNA were identified in the ancient specimens. These data should encourage the use of nuclear DNA in future ancient DNA studies. It also further establishes that the degraded nature of ancient DNA does not completely protect ancient DNA studies based on mitochondrial data from the problems associated with nuclear inserts.

  19. Evolutionary signals of selection on cognition from the great tit genome and methylome

    PubMed Central

    Laine, Veronika N.; Gossmann, Toni I.; Schachtschneider, Kyle M.; Garroway, Colin J.; Madsen, Ole; Verhoeven, Koen J. F.; de Jager, Victor; Megens, Hendrik-Jan; Warren, Wesley C.; Minx, Patrick; Crooijmans, Richard P. M. A.; Corcoran, Pádraic; Adriaensen, Frank; Belda, Eduardo; Bushuev, Andrey; Cichon, Mariusz; Charmantier, Anne; Dingemanse, Niels; Doligez, Blandine; Eeva, Tapio; Erikstad, Kjell Einar; Fedorov, Slava; Hau, Michaela; Hille, Sabine; Hinde, Camilla; Kempenaers, Bart; Kerimov, Anvar; Krist, Milos; Mand, Raivo; Matthysen, Erik; Nager, Reudi; Norte, Claudia; Orell, Markku; Richner, Heinz; Slagsvold, Tore; Tilgar, Vallo; Tinbergen, Joost; Torok, Janos; Tschirren, Barbara; Yuta, Tera; Sheldon, Ben C.; Slate, Jon; Zeng, Kai; van Oers, Kees; Visser, Marcel E.; Groenen, Martien A. M.

    2016-01-01

    For over 50 years, the great tit (Parus major) has been a model species for research in evolutionary, ecological and behavioural research; in particular, learning and cognition have been intensively studied. Here, to provide further insight into the molecular mechanisms behind these important traits, we de novo assemble a great tit reference genome and whole-genome re-sequence another 29 individuals from across Europe. We show an overrepresentation of genes related to neuronal functions, learning and cognition in regions under positive selection, as well as increased CpG methylation in these regions. In addition, great tit neuronal non-CpG methylation patterns are very similar to those observed in mammals, suggesting a universal role in neuronal epigenetic regulation which can affect learning-, memory- and experience-induced plasticity. The high-quality great tit genome assembly will play an instrumental role in furthering the integration of ecological, evolutionary, behavioural and genomic approaches in this model species. PMID:26805030

  20. Using evolutionary computations to understand the design and evolution of gene and cell regulatory networks.

    PubMed

    Spirov, Alexander; Holloway, David

    2013-07-15

    This paper surveys modeling approaches for studying the evolution of gene regulatory networks (GRNs). Modeling of the design or 'wiring' of GRNs has become increasingly common in developmental and medical biology, as a means of quantifying gene-gene interactions, the response to perturbations, and the overall dynamic motifs of networks. Drawing from developments in GRN 'design' modeling, a number of groups are now using simulations to study how GRNs evolve, both for comparative genomics and to uncover general principles of evolutionary processes. Such work can generally be termed evolution in silico. Complementary to these biologically-focused approaches, a now well-established field of computer science is Evolutionary Computations (ECs), in which highly efficient optimization techniques are inspired from evolutionary principles. In surveying biological simulation approaches, we discuss the considerations that must be taken with respect to: (a) the precision and completeness of the data (e.g. are the simulations for very close matches to anatomical data, or are they for more general exploration of evolutionary principles); (b) the level of detail to model (we proceed from 'coarse-grained' evolution of simple gene-gene interactions to 'fine-grained' evolution at the DNA sequence level); (c) to what degree is it important to include the genome's cellular context; and (d) the efficiency of computation. With respect to the latter, we argue that developments in computer science EC offer the means to perform more complete simulation searches, and will lead to more comprehensive biological predictions. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Understanding phylogenetic incongruence: lessons from phyllostomid bats

    PubMed Central

    Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B

    2012-01-01

    All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar-feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species-rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar-feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well-studied organisms such as phyllostomid bats. PMID:22891620

  2. Identifying predictors of time-inhomogeneous viral evolutionary processes.

    PubMed

    Bielejec, Filip; Baele, Guy; Rodrigo, Allen G; Suchard, Marc A; Lemey, Philippe

    2016-07-01

    Various factors determine the rate at which mutations are generated and fixed in viral genomes. Viral evolutionary rates may vary over the course of a single persistent infection and can reflect changes in replication rates and selective dynamics. Dedicated statistical inference approaches are required to understand how the complex interplay of these processes shapes the genetic diversity and divergence in viral populations. Although evolutionary models accommodating a high degree of complexity can now be formalized, adequately informing these models by potentially sparse data, and assessing the association of the resulting estimates with external predictors, remains a major challenge. In this article, we present a novel Bayesian evolutionary inference method, which integrates multiple potential predictors and tests their association with variation in the absolute rates of synonymous and non-synonymous substitutions along the evolutionary history. We consider clinical and virological measures as predictors, but also changes in population size trajectories that are simultaneously inferred using coalescent modelling. We demonstrate the potential of our method in an application to within-host HIV-1 sequence data sampled throughout the infection of multiple patients. While analyses of individual patient populations lack statistical power, we detect significant evidence for an abrupt drop in non-synonymous rates in late stage infection and a more gradual increase in synonymous rates over the course of infection in a joint analysis across all patients. The former is predicted by the immune relaxation hypothesis while the latter may be in line with increasing replicative fitness during the asymptomatic stage.

  3. The first complete chloroplast genome of the Genistoid legume Lupinus luteus: evidence for a novel major lineage-specific rearrangement and new insights regarding plastome evolution in the legume family

    PubMed Central

    Martin, Guillaume E.; Rousseau-Gueutin, Mathieu; Cordonnier, Solenn; Lima, Oscar; Michon-Coudouel, Sophie; Naquin, Delphine; de Carvalho, Julie Ferreira; Aïnouche, Malika; Salmon, Armel; Aïnouche, Abdelkader

    2014-01-01

    Background and Aims To date chloroplast genomes are available only for members of the non-protein amino acid-accumulating clade (NPAAA) Papilionoid lineages in the legume family (i.e. Millettioids, Robinoids and the ‘inverted repeat-lacking clade’, IRLC). It is thus very important to sequence plastomes from other lineages in order to better understand the unusual evolution observed in this model flowering plant family. To this end, the plastome of a lupine species, Lupinus luteus, was sequenced to represent the Genistoid lineage, a noteworthy but poorly studied legume group. Methods The plastome of L. luteus was reconstructed using Roche-454 and Illumina next-generation sequencing. Its structure, repetitive sequences, gene content and sequence divergence were compared with those of other Fabaceae plastomes. PCR screening and sequencing were performed in other allied legumes in order to determine the origin of a large inversion identified in L. luteus. Key Results The first sequenced Genistoid plastome (L. luteus: 155 894 bp) resulted in the discovery of a 36-kb inversion, embedded within the already known 50-kb inversion in the large single-copy (LSC) region of the Papilionoideae. This inversion occurs at the base or soon after the Genistoid emergence, and most probably resulted from a flip–flop recombination between identical 29-bp inverted repeats within two trnS genes. Comparative analyses of the chloroplast gene content of L. luteus vs. Fabaceae and extra-Fabales plastomes revealed the loss of the plastid rpl22 gene, and its functional relocation to the nucleus was verified using lupine transcriptomic data. An investigation into the evolutionary rate of coding and non-coding sequences among legume plastomes resulted in the identification of remarkably variable regions. Conclusions This study resulted in the discovery of a novel, major 36-kb inversion, specific to the Genistoids. Chloroplast mutational hotspots were also identified, which contain novel and potentially informative regions for molecular evolutionary studies at various taxonomic levels in the legumes. Taken together, the results provide new insights into the evolutionary landscape of the legume plastome. PMID:24769537

  4. On the Evolutionary and Biogeographic History of Saxifraga sect. Trachyphyllum (Gaud.) Koch (Saxifragaceae Juss.)

    PubMed Central

    DeChaine, Eric G.; Anderson, Stacy A.; McNew, Jennifer M.; Wendling, Barry M.

    2013-01-01

    Arctic-alpine plants in the genus Saxifraga L. (Saxifragaceae Juss.) provide an excellent system for investigating the process of diversification in northern regions. Yet, sect. Trachyphyllum (Gaud.) Koch, which comprises about 8 to 26 species, has still not been explored by molecular systematists even though taxonomists concur that the section needs to be thoroughly re-examined. Our goals were to use chloroplast trnL-F and nuclear ITS DNA sequence data to circumscribe the section phylogenetically, test models of geographically-based population divergence, and assess the utility of morphological characters in estimating evolutionary relationships. To do so, we sequenced both genetic markers for 19 taxa within the section. The phylogenetic inferences of sect. Trachyphyllum using maximum likelihood and Bayesian analyses showed that the section is polyphyletic, with S. aspera L. and S. bryoides L. falling outside the main clade. In addition, the analyses supported several taxonomic re-classifications to prior names. We used two approaches to test biogeographic hypotheses: i) a coalescent approach in Mesquite to test the fit of our reconstructed gene trees to geographically-based models of population divergence and ii) a maximum likelihood inference in Lagrange. These tests uncovered strong support for an origin of the clade in the Southern Rocky Mountains of North America followed by dispersal and divergence episodes across refugia. Finally, we adopted a stochastic character mapping approach in SIMMAP to investigate the utility of morphological characters in estimating evolutionary relationships among taxa. We found that few morphological characters were phylogenetically informative and many were misleading. Our molecular analyses provide a foundation for the diversity and evolutionary relationships within sect. Trachyphyllum and hypotheses for better understanding the patterns and processes of divergence in this section, other saxifrages, and plants inhabiting the North Pacific Rim. PMID:23922810

  5. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of an evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  6. Determination of evolutionary relationships of outbreak-associated Listeria monocytogenes strains of serotypes 1/2a and 1/2b by whole-genome sequencing

    USDA-ARS's Scientific Manuscript database

    We used whole-genome sequencing to determine evolutionary relationships among 20 outbreak-associated clinical isolates of Listeria monocytogenes serotypes 1/2a and 1/2b. Isolates from 6 of 11 outbreaks fell outside the clonal groups or “epidemic clones” that have been previously associated with outb...

  7. ECOD: An Evolutionary Classification of Protein Domains

    PubMed Central

    Kinch, Lisa N.; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V.

    2014-01-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies. PMID:25474468

  8. ECOD: an evolutionary classification of protein domains.

    PubMed

    Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V

    2014-12-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

  9. Evolution in the block: common elements of 5S rDNA organization and evolutionary patterns in distant fish genera.

    PubMed

    Campo, Daniel; García-Vázquez, Eva

    2012-01-01

    The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that has arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).

  10. Chemical characterization of the early evolutionary phases of high-mass star-forming regions

    NASA Astrophysics Data System (ADS)

    Gerner, Thomas

    2014-10-01

    The formation of high-mass stars is a very complex process, and to date no comprehensive theory of it exists. This thesis studies the early stages of high-mass star-forming regions and employs astrochemistry as a tool to probe their different physical conditions. We split the evolutionary sequence into four observationally motivated stages that are based on a classification proposed in the literature. The sequence is characterized by an increase of the temperatures and densities that strongly influences the chemistry in the different stages. We observed a sample of 59 high-mass star-forming regions that cover the whole sequence and statistically characterized the chemical compositions of the different stages. We determined average column densities of 18 different molecular species and found generally increasing abundances with stage. We fitted them for each stage with a 1D model, such that the result of the best fit to the previous stage was used as new input for the following. This is a unique approach and allowed us to infer physical properties like the temperature and density structure and yielded a typical chemical lifetime for the high-mass star-formation process of 1e5 years. The 18 analyzed molecular species also included four deuterated molecules whose chemistry is particularly sensitive to thermal history and thus is a promising tool to infer chemical ages. We found decreasing trends of the D/H ratios with evolutionary stage for 3 of the 4 molecular species and that the D/H ratio depends more on the fraction of warm and cold gas than on the total amount of gas. That indicates different chemical pathways for the different molecules and confirms the potential use of deuterated species as chemical age indicators. In addition, we mapped a low-mass star-forming region in order to study the cosmic ray ionization rate, which is an important parameter in chemical models. While in chemical models it is commonly fixed, we found that it strongly varies with environment.

  11. Rapid evolution of the env gene leader sequence in cats naturally infected with feline immunodeficiency virus

    PubMed Central

    Hughes, Joseph; Biek, Roman; Litster, Annette; Willett, Brian J.; Hosie, Margaret J.

    2015-01-01

    Analysing the evolution of feline immunodeficiency virus (FIV) at the intra-host level is important in order to address whether the diversity and composition of viral quasispecies affect disease progression. We examined the intra-host diversity and the evolutionary rates of the entire env and structural fragments of the env sequences obtained from sequential blood samples in 43 naturally infected domestic cats that displayed different clinical outcomes. We observed in the majority of cats that FIV env showed very low levels of intra-host diversity. We estimated that env evolved at a rate of 1.16×10⁻³ substitutions per site per year and demonstrated that recombinant sequences evolved faster than non-recombinant sequences. It was evident that the V3–V5 fragment of FIV env displayed higher evolutionary rates in healthy cats than in those with terminal illness. Our study provided the first evidence that the leader sequence of env, rather than the V3–V5 sequence, had the highest intra-host diversity and the highest evolutionary rate of all env fragments, consistent with this region being under a strong selective pressure for genetic variation. Overall, FIV env displayed relatively low intra-host diversity and evolved slowly in naturally infected cats. The maximum evolutionary rate was observed in the leader sequence of env. Although genetic stability is not necessarily a prerequisite for clinical stability, the higher genetic stability of FIV compared with human immunodeficiency virus might explain why many naturally infected cats do not progress rapidly to AIDS. PMID:25535323

  12. Did A Planet Survive A Post-Main Sequence Evolutionary Event?

    NASA Astrophysics Data System (ADS)

    Sorber, Rebecca; Jang-Condell, Hannah; Zimmerman, Mara

    2018-06-01

    GL 86 is a star system approximately 10 pc away, consisting of a main-sequence K-type star of ~0.77 M⊙ (GL 86A) and a white dwarf companion of ~0.49 M⊙ (GL 86B). The binary has a semi-major axis of ~18.4 AU, an orbital period of ~353 yr, and an eccentricity of ~0.39. A 4.5 MJ planet orbits the main-sequence star with a semi-major axis of 0.113 AU and an orbital period of 15.76 days, in a nearly circular orbit with an eccentricity of 0.046. If we assume that this planet formed while the white dwarf was still a main-sequence star, it would have been difficult for the planet to remain in a stable orbit during the post-main-sequence evolution of GL 86B. Planet survival through the post-main-sequence evolution will be examined by modeling with the program Mercury (Chambers 1999). Using the model, we examine the origin of the planet: whether it formed before or after the post-main-sequence evolution of GL 86B. The modeling will give us insight into not only the dynamical evolution of the binary star system but also the planet's life cycle.
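
    The planet's quoted orbital elements can be cross-checked with Kepler's third law. The following is a minimal sketch, not part of the cited work: the function name is hypothetical, the planet's mass is neglected relative to the 0.77 M⊙ host, and a year is taken as 365.25 days.

```python
import math

def kepler_period_days(semi_major_axis_au, total_mass_msun):
    """Orbital period from Kepler's third law, P[yr]^2 = a[AU]^3 / M[Msun].

    Hypothetical helper for illustration; assumes a two-body orbit.
    """
    period_years = math.sqrt(semi_major_axis_au ** 3 / total_mass_msun)
    return period_years * 365.25  # convert Julian years to days

# Values quoted in the abstract: a = 0.113 AU around a ~0.77 Msun star.
# The planet's own mass is neglected (an assumption of this sketch).
planet_period = kepler_period_days(0.113, 0.77)
print(f"{planet_period:.1f} days")
```

    Under these assumptions the result comes out at about 15.8 days, close to the quoted 15.76-day period.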

  13. Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.

    PubMed

    Janecek, S

    1994-09-01

    Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.

  14. The tangled bank of amino acids.

    PubMed

    Goldstein, Richard A; Pollock, David D

    2016-07-01

    The use of amino acid substitution matrices to model protein evolution has yielded important insights into both the evolutionary process and the properties of specific protein families. In order to make these models tractable, standard substitution matrices represent the average results of the evolutionary process rather than the underlying molecular biophysics and population genetics, treating proteins as a set of independently evolving sites rather than as an integrated biomolecular entity. With advances in computing and the increasing availability of sequence data, we now have an opportunity to move beyond current substitution matrices to more interpretable mechanistic models with greater fidelity to the evolutionary process of mutation and selection and the holistic nature of the selective constraints. As part of this endeavour, we consider how epistatic interactions induce spatial and temporal rate heterogeneity, and demonstrate how these generally ignored factors can reconcile standard substitution rate matrices and the underlying biology, allowing us to better understand the meaning of these substitution rates. Using computational simulations of protein evolution, we can demonstrate the importance of both spatial and temporal heterogeneity in modelling protein evolution. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  15. Evolutionary genetics of insect innate immunity.

    PubMed

    Viljakainen, Lumi

    2015-11-01

    Patterns of evolution in immune defense genes help to understand the evolutionary dynamics between hosts and pathogens. Multiple insect genomes have been sequenced, with many of them having annotated immune genes, which paves the way for a comparative genomic analysis of insect immunity. In this review, I summarize the current state of comparative and evolutionary genomics of insect innate immune defense. The focus is on the conserved and divergent components of immunity with an emphasis on gene family evolution and evolution at the sequence level; both population genetics and molecular evolution frameworks are considered. © The Author 2015. Published by Oxford University Press.

  16. A replicated climate change field experiment reveals rapid evolutionary response in an ecologically important soil invertebrate.

    PubMed

    Bataillon, Thomas; Galtier, Nicolas; Bernard, Aurelien; Cryer, Nicolai; Faivre, Nicolas; Santoni, Sylvain; Severac, Dany; Mikkelsen, Teis N; Larsen, Klaus S; Beier, Claus; Sørensen, Jesper G; Holmstrup, Martin; Ehlers, Bodil K

    2016-07-01

    Whether species can respond evolutionarily to current climate change is crucial for the persistence of many species. Yet, very few studies have examined genetic responses to climate change in manipulated experiments carried out in natural field conditions. We examined the evolutionary response to climate change in a common annelid worm using a controlled replicated experiment where climatic conditions were manipulated in a natural setting. Analyzing the transcribed genome of 15 local populations, we found that about 12% of the genetic polymorphisms exhibit differences in allele frequencies associated with changes in soil temperature and soil moisture. This shows an evolutionary response to realistic climate change happening over a short time scale, and calls for incorporating evolution into models predicting future response of species to climate change. It also shows that designed climate change experiments coupled with genome sequencing offer great potential to test for the occurrence (or lack) of an evolutionary response. © 2016 The Authors. Global Change Biology Published by John Wiley & Sons Ltd.

  17. A Large Stellar Evolution Database for Population Synthesis Studies. I. Scaled Solar Models and Isochrones

    NASA Astrophysics Data System (ADS)

    Pietrinferni, Adriano; Cassisi, Santi; Salaris, Maurizio; Castelli, Fiorella

    2004-09-01

    We present a large and updated stellar evolution database for low-, intermediate-, and high-mass stars in a wide metallicity range, suitable for studying Galactic and extragalactic simple and composite stellar populations using population synthesis techniques. The stellar mass range is between ~0.5 and 10 M⊙ with a fine mass spacing. The metallicity [Fe/H] comprises 10 values ranging from -2.27 to 0.40, with a scaled solar metal distribution. The initial He mass fraction ranges from Y=0.245, for the more metal-poor composition, up to 0.303 for the more metal-rich one, with ΔY/ΔZ~1.4. For each adopted chemical composition, the evolutionary models have been computed without (canonical models) and with overshooting from the Schwarzschild boundary of the convective cores during the central H-burning phase. Semiconvection is included in the treatment of core convection during the He-burning phase. The whole set of evolutionary models can be used to compute isochrones in a wide age range, from ~30 Myr to ~15 Gyr. Both evolutionary models and isochrones are available in several observational planes, employing an updated set of bolometric corrections and color-Teff relations computed for this project. The number of points along the models and the resulting isochrones is selected in such a way that interpolation for intermediate metallicities not contained in the grid is straightforward; a simple quadratic interpolation produces results of sufficient accuracy for population synthesis applications. We compare our isochrones with results from a series of widely used stellar evolution databases and perform some empirical tests for the reliability of our models. 
Since this work is devoted to scaled solar chemical compositions, we focus our attention on the Galactic disk stellar populations, employing multicolor photometry of unevolved field main-sequence stars with precise Hipparcos parallaxes, well-studied open clusters, and one eclipsing binary system with precise measurements of masses, radii, and [Fe/H] of both components. We find that the predicted metallicity dependence of the location of the lower, unevolved main sequence in the color magnitude diagram (CMD) appears in satisfactory agreement with empirical data. When comparing our models with CMDs of selected, well-studied, open clusters, once again we were able to properly match the whole observed evolutionary sequences by assuming cluster distance and reddening estimates in satisfactory agreement with empirical evaluations of these quantities. In general, models including overshooting during the H-burning phase provide a better match to the observations, at least for ages below ~4 Gyr. At [Fe/H] around solar and higher ages (i.e., smaller convective cores) before the onset of radiative cores, the selected efficiency of core overshooting may be too high in our model, as well as in various other models in the literature. Since we also provide canonical models, the reader is strongly encouraged to always compare the results from both sets in this critical age range.
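
    The quadratic interpolation in metallicity mentioned in this abstract can be sketched as a three-point Lagrange interpolation. This is an illustration only, not the database's own code: the function name, the three [Fe/H] nodes, and the magnitudes attached to them are all hypothetical placeholders.

```python
def quadratic_interpolate(x, xs, ys):
    """Evaluate at x the parabola passing through three (xs[i], ys[i]) nodes.

    Uses the Lagrange form, so the nodes need not be equally spaced.
    """
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return y0 * l0 + y1 * l1 + y2 * l2

# Hypothetical grid metallicities and a hypothetical magnitude at a fixed
# point along the isochrone; real values would come from the tabulated grid.
feh_nodes = (-1.0, -0.5, 0.0)
mag_nodes = (5.9, 6.2, 6.6)

# Estimate the magnitude at an intermediate metallicity not in the grid.
mag_at_target = quadratic_interpolate(-0.3, feh_nodes, mag_nodes)
```

    In practice the same interpolation would be repeated for each tabulated point along the isochrone, using the three grid metallicities nearest the target value.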

  18. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

    PubMed Central

    2014-01-01

    Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. 
Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a “non-model system.” PMID:24467778

  19. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

    PubMed

    Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

    2014-01-28

    The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. 
In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a "non-model system."

  20. Understanding protein evolution: from protein physics to Darwinian selection.

    PubMed

    Zeldovich, Konstantin B; Shakhnovich, Eugene I

    2008-01-01

    Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.

  1. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
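
    One concrete form often used for such a Boltzmann distribution, given here as an illustrative assumption because the abstract does not spell out the model, is a Potts-type distribution over aligned sequences (a_1, ..., a_L). The site fields h_i and pairwise couplings J_ij below are notation introduced for this sketch:

```latex
P(a_1,\dots,a_L) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big),
\qquad
Z \;=\; \sum_{a_1,\dots,a_L}
  \exp\!\Big( \sum_{i} h_i(a_i) + \sum_{i<j} J_{ij}(a_i,a_j) \Big)
```

    In this reading, the "inverse" problem is to choose h_i and J_ij so that the model's one- and two-site marginals reproduce the amino acid and pair frequencies observed in the alignment of homologous sequences; large couplings are then candidate evolutionary constraints between positions i and j.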

  2. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  3. The evolution of transcriptional regulation in eukaryotes

    NASA Technical Reports Server (NTRS)

    Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.

    2003-01-01

    Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.

  4. Tempo and mode of genomic mutations unveil human evolutionary history.

    PubMed

    Hara, Yuichiro

    2015-01-01

    Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.

  5. Modeling populations of rotationally mixed massive stars

    NASA Astrophysics Data System (ADS)

    Brott, I.

    2011-02-01

    Massive stars can be considered cosmic engines. With their high luminosities, strong stellar winds and violent deaths, they drive the evolution of galaxies throughout the history of the universe. Despite the importance of massive stars, their evolution is still poorly understood. Two major issues have plagued evolutionary models of massive stars until today: mixing and mass loss. Because the effects of mass loss on the main sequence remain limited in the considered mass and metallicity range, this thesis concentrates on the role of mixing in massive stars. The thesis approaches this problem at the crossroads between observations and simulations. The main question is: do evolutionary models of single stars, accounting for the effects of rotation, reproduce the observed properties of real stars? In particular, we are interested in whether the evolutionary models can reproduce the surface abundance changes during the main-sequence phase. To constrain our models we build a population synthesis model for the sample of the VLT-FLAMES Survey of Massive Stars, for which the star-formation history and rotational velocity distribution are well constrained. We consider the four main regions of the Hunter diagram: nitrogen-unenriched slow rotators and nitrogen-enriched fast rotators, which are predicted by theory, and nitrogen-enriched slow rotators and nitrogen-unenriched fast rotators, which are not predicted by our model. We conclude that currently these comparisons are not sufficient to verify the theory of rotational mixing. Physical processes in addition to rotational mixing appear necessary to explain the stars in the latter two regions. The chapters of this thesis have been published in the following journals: Ch. 2: "Rotating Massive Main-Sequence Stars I: Grids of Evolutionary Models and Isochrones", I. Brott, S. E. de Mink, M. Cantiello, N. Langer, A. de Koter, C. J. Evans, I. Hunter, C. Trundle, J. S. Vink, submitted to Astronomy & Astrophysics. Ch. 3: "The VLT-FLAMES Survey of Massive Stars: Rotation and Nitrogen Enrichment as the Key to Understanding Massive Star Evolution", I. Hunter, I. Brott, D. J. Lennon, N. Langer, C. Trundle, A. de Koter, C. J. Evans and R. S. I. Ryans, The Astrophysical Journal, 2008, 676, L29-L32. Ch. 4: "The VLT-FLAMES Survey of Massive Stars: Constraints on Stellar Evolution from the Chemical Compositions of Rapidly Rotating Galactic and Magellanic Cloud B-type Stars", I. Hunter, I. Brott, N. Langer, D. J. Lennon, P. L. Dufton, I. D. Howarth, R. S. I. Ryans, C. Trundle, C. Evans, A. de Koter and S. J. Smartt, published in Astronomy & Astrophysics, 2009, 496, 841-853. Ch. 5: "Rotating Massive Main-Sequence Stars II: Simulating a Population of LMC early B-type Stars as a Test of Rotational Mixing", I. Brott, C. J. Evans, I. Hunter, A. de Koter, N. Langer, P. L. Dufton, M. Cantiello, C. Trundle, D. J. Lennon, S. E. de Mink, S.-C. Yoon, P. Anders, submitted to Astronomy & Astrophysics. Ch. 6: "The Nature of B Supergiants: Clues From a Steep Drop in Rotation Rates at 22 000 K - The possibility of Bi-stability braking", Jorick S. Vink, I. Brott, G. Graefener, N. Langer, A. de Koter, D. J. Lennon, Astronomy & Astrophysics, 2010, 512, L7.

  6. UV observations of blue stragglers and population 2 K dwarfs

    NASA Technical Reports Server (NTRS)

    Carney, B. W.; Bond, H. E.

    1986-01-01

    Blue stragglers are stars, found usually in either open or globular clusters, that appear to lie on the main sequence, but are brighter and bluer than the cluster turn-off. Currently, two rival models are invoked to explain this apparently pathological behavior: internal mixing (so that fresh fuel is brought into the stellar core); and mass transfer (by which a normal main sequence star acquires mass from an evolving nearby companion and so moves up the main sequence). The latter model predicts that in the absence of complete mass transfer (i.e., coalescence), blue stragglers should be binary systems with the fainter star in a post-main sequence evolutionary state. It is important to ascertain the cause of this phenomenon since stellar evolution models of main sequence stars play such a vital role in astronomy. If mass transfer is involved, one may easily exclude binaries from age determinations of clusters, but if mixing is the cause, our age determinations will be much less accurate unless we can determine whether all stars or only some mix, and what causes the mixing to occur at all.

  7. A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate

    PubMed Central

    Khatri, Bhavin S.; Goldstein, Richard A.

    2015-01-01

    Speciation is fundamental to understanding the huge diversity of life on Earth. Although still controversial, empirical evidence suggests that the rate of speciation is larger for smaller populations. Here, we explore a biophysical model of speciation by developing a simple coarse-grained theory of transcription factor-DNA binding and how their co-evolution in two geographically isolated lineages leads to incompatibilities. To develop a tractable analytical theory, we derive a Smoluchowski equation for the dynamics of binding energy evolution that accounts for the fact that natural selection acts on phenotypes, but variation arises from mutations in sequences; the Smoluchowski equation includes selection due to both gradients in fitness and gradients in sequence entropy, which is the logarithm of the number of sequences that correspond to a particular binding energy. This simple consideration predicts that smaller populations develop incompatibilities more quickly in the weak mutation regime; this trend arises as sequence entropy poises smaller populations closer to incompatible regions of phenotype space. These results suggest a generic coarse-grained approach to evolutionary stochastic dynamics, allowing realistic modelling at the phenotypic level. PMID:25936759
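
    The notion of sequence entropy used above (the logarithm of the number of sequences sharing a binding energy) can be made concrete with a small sketch. It assumes, purely for illustration, a mismatch model in which the binding energy depends only on the number of mismatches between a binding site and its best-binding sequence; the function name, the four-letter alphabet default and this energy model are assumptions of the sketch, not details taken from the abstract.

```python
import math


def sequence_entropy(length: int, mismatches: int, alphabet_size: int = 4) -> float:
    """Log of the number of sequences with a given number of mismatches.

    Assumed model: a binding site of `length` letters whose binding energy
    depends only on how many positions differ from the best binder.
    """
    if not 0 <= mismatches <= length:
        raise ValueError("mismatches must lie between 0 and length")
    # Choose which positions mismatch, then one of the (alphabet_size - 1)
    # wrong letters at each of those positions.
    n_sequences = math.comb(length, mismatches) * (alphabet_size - 1) ** mismatches
    return math.log(n_sequences)


if __name__ == "__main__":
    site_length = 8
    for r in range(site_length + 1):
        print(r, round(sequence_entropy(site_length, r), 3))
```

    Under these assumptions the entropy is zero for the perfect binder and rises with the number of mismatches over most of the range, so an entropy gradient pulls towards weaker binding, which is the direction the abstract associates with incompatibilities in smaller populations.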

  8. Interchromosomal Duplications on the Bactrocera oleae Y Chromosome Imply a Distinct Evolutionary Origin of the Sex Chromosomes Compared to Drosophila

    PubMed Central

    Gabrieli, Paolo; Gomulski, Ludvik M.; Bonomi, Angelica; Siciliano, Paolo; Scolari, Francesca; Franz, Gerald; Jessup, Andrew; Malacrida, Anna R.; Gasperi, Giuliano

    2011-01-01

    Background Diptera have an extraordinary variety of sex determination mechanisms, and Drosophila melanogaster is the paradigm for this group. However, the Drosophila sex determination pathway is only partially conserved and the family Tephritidae affords an interesting example. The tephritid Y chromosome is postulated to be necessary to determine male development. Characterization of Y sequences, apart from elucidating the nature of the male determining factor, is also important to understand the evolutionary history of sex chromosomes within the Tephritidae. We studied the Y sequences from the olive fly, Bactrocera oleae. Its Y chromosome is minute and highly heterochromatic, and displays high heteromorphism with the X chromosome. Methodology/Principal Findings A combined Representational Difference Analysis (RDA) and fluorescence in-situ hybridization (FISH) approach was used to investigate the Y chromosome to derive information on its sequence content. The Y chromosome is strewn with repetitive DNA sequences, the majority of which are also interspersed in the pericentromeric regions of the autosomes. The Y chromosome appears to have accumulated small and large repetitive interchromosomal duplications. The large interchromosomal duplications harbour an importin-4-like gene fragment. Apart from these importin-4-like sequences, the other Y repetitive sequences are not shared with the X chromosome, suggesting molecular differentiation of these two chromosomes. Moreover, as the identified Y sequences were not detected on the Y chromosomes of closely related tephritids, we can infer divergence in the repetitive nature of their sequence contents. Conclusions/Significance The identification of Y-linked sequences may tell us much about the repetitive nature, the origin and the evolution of Y chromosomes. We hypothesize how these repetitive sequences accumulated and were maintained on the Y chromosome during its evolutionary history. Our data reinforce the idea that the sex chromosomes of the Tephritidae may have distinct evolutionary origins with respect to those of the Drosophilidae and other Dipteran families. PMID:21408187

  9. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements have occurred. It has been one of the major tasks of computational biologists to develop novel mathematical descriptors for similarity analysis that capture these various mutation phenomena simultaneously. In this paper, in contrast to traditional methods that use, e.g., nucleotide frequencies or geometric representations as the basis for constructing mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we set up a weighted directed graph. The adjacency matrix of the directed graph is used to induce a representative vector for the DNA sequence. This new approach measures similarity based on both the ordering and the frequency of nucleotides, so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology and are generally consistent with the reported results from earlier studies, which demonstrates the new method's efficiency; we also test the new method on a simulated data set, which shows that it performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
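
    The pipeline described above (sequence, weighted directed graph, adjacency matrix, representative vector, distance) can be sketched as follows. The abstract does not state how the edges are weighted, so this sketch assumes one plausible choice: every ordered pair of positions i < j adds 1/(j - i)^alpha to the edge from the nucleotide at i to the nucleotide at j. The function names and the parameter alpha are hypothetical.

```python
import math

BASES = "ACGT"


def graph_vector(seq: str, alpha: float = 2.0) -> list[float]:
    """Flatten the 4x4 adjacency matrix of an assumed weighted directed graph.

    Nodes are the four nucleotides. Each ordered pair of positions i < j
    contributes 1 / (j - i)**alpha to the edge seq[i] -> seq[j], so both the
    ordering and the frequency of nucleotides influence the weights.
    """
    index = {base: k for k, base in enumerate(BASES)}
    adjacency = [[0.0] * 4 for _ in range(4)]
    seq = seq.upper()
    for i in range(len(seq)):
        if seq[i] not in index:
            continue  # skip ambiguous symbols such as N
        for j in range(i + 1, len(seq)):
            if seq[j] in index:
                adjacency[index[seq[i]]][index[seq[j]]] += 1.0 / (j - i) ** alpha
    # Row-major flattening gives a 16-dimensional representative vector.
    return [weight for row in adjacency for weight in row]


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two representative vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = graph_vector("ACGTACGTTGCA")
    b = graph_vector("ACGTTGCAACGT")  # same blocks, rearranged
    print(euclidean(a, b))
```

    A pairwise distance matrix built this way could then be passed to any standard tree-building method; the quadratic loop is adequate for sequences of roughly the 0.9-kb length mentioned above.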

  10. Functionally essential, invariant glutamate near the C-terminus of strand beta 5 in various (alpha/beta)8-barrel enzymes as a possible indicator of their evolutionary relatedness.

    PubMed

    Janecek, S; Baláz, S

    1995-08-01

    Twelve different (alpha/beta)8-barrel enzymes belonging to three structurally distinct families were found to contain, near the C-terminus of their strand beta 5, a conserved invariant glutamic acid residue that plays an important functional role in each of these enzymes. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif owing to their mutual evolutionary relatedness. For this purpose, the sequence region around the well conserved fifth beta-strand of alpha-amylase containing catalytic glutamate (Glu230, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The isolated sequence stretches of the 12 (alpha/beta)8-barrels are discussed from both the sequence-structural and the evolutionary point of view, the invariant glutamate residue being proposed to be a joining feature of the studied group of enzymes remaining from their ancestral (alpha/beta)8-barrel.

  11. Plastid Phylogenomics Resolve Deep Relationships among Eupolypod II Ferns with Rapid Radiation and Rate Heterogeneity

    PubMed Central

    Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun

    2017-01-01

    Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625

  12. OncoNEM: inferring tumor evolution from single-cell sequencing data.

    PubMed

    Ross, Edith M; Markowetz, Florian

    2016-04-15

    Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.

  13. The Evolution of the Human Genome

    PubMed Central

    Simonti, Corinne N.; Capra, John A.

    2015-01-01

    Human genomes hold a record of the evolutionary forces that have shaped our species. Advances in DNA sequencing, functional genomics, and population genetic modeling have deepened our understanding of human demographic history, natural selection, and many other long-studied topics. These advances have also revealed many previously underappreciated factors that influence the evolution of the human genome, including functional modifications to DNA and histones, conserved 3D topological chromatin domains, structural variation, and heterogeneous mutation patterns along the genome. Using evolutionary theory as a lens to study these phenomena will lead to significant breakthroughs in understanding what makes us human and why we get sick. PMID:26338498

  14. The evolutionary sequence: origin and emergences.

    PubMed

    Fox, S W

    1986-03-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of the protocell and a variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  15. The evolutionary sequence: origin and emergences

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of the protocell and a variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  16. Novel Insights on Hantavirus Evolution: The Dichotomy in Evolutionary Pressures Acting on Different Hantavirus Segments.

    PubMed

    Sankar, Sathish; Upadhyay, Mohita; Ramamurthy, Mageshbabu; Vadivel, Kumaran; Sagadevan, Kalaiselvan; Nandagopal, Balaji; Vivekanandan, Perumal; Sridharan, Gopalan

    2015-01-01

    Hantaviruses are important emerging zoonotic pathogens. The current understanding of hantavirus evolution is complicated by the lack of consensus on co-divergence of hantaviruses with their animal hosts. In addition, hantaviruses have long-term associations with their reservoir hosts. Analyzing the relative abundance of dinucleotides may shed new light on hantavirus evolution. We studied the relative abundance of dinucleotides and the evolutionary pressures shaping different hantavirus segments. A total of 118 sequences were analyzed; this includes 51 sequences of the S segment, 43 sequences of the M segment and 23 sequences of the L segment. The relative abundance of dinucleotides, effective codon number (ENC) and codon usage biases were analyzed. Standard methods were used to investigate the relative roles of mutational pressure and translational selection on the three hantavirus segments. All three segments of hantaviruses are CpG depleted. Mutational pressure is the predominant evolutionary force leading to CpG depletion among hantaviruses. Interestingly, the S segment of hantaviruses is GpU depleted and, in contrast to CpG depletion, the depletion of GpU dinucleotides from the S segment is driven by translational selection. Our findings also suggest that mutational pressure is the primary evolutionary pressure acting on the S and the M segments of hantaviruses, while translational selection plays a key role in shaping the evolution of the L segment. Our findings highlight how different evolutionary pressures may contribute disproportionately to the evolution of the three hantavirus segments. These findings provide new insights into the current understanding of hantavirus evolution. There is a dichotomy among evolutionary pressures shaping (a) the relative abundance of different dinucleotides in hantavirus genomes and (b) the evolution of the three hantavirus segments.
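
    The relative abundance of a dinucleotide is commonly measured as an odds ratio: the observed dinucleotide frequency divided by the product of its two mononucleotide frequencies, with values well below 1 indicating depletion (as reported above for CpG and GpU). The abstract does not say which estimator the authors used, so the sketch below assumes this standard odds ratio; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Return rho_xy = f_xy / (f_x * f_y) for all 16 RNA dinucleotides.

    DNA input is accepted: T is converted to U. Dinucleotides whose expected
    frequency is zero (a base is absent) are reported as NaN.
    """
    seq = seq.upper().replace("T", "U")
    mono = Counter(base for base in seq if base in BASES)
    # Overlapping pairs, ignoring any pair that contains an ambiguous symbol.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence must contain at least one valid dinucleotide")
    ratios = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    example = "AUGGCUAACGGUUACCGAUGCAUGGA"  # made-up sequence, not hantavirus data
    for pair, rho in sorted(dinucleotide_odds_ratios(example).items()):
        print(pair, round(rho, 2))
```

    Applied per segment, ratios like these would be the starting point for the CpG and GpU comparisons described above; deciding whether mutational pressure or translational selection explains a depletion requires the further codon-usage analyses the abstract mentions.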

  17. A network approach to analyzing highly recombinant malaria parasite genes.

    PubMed

    Larremore, Daniel B; Clauset, Aaron; Buckee, Caroline O

    2013-01-01

    The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
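
    The network construction step described above (each sequence is a node, and two nodes are linked if they share an exact match of significant length) can be sketched as follows. The threshold min_len stands in for the "significant length", whose actual derivation the abstract does not give, and the function name is hypothetical. Two sequences share an exact match of at least min_len letters exactly when they share at least one substring of length min_len, which is what the index below checks. Community detection with a stochastic block model is not shown.

```python
from collections import defaultdict
from itertools import combinations


def shared_match_network(seqs: dict[str, str], min_len: int) -> set[frozenset[str]]:
    """Return undirected edges between sequences sharing an exact match.

    `seqs` maps a sequence name to its residues for one highly variable
    region; `min_len` is an assumed minimum match length.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    # Index every substring of length min_len by the sequences containing it.
    owners = defaultdict(set)
    for name, residues in seqs.items():
        for i in range(len(residues) - min_len + 1):
            owners[residues[i:i + min_len]].add(name)
    edges = set()
    for names in owners.values():
        for a, b in combinations(sorted(names), 2):
            edges.add(frozenset((a, b)))
    return edges


if __name__ == "__main__":
    toy = {  # made-up fragments, not real DBLα sequences
        "s1": "ACDEFGH",
        "s2": "WWDEFGYY",
        "s3": "QQQQQQ",
    }
    print(shared_match_network(toy, min_len=4))
```

    The resulting edge set can be handed to a graph library for community detection; under the reasoning above, freely recombining sequences should give a roughly uniform random structure, while constrained recombination should show up as communities.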

  18. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes

    PubMed Central

    Larremore, Daniel B.; Clauset, Aaron; Buckee, Caroline O.

    2013-01-01

    The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences. PMID:24130474

  19. Can you sequence ecology? Metagenomics of adaptive diversification.

    PubMed

    Marx, Christopher J

    2013-01-01

    Few areas of science have benefited more from the expansion in sequencing capability than the study of microbial communities. Can sequence data, besides providing hypotheses of the functions the members possess, detect the evolutionary and ecological processes that are occurring? For example, can we determine if a species is adapting to one niche, or if it is diversifying into multiple specialists that inhabit distinct niches? Fortunately, adaptation of populations in the laboratory can serve as a model to test our ability to make such inferences about evolution and ecology from sequencing. Even adaptation to a single niche can give rise to complex temporal dynamics due to the transient presence of multiple competing lineages. If there are multiple niches, this complexity is augmented by segmentation of the population into multiple specialists that can each continue to evolve within their own niche. For a known example of parallel diversification that occurred in the laboratory, sequencing data gave surprisingly few obvious, unambiguous signs of the ecological complexity present. Whereas experimental systems are open to direct experimentation to test hypotheses of selection or ecological interaction, the difficulty in "seeing ecology" from sequencing for even such a simple system suggests translation to communities like the human microbiome will be quite challenging. This will require both improved empirical methods to enhance the depth and time resolution for the relevant polymorphisms and novel statistical approaches to rigorously examine time-series data for signs of various evolutionary and ecological phenomena within and between species.

  20. A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model

    PubMed Central

    Beldade, Patrícia; Rudd, Stephen; Gruber, Jonathan D; Long, Anthony D

    2006-01-01

    Background Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economic and educational value of butterflies, they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expressed Sequence Tag (EST) project for Bicyclus anynana that has identified the largest available collection to date of expressed genes for any butterfly. Results By targeting cDNAs from developing wings at the stages when pattern is specified, we biased gene discovery towards genes potentially involved in pattern formation. Assembly of 9,903 ESTs from a subtracted library allowed us to identify 4,251 genes of which 2,461 were annotated based on BLAST analyses against relevant gene collections. Gene prediction software identified 2,202 peptides, of which 215 longer than 100 amino acids had no homology to any known proteins and, thus, potentially represent novel or highly diverged butterfly genes. We combined gene and Single Nucleotide Polymorphism (SNP) identification by constructing cDNA libraries from pools of outbred individuals, and by sequencing clones from the 3' end to maximize alignment depth. Alignments of multi-member contigs allowed us to identify over 14,000 putative SNPs, with 316 genes having at least one high confidence double-hit SNP. We furthermore identified 320 microsatellites in transcribed genes that can potentially be used as genetic markers. Conclusion Our project was designed to combine gene and sequence polymorphism discovery and has generated the largest gene collection available for any butterfly and many potential markers in expressed genes. These resources will be invaluable for exploring the potential of B. anynana in particular, and butterflies in general, as models in ecological, evolutionary, and developmental genetics. PMID:16737530

  1. Evolution of Pre-Main Sequence Accretion Disks

    NASA Technical Reports Server (NTRS)

    Hartmann, Lee W.

    2004-01-01

    The aim of this project is to develop a comprehensive global picture of the physical conditions in, and evolutionary timescales of, pre-main sequence accretion disks. The results of this work will help constrain the initial conditions for planet formation. To this end we are developing much larger samples of 3-10 Myr-old stars to provide better empirical constraints on protoplanetary disk evolution; measuring disk accretion rates in these systems; and constructing detailed model disk structures consistent with observations to infer physical conditions such as grain growth in protoplanetary disks.

  2. Evolution of Pre-Main Sequence Accretion Disks

    NASA Technical Reports Server (NTRS)

    Hartmann, Lee W.

    2003-01-01

    The aim of this project is to develop a comprehensive global picture of the physical conditions in, and evolutionary timescales of, pre-main sequence accretion disks. The results of this work will help constrain the initial conditions for planet formation. To this end we are developing much larger samples of 3-10 Myr-old stars to provide better empirical constraints on protoplanetary disk evolution; measuring disk accretion rates in these systems; and constructing detailed model disk structures consistent with observations to infer physical conditions such as grain growth in protoplanetary disks.

  3. Evolution of Pre-Main Sequence Accretion Disks

    NASA Technical Reports Server (NTRS)

    Hartmann, Lee W.

    2005-01-01

    The aim of this project was to develop a comprehensive global picture of the physical conditions in, and evolutionary timescales of, pre-main sequence accretion disks. The results of this work will help constrain the initial conditions for planet formation. To this end we developed much larger samples of 3-10 Myr-old stars to provide better empirical constraints on protoplanetary disk evolution; measured disk accretion rates in these systems; and constructed detailed model disk structures consistent with observations to infer physical conditions such as grain growth in protoplanetary disks.

  4. Evolution of epigenetic regulation in vertebrate genomes

    PubMed Central

    Lowdon, Rebecca F.; Jang, Hyo Sik; Wang, Ting

    2016-01-01

    Empirical models of sequence evolution have spurred progress in the field of evolutionary genetics for decades. We are now realizing the importance and complexity of the eukaryotic epigenome. While epigenome analysis has been applied to genomes from single cell eukaryotes to human, comparative analyses are still relatively few, and computational algorithms to quantify epigenome evolution remain scarce. Accordingly, a quantitative model of epigenome evolution remains to be established. Here we review the comparative epigenomics literature and synthesize its overarching themes. We also suggest one mechanism, transcription factor binding site turnover, which relates sequence evolution to epigenetic conservation or divergence. Lastly, we propose a framework for how the field can move forward to build a coherent quantitative model of epigenome evolution. PMID:27080453

  5. Petri net modeling of high-order genetic systems using grammatical evolution.

    PubMed

    Moore, Jason H; Hahn, Lance W

    2003-11-01

    Understanding how DNA sequence variations impact human health through a hierarchy of biochemical and physiological systems is expected to improve the diagnosis, prevention, and treatment of common, complex human diseases. We have previously developed a hierarchical dynamic systems approach based on Petri nets for generating biochemical network models that are consistent with genetic models of disease susceptibility. This modeling approach uses an evolutionary computation approach called grammatical evolution as a search strategy for optimal Petri net models. We have previously demonstrated that this approach routinely identifies biochemical network models that are consistent with a variety of genetic models in which disease susceptibility is determined by nonlinear interactions between two DNA sequence variations. In the present study, we evaluate whether the Petri net approach is capable of identifying biochemical networks that are consistent with disease susceptibility due to higher order nonlinear interactions between three DNA sequence variations. The results indicate that our model-building approach is capable of routinely identifying good, but not perfect, Petri net models. Ideas for improving the algorithm for this high-dimensional problem are presented.

  6. Sequence co-evolution gives 3D contacts and structures of protein complexes

    PubMed Central

    Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

    2014-01-01

    Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213

  7. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequences used to generate the 3D structure of proteins and the template accession via SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  8. Biophysical and structural considerations for protein sequence evolution

    PubMed Central

    2011-01-01

    Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550

  9. SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage

    PubMed Central

    Pouyet, Fanny; Bailly-Bechet, Marc; Mouchiroud, Dominique; Guéguen, Laurent

    2016-01-01

    Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes which apply on all sites of a sequence such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the genomic content evolution, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We retrieve the existence of a universal mutational bias toward AT, and that taking into account selection on synonymous codon usage has consequences on the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences. PMID:27401173

  10. A single determinant dominates the rate of yeast protein evolution.

    PubMed

    Drummond, D Allan; Raval, Alpan; Wilke, Claus O

    2006-02-01

    A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.

  11. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ovacik, Meric A.; Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu; Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  12. Revisiting the age, evolutionary history and species level diversity of the genus Hydra (Cnidaria: Hydrozoa).

    PubMed

    Schwentner, Martin; Bosch, Thomas C G

    2015-10-01

    The genus Hydra has long served as a model system in comparative immunology, developmental and evolutionary biology. Despite its relevance for fundamental research, Hydra's evolutionary origins and species level diversity are not well understood. Detailed previous studies using molecular techniques identified several clades within Hydra, but how these are related to described species remained largely an open question. In the present study, we compiled all published sequence data for three mitochondrial and nuclear genes (COI, 16S and ITS), complemented these with some new sequence data and delimited main genetic lineages (=hypothetical species) objectively by employing two DNA barcoding approaches. Conclusions on the species status of these main lineages were based on inferences of reproductive isolation. Relevant divergence times within Hydra were estimated based on relaxed molecular clock analyses with four genes (COI, 16S, EF1α and 28S) and four cnidarian fossil calibration points. All in all, 28 main lineages could be delimited, many more than anticipated from earlier studies. Because allopatric distributions were common, inferences of reproductive isolation often remained ambiguous but reproductive isolation was rarely refuted. Our results support three major conclusions which are central for Hydra research: (1) species level diversity was underestimated by molecular studies; (2) species affiliations of several crucial 'workhorses' of Hydra evolutionary research were wrong and (3) crown group Hydra originated ∼200mya. Our results demonstrate that the taxonomy of Hydra requires a thorough revision and that evolutionary studies need to take this into account when interspecific comparisons are made. Hydra originated on Pangea. Three of four extant groups evolved ∼70mya, possibly on the northern landmass of Laurasia. Consequently, Hydra's cosmopolitan distribution is the result of transcontinental and transoceanic dispersal. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Connecting the dots between genes, biochemistry, and disease susceptibility: systems biology modeling in human genetics.

    PubMed

    Moore, Jason H; Boczko, Erik M; Summar, Marshall L

    2005-02-01

    Understanding how DNA sequence variations impact human health through a hierarchy of biochemical and physiological systems is expected to improve the diagnosis, prevention, and treatment of common, complex human diseases. We have previously developed a hierarchical dynamic systems approach based on Petri nets for generating biochemical network models that are consistent with genetic models of disease susceptibility. This modeling approach uses an evolutionary computation approach called grammatical evolution as a search strategy for optimal Petri net models. We have previously demonstrated that this approach routinely identifies biochemical network models that are consistent with a variety of genetic models in which disease susceptibility is determined by nonlinear interactions between two or more DNA sequence variations. We review here this approach and then discuss how it can be used to model biochemical and metabolic data in the context of genetic studies of human disease susceptibility.

  14. Using Evolutionary Data in Developing Phylogenetic Trees: A Scaffolded Approach with Authentic Data

    ERIC Educational Resources Information Center

    Davenport, K. D.; Milks, Kirstin Jane; Van Tassell, Rebecca

    2015-01-01

    Analyzing evolutionary relationships requires that students have a thorough understanding of evidence and of how scientists use evidence to develop these relationships. In this lesson sequence, students work in groups to process many different lines of evidence of evolutionary relationships between ungulates, then construct a scientific argument…

  15. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds.

    PubMed

    Simkovic, Felix; Thomas, Jens M H; Keegan, Ronan M; Winn, Martyn D; Mayans, Olga; Rigden, Daniel J

    2016-07-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.

  16. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds

    PubMed Central

    Simkovic, Felix; Thomas, Jens M. H.; Keegan, Ronan M.; Winn, Martyn D.; Mayans, Olga; Rigden, Daniel J.

    2016-01-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions (‘decoys’), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing. PMID:27437113

  17. Breaking symmetry: the zebrafish as a model for understanding left-right asymmetry in the developing brain.

    PubMed

    Roussigne, Myriam; Blader, Patrick; Wilson, Stephen W

    2012-03-01

    How does left-right asymmetry develop in the brain and how does the resultant asymmetric circuitry impact on brain function and lateralized behaviors? By enabling scientists to address these questions at the levels of genes, neurons, circuitry and behavior, the zebrafish model system provides a route to resolve the complexity of brain lateralization. In this review, we present the progress made towards characterizing the nature of the gene networks and the sequence of morphogenetic events involved in the asymmetric development of zebrafish epithalamus. In an attempt to integrate the recent extensive knowledge into a working model and to identify the future challenges, we discuss how insights gained at a cellular/developmental level can be linked to the data obtained at a molecular/genetic level. Finally, we present some evolutionary thoughts and discuss how significant discoveries made in zebrafish should provide entry points to better understand the evolutionary origins of brain lateralization.

  18. Genetics on the Fly: A Primer on the Drosophila Model System

    PubMed Central

    Hales, Karen G.; Korey, Christopher A.; Larracuente, Amanda M.; Roberts, David M.

    2015-01-01

    Fruit flies of the genus Drosophila have been an attractive and effective genetic model organism since Thomas Hunt Morgan and colleagues made seminal discoveries with them a century ago. Work with Drosophila has enabled dramatic advances in cell and developmental biology, neurobiology and behavior, molecular biology, evolutionary and population genetics, and other fields. With more tissue types and observable behaviors than in other short-generation model organisms, and with vast genome data available for many species within the genus, the fly’s tractable complexity will continue to enable exciting opportunities to explore mechanisms of complex developmental programs, behaviors, and broader evolutionary questions. This primer describes the organism’s natural history, the features of sequenced genomes within the genus, the wide range of available genetic tools and online resources, the types of biological questions Drosophila can help address, and historical milestones. PMID:26564900

  19. Rooting the archaebacterial tree: the pivotal role of Thermococcus celer in archaebacterial evolution

    NASA Technical Reports Server (NTRS)

    Achenbach-Richter, L.; Gupta, R.; Zillig, W.; Woese, C. R.

    1988-01-01

    The sequence of the 16S ribosomal RNA gene from the archaebacterium Thermococcus celer shows the organism to be related to the methanogenic archaebacteria rather than to its phenotypic counterparts, the extremely thermophilic archaebacteria. This conclusion turns on the position of the root of the archaebacterial phylogenetic tree, however. The problems encountered in rooting this tree are analyzed in detail. Under conditions that suppress evolutionary noise both the parsimony and evolutionary distance methods yield a root location (using a number of eubacterial or eukaryotic outgroup sequences) that is consistent with that determined by an "internal rooting" method, based upon an (approximate) determination of relative evolutionary rates.

  20. Upon Accounting for the Impact of Isoenzyme Loss, Gene Deletion Costs Anticorrelate with Their Evolutionary Rates.

    PubMed

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu; Segrè, Daniel

    2017-01-01

    System-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  1. Inferring the mode of origin of polyploid species from next-generation sequence data.

    PubMed

    Roux, Camille; Pannell, John R

    2015-03-01

    Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels. © 2015 John Wiley & Sons Ltd.

  2. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    PubMed Central

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  3. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
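
    The search criterion quoted in this abstract (100% identity over 100 bp) can be illustrated with a minimal Python sketch. It assumes two sequences that have already been aligned to equal length; the function name identical_runs and the min_len parameter are hypothetical, and this is not the pipeline used by the authors.

```python
def identical_runs(seq_a, seq_b, min_len=100):
    """Return half-open (start, end) column intervals where two aligned
    sequences are identical for at least min_len consecutive columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # first column of the run currently being extended
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Gaps and ambiguity codes break a run (an assumption of this sketch).
        match = a == b and a in "ACGT"
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(seq_a) - start >= min_len:
        runs.append((start, len(seq_a)))
    return runs
```

    Called on two aligned strings, the function returns every identical block that meets the length threshold; lowering min_len relaxes the criterion.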

  4. Prediction of RNA secondary structures: from theory to models and real molecules

    NASA Astrophysics Data System (ADS)

    Schuster, Peter

    2006-05-01

    RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties.
Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interference RNA (si-RNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is granted by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation. 
The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.
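
    The dynamic-programming folding mentioned in this abstract can be illustrated with a minimal Python sketch. It implements Nussinov-style base-pair maximization, a simplified stand-in that counts pairs instead of evaluating free energies, so it is not the mfe algorithm the abstract refers to. The function name nussinov, the min_loop parameter and the pair set (Watson-Crick plus G-U wobble) are assumptions made for illustration.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    # best[i][j] = maximum number of pairs in the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)
```

    For the short hairpin GGGAAAUCC the sketch returns 3 pairs with the structure (((...))). The cubic running time comes from the inner loop over k; energy-based folding uses the same recursion layout with loop-dependent energy terms in place of the pair count.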

  5. Selective modes determine evolutionary rates, gene compactness and expression patterns in Brassica.

    PubMed

    Guo, Yue; Liu, Jing; Zhang, Jiefu; Liu, Shengyi; Du, Jianchang

    2017-07-01

    It has been well documented that most nuclear protein-coding genes in organisms can be classified into two categories: positively selected genes (PSGs) and negatively selected genes (NSGs). The characteristics and evolutionary fates of different types of genes, however, have been poorly understood. In this study, the rates of nonsynonymous substitution (Ka) and the rates of synonymous substitution (Ks) were investigated by comparing the orthologs between the two sequenced Brassica species, Brassica rapa and Brassica oleracea, and the evolutionary rates, gene structures, expression patterns, and codon bias were compared between PSGs and NSGs. The resulting data show that PSGs have higher protein evolutionary rates, lower synonymous substitution rates, shorter gene length, fewer exons, higher functional specificity, lower expression level, higher tissue-specific expression and stronger codon bias than NSGs. Although the quantities and values are different, the relative features of PSGs and NSGs have been largely verified in the model species Arabidopsis. These data suggest that PSGs and NSGs differ not only under selective pressure (Ka/Ks), but also in their evolutionary, structural and functional properties, indicating that selective modes may serve as a determinant factor for measuring evolutionary rates, gene compactness and expression patterns in Brassica. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
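
    The Ka/Ks-based split into PSGs and NSGs can be illustrated with a minimal Python sketch. It assumes that per-gene Ka and Ks estimates are already available (estimating them from ortholog alignments is not shown) and uses the conventional cutoff of 1; the abstract does not state the exact threshold used, so the threshold parameter, the function name and the example values are all hypothetical.

```python
def classify_selection(ka, ks, threshold=1.0):
    """Label a gene from its Ka and Ks estimates.

    Ka/Ks above the threshold is read as positive selection (PSG),
    below it as negative selection (NSG). The ratio is undefined when
    Ks is zero or negative.
    """
    if ks <= 0:
        return "undefined"
    ratio = ka / ks
    if ratio > threshold:
        return "PSG"
    if ratio < threshold:
        return "NSG"
    return "neutral"


# Hypothetical (Ka, Ks) estimates for three made-up gene identifiers.
example_genes = {"geneA": (0.30, 0.15), "geneB": (0.02, 0.40), "geneC": (0.10, 0.0)}
labels = {name: classify_selection(ka, ks) for name, (ka, ks) in example_genes.items()}
```

    With these made-up values, geneA is labelled PSG, geneB NSG, and geneC undefined because its Ks is zero.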

  6. Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events.

    PubMed

    Wang, Xiyin; Wang, Jingpeng; Jin, Dianchuan; Guo, Hui; Lee, Tae-Ho; Liu, Tao; Paterson, Andrew H

    2015-06-01

    Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genomic-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ∼96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Adjusted dating to reflect both updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.

  7. Phenotypic and genotypic expression of self-incompatibility haplotypes in Arabidopsis lyrata suggests unique origin of alleles in different dominance classes.

    PubMed

    Prigoda, Nadia L; Nassuth, Annette; Mable, Barbara K

    2005-07-01

    The highly divergent alleles of the SRK gene in outcrossing Arabidopsis lyrata have provided important insights into the evolutionary history of self-incompatibility (SI) alleles and serve as an ideal model for studies of the evolutionary and molecular interactions between alleles in cell-cell recognition systems in general. One tantalizing question is how new specificities arise in systems that require coordination between male and female components. Allelic recruitment via gene conversion has been proposed as one possibility, based on the division of DNA sequences at the SRK locus into two distinctive groups: (1) sequences whose relationships are not well resolved and display the long branch lengths expected for a gene under balancing selection (Class A); and (2) sequences falling into a well-supported group with shorter branch lengths (Class B) that are closely related to an unlinked paralogous locus. The purpose of this study was to determine if differences in phenotype (site of expression assayed using allele-specific reverse transcription-polymerase chain reaction) or function (dominance relationships assayed through controlled pollinations) accompany the sequence-based classification. Expression of Class A alleles was restricted to floral tissues, as predicted for genes involved in the SI response. In contrast, Class B alleles, despite being tightly linked to the SI phenotype, were unexpectedly expressed in both leaves and floral tissues; the same pattern found for a related unlinked paralogous sequence. Whereas Class A included haplotypes in three different dominance classes, all Class B haplotypes were found to be recessive to all except one Class A haplotype. In addition, mapping of expression and dominance patterns onto an S-domain-based genealogy suggested that allelic dominance may be determined more by evolutionary history than by frequency-dependent selection for lowered dominance as some theories suggest. 
    The possibility that interlocus gene conversion might have contributed to allelic diversity is discussed.

  8. Phylogenetic Network Analysis Revealed the Occurrence of Horizontal Gene Transfer of 16S rRNA in the Genus Enterobacter

    PubMed Central

    Sato, Mitsuharu; Miyazaki, Kentaro

    2017-01-01

    Horizontal gene transfer (HGT) is a ubiquitous genetic event in bacterial evolution, but it seldom occurs for genes involved in highly complex supramolecules (or biosystems), which consist of many gene products. The ribosome is one such supramolecule, but several bacteria harbor dissimilar and/or chimeric 16S rRNAs in their genomes, suggesting the occurrence of HGT of this gene. However, we know little about whether the genes actually experience HGT and, if so, the frequency of such a transfer. This is primarily because the methods currently employed for phylogenetic analysis (e.g., neighbor-joining, maximum likelihood, and maximum parsimony) of 16S rRNA genes assume point mutation-driven tree-shape evolution as an evolutionary model, which is intrinsically inappropriate to decipher the evolutionary history for genes driven by recombination. To address this issue, we applied a phylogenetic network analysis, which has been used previously for detection of genetic recombination in homologous alleles, to the 16S rRNA gene. We focused on the genus Enterobacter, whose phylogenetic relationships inferred by multi-locus sequence alignment analysis and 16S rRNA sequences are incompatible. All 10 complete genomic sequences were retrieved from the NCBI database, in which 71 16S rRNA genes were included. Neighbor-joining analysis demonstrated that the genes residing in the same genomes clustered, indicating the occurrence of intragenomic recombination. However, as suggested by the low bootstrap values, evolutionary relationships between the clusters were uncertain. We then applied phylogenetic network analysis to representative sequences from each cluster. We found three ancestral 16S rRNA groups; the others were likely created through recursive recombination between the ancestors and chimeric descendants. Despite the large sequence changes caused by the recombination events, the RNA secondary structures were conserved. 
    Successive intergenomic and intragenomic recombination thus shaped the evolution of 16S rRNA genes in the genus Enterobacter. PMID:29180992

  9. Using Maximum Entropy to Find Patterns in Genomes

    NASA Astrophysics Data System (ADS)

    Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.
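
    One way to read the maximum-entropy idea in this abstract is sketched below in Python; it is an illustration under stated assumptions, not the authors' tool. The sketch fixes the exact amino acid sequence (a simplification) and constrains only the expected GC fraction. Under that constraint the maximum-entropy distribution weights each synonymous codon by exp(lam * GC count), and lam is found by bisection. All function names, the bisection bounds and the use of the standard genetic code are assumptions.

```python
import math
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def codon_weights(aa, lam):
    """Synonymous codons of aa with probabilities proportional to exp(lam * GC)."""
    codons = CODONS_FOR[aa]
    weights = [math.exp(lam * gc_count(c)) for c in codons]
    total = sum(weights)
    return codons, [w / total for w in weights]


def expected_gc(protein, lam):
    """Expected GC fraction of a coding sequence sampled with parameter lam."""
    gc = 0.0
    for aa in protein:
        codons, probs = codon_weights(aa, lam)
        gc += sum(p * gc_count(c) for c, p in zip(codons, probs))
    return gc / (3 * len(protein))


def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection on lam; expected GC is non-decreasing in lam."""
    if not expected_gc(protein, lo) <= target_gc <= expected_gc(protein, hi):
        raise ValueError("target GC is not attainable for this protein")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_cds(protein, target_gc, rng=None):
    """Draw one random coding sequence for protein with the target expected GC."""
    rng = rng or random.Random()
    lam = solve_lambda(protein, target_gc)
    return "".join(rng.choices(*codon_weights(aa, lam))[0] for aa in protein)
```

    A call such as sample_cds("MKTLLVAG", 0.5, random.Random(1)) returns a 24-nucleotide sequence that translates back to the input peptide; the GC constraint holds in expectation over many draws, not for each individual sequence. The peptide and seed here are arbitrary examples.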

  10. Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

    PubMed

    Miklós, István; Zádori, Zoltán

    2012-02-01

    HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.

  11. Positive Evolutionary Selection of an HD Motif on Alzheimer Precursor Protein Orthologues Suggests a Functional Role

    PubMed Central

    Miklós, István; Zádori, Zoltán

    2012-01-01

    HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the “transcription binding site turnover.” CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs. PMID:22319430

  12. Evolutionary process of deep-sea Bathymodiolus mussels.

    PubMed

    Miyazaki, Jun-Ichi; de Oliveira Martins, Leonardo; Fujita, Yuko; Matsumoto, Hiroto; Fujiwara, Yoshihiro

    2010-04-27

    Since the discovery of deep-sea chemosynthesis-based communities, much work has been done to clarify their organismal and environmental aspects. However, major topics remain to be resolved, including when and how organisms invade and adapt to deep-sea environments; whether strategies for invasion and adaptation are shared by different taxa or unique to each taxon; how organisms extend their distribution and diversity; and how they become isolated to speciate in continuous waters. Deep-sea mussels are one of the dominant organisms in chemosynthesis-based communities, thus investigations of their origin and evolution contribute to resolving questions about life in those communities. We investigated worldwide phylogenetic relationships of deep-sea Bathymodiolus mussels and their mytilid relatives by analyzing nucleotide sequences of the mitochondrial cytochrome c oxidase subunit I (COI) and NADH dehydrogenase subunit 4 (ND4) genes. Phylogenetic analysis of the concatenated sequence data showed that mussels of the subfamily Bathymodiolinae from vents and seeps were divided into four groups, and that mussels of the subfamily Modiolinae from sunken wood and whale carcasses assumed the outgroup position and shallow-water modioline mussels were positioned more distantly to the bathymodioline mussels. We provisionally hypothesized the evolutionary history of Bathymodiolus mussels by estimating evolutionary time under a relaxed molecular clock model. Diversification of bathymodioline mussels was initiated in the early Miocene, and subsequently diversification of the groups occurred in the early to middle Miocene. The phylogenetic relationships support the "Evolutionary stepping stone hypothesis," in which mytilid ancestors exploited sunken wood and whale carcasses in their progressive adaptation to deep-sea environments. This hypothesis is also supported by the evolutionary transition of symbiosis in that nutritional adaptation to the deep sea proceeded from extracellular to intracellular symbiotic states in whale carcasses. The estimated evolutionary time suggests that the mytilid ancestors were able to exploit whales during adaptation to the deep sea.

  13. Evolutionary characterization of the West Nile Virus complete genome.

    PubMed

    Gray, R R; Veras, N M C; Santos, L A; Salemi, M

    2010-07-01

    The spatial dynamics of the West Nile Virus epidemic in North America are largely unknown. Previous studies that investigated the evolutionary history of the virus used sequence data from the structural genes (prM and E); however, these regions may lack phylogenetic information and obscure true evolutionary relationships. This study systematically evaluated the evolutionary patterns in the eleven genes of the WNV genome in order to determine which region(s) were most phylogenetically informative. We found that while the E region lacks resolution and can potentially result in misleading conclusions, the full NS3 or NS5 regions have strong phylogenetic signal. Furthermore, we show that geographic structure of WNV infection within the US is more pronounced than previously reported in studies that used the structural genes. We conclude that future evolutionary studies should focus on NS3 and NS5 in order to maximize the available sequences while retaining maximal interpretative power to infer temporal and geographic trends among WNV strains. Copyright 2010 Elsevier Inc. All rights reserved.

  14. The impact of age, biogenesis, and genomic clustering on Drosophila microRNA evolution

    PubMed Central

    Mohammed, Jaaved; Flynt, Alex S.; Siepel, Adam; Lai, Eric C.

    2013-01-01

    The molecular evolutionary signatures of miRNAs inform our understanding of their emergence, biogenesis, and function. The known signatures of miRNA evolution have derived mostly from the analysis of deeply conserved, canonical loci. In this study, we examine the impact of age, biogenesis pathway, and genomic arrangement on the evolutionary properties of Drosophila miRNAs. Crucial to the accuracy of our results was our curation of high-quality miRNA alignments, which included nearly 150 corrections to ortholog calls and nucleotide sequences of the global 12-way Drosophilid alignments currently available. Using these data, we studied primary sequence conservation, normalized free-energy values, and types of structure-preserving substitutions. We expand upon common miRNA evolutionary patterns that reflect fundamental features of miRNAs that are under functional selection. We observe that melanogaster-subgroup-specific miRNAs, although recently emerged and rapidly evolving, nonetheless exhibit evolutionary signatures that are similar to well-conserved miRNAs and distinct from other structured noncoding RNAs and bulk conserved non-miRNA hairpins. This provides evidence that even young miRNAs may be selected for regulatory activities. More strikingly, we observe that mirtrons and clustered miRNAs both exhibit distinct evolutionary properties relative to solo, well-conserved miRNAs, even after controlling for sequence depth. These studies highlight the previously unappreciated impact of biogenesis strategy and genomic location on the evolutionary dynamics of miRNAs, and affirm that miRNAs do not evolve as a unitary class. PMID:23882112

  15. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

    PubMed Central

    Dröge, J.; Gregor, I.; McHardy, A. C.

    2015-01-01

    Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150

  16. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-10-16

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.

  17. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes

    PubMed Central

    Zheng, Jinshui; Peng, Donghai; Chen, Ling; Liu, Hualin; Chen, Feng; Xu, Mengci; Ju, Shouyong; Ruan, Lifang

    2016-01-01

    Plant-parasitic nematodes were found in 4 of the 12 clades of phylum Nematoda. These nematodes in different clades may have originated independently from their free-living fungivorous ancestors. However, the exact evolutionary process of these parasites is unclear. Here, we sequenced the genome of a migratory plant nematode, Ditylenchus destructor. We performed comparative genomics between the free-living nematode Caenorhabditis elegans and all the plant nematodes with genome sequences available. We found that, compared with C. elegans, the core developmental control processes underwent heavy reduction, though most signal transduction pathways were conserved. We also found that D. destructor contained more homologs of the key genes in the above processes than the other plant nematodes. We suggest that Ditylenchus spp. may represent an intermediate evolutionary stage between free-living nematodes that feed on fungi and obligate plant-parasitic nematodes. Based on the facts that D. destructor can feed on fungi and has a relatively short life cycle, and that it has similar features to both C. elegans and sedentary plant-parasitic nematodes from clade 12, we propose it as a new model to study the biology, biocontrol of plant nematodes and the interaction between nematodes and plants. PMID:27466450

  18. Phylotranscriptomic consolidation of the jawed vertebrate timetree.

    PubMed

    Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé

    2017-09-01

    Phylogenomics is extremely powerful but introduces new challenges as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included a full phylogenetic diversity of gnathostomes and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10Gbp allows recovering more genes, but shallower sequencing (1.5Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
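
    The gene jackknifing mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gene contributes one numeric estimate (for example a node age) and that the summary statistic is a mean; the study itself re-infers trees from gene subsets, which this sketch does not attempt. The function name and parameters are hypothetical.

```python
import random
from statistics import mean


def gene_jackknife(per_gene_values, n_replicates=100, keep_fraction=0.5, rng=None):
    """Recompute a summary (here: the mean) on random gene subsets.

    Each replicate draws a subset of genes WITHOUT replacement, so the spread
    of the replicate values indicates how sensitive the summary is to which
    genes happened to be sampled. The mean is an illustrative stand-in for a
    full tree or dating analysis.
    """
    rng = rng or random.Random()
    n_keep = max(1, int(len(per_gene_values) * keep_fraction))
    return [mean(rng.sample(per_gene_values, n_keep)) for _ in range(n_replicates)]
```

    A narrow spread across replicates would suggest that the estimate does not hinge on a particular set of genes.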

  19. A Molecular Phylogeny of Hemiptera Inferred from Mitochondrial Genome Sequences

    PubMed Central

    Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping

    2012-01-01

    Classically, Hemiptera is comprised of two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family Fulgoridae in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses. PMID:23144967

  20. Germline transformation of the butterfly Bicyclus anynana.

    PubMed

    Marcus, Jeffrey M; Ramos, Diane M; Monteiro, Antónia

    2004-08-07

    Ecological and evolutionary theory has frequently been inspired by the diversity of colour patterns on the wings of butterflies. More recently, these varied patterns have also become model systems for studying the evolution of developmental mechanisms. A technique that will facilitate our understanding of butterfly colour-pattern development is germline transformation. Germline transformation permits functional tests of candidate gene products and of cis-regulatory regions, and provides a means of generating new colour-pattern mutants by insertional mutagenesis. We report the successful transformation of the African satyrid butterfly Bicyclus anynana with two different transposable element vectors, Hermes and piggyBac, each carrying EGFP coding sequences driven by the 3XP3 synthetic enhancer that drives gene expression in the eyes. Candidate lines identified by screening for EGFP in adult eyes were later confirmed by PCR amplification of a fragment of the EGFP coding sequence from genomic DNA. Flanking DNA surrounding the insertions was amplified by inverse PCR and sequenced. Transformation rates were 5% for piggyBac and 10.2% for Hermes. Ultimately, the new data generated by these techniques may permit an integrated understanding of the developmental genetics of colour-pattern formation and of the ecological and evolutionary processes in which these patterns play a role.

  1. Inferring phylogeny and speciation of Gymnosporangium species, and their coevolution with host plants

    PubMed Central

    Zhao, Peng; Liu, Fang; Li, Ying-Ming; Cai, Lei

    2016-01-01

    Gymnosporangium species (Pucciniaceae, Pucciniales) cause serious diseases and significant economic losses to apple cultivars. Most of the reported species are heteroecious and complete their life cycles on two different plant hosts belonging to two unrelated genera, i.e. Juniperus and Malus. However, the phylogenetic relationships among Gymnosporangium species and the evolutionary history of Gymnosporangium on its aecial and telial hosts were still undetermined. In this study, we recognized species based on rDNA sequence data using the generalized mixed Yule-coalescent (GMYC) and Poisson Tree Processes (PTP) models. The evolutionary relationships of Gymnosporangium species and their hosts were investigated by comparing the cophylogenetic analyses of Gymnosporangium species with Malus species and Juniperus species, respectively. The concordant results of GMYC and PTP analyses recognized 14 species including 12 known species and two undescribed species. In addition, host alternations of 10 Gymnosporangium species were uncovered by linking the derived sequences between their aecial and telial stages. This study revealed the evolutionary process of Gymnosporangium species, and clarified that the aecial hosts played more important roles than telial hosts in the speciation of Gymnosporangium species. Host switch, losses, duplication and failure to diverge all contributed to the speciation of Gymnosporangium species. PMID:27385413

  2. Synonymous Mutations at the Beginning of the Influenza A Virus Hemagglutinin Gene Impact Experimental Fitness.

    PubMed

    Canale, Aneth S; Venev, Sergey V; Whitfield, Troy W; Caffrey, Daniel R; Marasco, Wayne A; Schiffer, Celia A; Kowalik, Timothy F; Jensen, Jeffrey D; Finberg, Robert W; Zeldovich, Konstantin B; Wang, Jennifer P; Bolon, Daniel N A

    2018-04-13

    The fitness effects of synonymous mutations can provide insights into biological and evolutionary mechanisms. We analyzed the experimental fitness effects of all single-nucleotide mutations, including synonymous substitutions, at the beginning of the influenza A virus hemagglutinin (HA) gene. Many synonymous substitutions were deleterious both in bulk competition and for individually isolated clones. Investigating protein and RNA levels of a subset of individually expressed HA variants revealed that multiple biochemical properties contribute to the observed experimental fitness effects. Our results indicate that a structural element in the HA segment viral RNA may influence fitness. Examination of naturally evolved sequences in human hosts indicates a preference for the unfolded state of this structural element compared to that found in swine hosts. Our overall results reveal that synonymous mutations may have greater fitness consequences than indicated by simple models of sequence conservation, and we discuss the implications of this finding for commonly used evolutionary tests and analyses. Copyright © 2018. Published by Elsevier Ltd.

  3. Shedding new light on opsin evolution

    PubMed Central

    Porter, Megan L.; Blasic, Joseph R.; Bok, Michael J.; Cameron, Evan G.; Pringle, Thomas; Cronin, Thomas W.; Robinson, Phyllis R.

    2012-01-01

    Opsin proteins are essential molecules in mediating the ability of animals to detect and use light for diverse biological functions. Therefore, understanding the evolutionary history of opsins is key to understanding the evolution of light detection and photoreception in animals. As genomic data have appeared and rapidly expanded in quantity, it has become possible to analyse opsins that functionally and histologically are less well characterized, and thus to examine opsin evolution strictly from a genetic perspective. We have incorporated these new data into a large-scale, genome-based analysis of opsin evolution. We use an extensive phylogeny of currently known opsin sequence diversity as a foundation for examining the evolutionary distributions of key functional features within the opsin clade. This new analysis illustrates the lability of opsin protein-expression patterns, site-specific functionality (i.e. counterion position) and G-protein binding interactions. Further, it demonstrates the limitations of current model organisms, and highlights the need for further characterization of many of the opsin sequence groups with unknown function. PMID:22012981

  4. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  5. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  6. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

    Treesearch

    Michael DeGiorgio; John Syring; Andrew J. Eckert; Aaron Liston; Richard Cronn; David B. Neale; Noah A. Rosenberg

    2014-01-01

    Background: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models,...

  7. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution

    PubMed Central

    Modahl, Cassandra M.; Mackessy, Stephen P.

    2016-01-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639

  8. Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time

    PubMed Central

    Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.

    2010-01-01

    Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288
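
    The idea of relaxing Brownian diffusion with branch-specific dispersal rates can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Bayesian implementation: the function names, the two-dimensional coordinates, and the choice of a gamma distribution with mean 1 for the rate scalars are all hypothetical.

```python
import random


def diffuse_along_branch(start, branch_length, sigma2=1.0, rate=1.0, rng=None):
    """Move a 2-D location along one branch of a tree.

    Under plain Brownian diffusion the displacement variance per coordinate is
    sigma2 * branch_length. The relaxed version multiplies it by a
    branch-specific rate scalar, so some branches disperse faster than others.
    """
    rng = rng or random.Random()
    sd = (sigma2 * rate * branch_length) ** 0.5
    return (start[0] + rng.gauss(0.0, sd), start[1] + rng.gauss(0.0, sd))


def draw_branch_rate(shape=2.0, rng=None):
    """Draw a rate scalar with mean 1 (assumed gamma form, for illustration)."""
    rng = rng or random.Random()
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha.
    return rng.gammavariate(shape, 1.0 / shape)
```

    Setting every rate to 1.0 recovers the standard Brownian process; drawing a fresh rate per branch yields the overdispersed walks that the abstract describes.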

  9. The origin, current diversity and future conservation of the modern lion (Panthera leo)

    PubMed Central

    Barnett, Ross; Yamaguchi, Nobuyuki; Barnes, Ian; Cooper, Alan

    2006-01-01

    Understanding the phylogeographic processes affecting endangered species is crucial both to interpreting their evolutionary history and to the establishment of conservation strategies. Lions provide a key opportunity to explore such processes; however, a lack of genetic diversity and shortage of suitable samples has until now hindered such investigation. We used mitochondrial control region DNA (mtDNA) sequences to investigate the phylogeographic history of modern lions, using samples from across their entire range. We find the sub-Saharan African lions are basal among modern lions, supporting a single African origin model of modern lion evolution, equivalent to the ‘recent African origin’ model of modern human evolution. We also find the greatest variety of mtDNA haplotypes in the centre of Africa, which may be due to the distribution of physical barriers and continental-scale habitat changes caused by Pleistocene glacial oscillations. Our results suggest that the modern lion may currently consist of three geographic populations on the basis of their recent evolutionary history: North African–Asian, southern African and middle African. Future conservation strategies should take these evolutionary subdivisions into consideration. PMID:16901830

  10. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data.

    PubMed

    Zhang, Changsheng; Cai, Hongmin; Huang, Jingying; Song, Yan

    2016-09-17

    Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of the single-cell sequencing data can be well modelled by negative binomial distributions. We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem, and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
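
    Why a negative binomial suits over-dispersed read depth can be shown with a small sketch. It is not the nbCNV method itself (no sparsity or smoothness constraints, no optimization); it only compares how well a Poisson and a negative binomial model explain the same counts. The mean/dispersion parameterization (mu, r) and the function names are assumptions chosen for illustration.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial.

    Parameterized by mean mu and dispersion r; smaller r means more
    over-dispersion, and r -> infinity approaches the Poisson.
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def nb_variance(mu, r):
    """Variance of the negative binomial: mu + mu^2 / r (Poisson: mu)."""
    return mu + mu * mu / r
```

    For counts that scatter far more widely than their mean, summing `nb_logpmf` over the data typically gives a higher log-likelihood than summing `poisson_logpmf`, which is the sense in which the negative binomial models the over-dispersed portion better.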

  11. Genomic investigations of evolutionary dynamics and epistasis in microbial evolution experiments.

    PubMed

    Jerison, Elizabeth R; Desai, Michael M

    2015-12-01

    Microbial evolution experiments enable us to watch adaptation in real time, and to quantify the repeatability and predictability of evolution by comparing identical replicate populations. Further, we can resurrect ancestral types to examine changes over evolutionary time. Until recently, experimental evolution has been limited to measuring phenotypic changes, or to tracking a few genetic markers over time. However, recent advances in sequencing technology now make it possible to extensively sequence clones or whole-population samples from microbial evolution experiments. Here, we review recent work exploiting these techniques to understand the genomic basis of evolutionary change in experimental systems. We first focus on studies that analyze the dynamics of genome evolution in microbial systems. We then survey work that uses observations of sequence evolution to infer aspects of the underlying fitness landscape, concentrating on the epistatic interactions between mutations and the constraints these interactions impose on adaptation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  13. The development of the red giant branch. I - Theoretical evolutionary sequences

    NASA Technical Reports Server (NTRS)

    Sweigart, Allen V.; Greggio, Laura; Renzini, Alvio

    1989-01-01

    A grid of 100 evolutionary sequences extending from the zero-age main sequence to the onset of helium burning has been computed for stellar masses between 1.4 and 3.4 solar masses, helium abundances of 0.20 and 0.30, and heavy-element abundances of 0.004, 0.01, and 0.04. Using these computations the transition in the morphology of the red giant branch (RGB) between low-mass stars, which have an extended and luminous first RGB phase prior to helium ignition, and intermediate-mass stars, which do not, is investigated. Extensive tabulations of the numerical results are provided to aid in applying these sequences. The effects of the first dredge-up on the surface helium and CNO abundances of the sequences are discussed.

  14. Population genetics and the evolution of geographic range limits in an annual plant.

    PubMed

    Moeller, David A; Geber, Monica A; Tiffin, Peter

    2011-10-01

    Theoretical models of species' geographic range limits have identified both demographic and evolutionary mechanisms that prevent range expansion. Stable range limits have been paradoxical for evolutionary biologists because they represent locations where populations chronically fail to respond to selection. Distinguishing among the proposed causes of species' range limits requires insight into both current and historical population dynamics. The tools of molecular population genetics provide a window into the stability of range limits, historical demography, and rates of gene flow. Here we evaluate alternative range limit models using a multilocus data set based on DNA sequences and microsatellites along with field demographic data from the annual plant Clarkia xantiana ssp. xantiana. Our data suggest that central and peripheral populations have very large historical and current effective population sizes and that there is little evidence for population size changes or bottlenecks associated with colonization in peripheral populations. Whereas range limit populations appear to have been stable, central populations exhibit a signature of population expansion and have contributed asymmetrically to the genetic diversity of peripheral populations via migration. Overall, our results discount strictly demographic models of range limits and more strongly support evolutionary genetic models of range limits, where adaptation is prevented by a lack of genetic variation or maladaptive gene flow.

  15. Evolution of microbes and viruses: a paradigm shift in evolutionary biology?

    PubMed Central

    Koonin, Eugene V.; Wolf, Yuri I.

    2012-01-01

    When Charles Darwin formulated the central principles of evolutionary biology in the Origin of Species in 1859 and the architects of the Modern Synthesis integrated these principles with population genetics almost a century later, the principal if not the sole objects of evolutionary biology were multicellular eukaryotes, primarily animals and plants. Before the advent of efficient gene sequencing, all attempts to extend evolutionary studies to bacteria were futile. Sequencing of the rRNA genes in thousands of microbes allowed the construction of the three-domain “ribosomal Tree of Life” that was widely thought to have resolved the evolutionary relationships between the cellular life forms. However, subsequent massive sequencing of numerous, complete microbial genomes revealed novel evolutionary phenomena, the most fundamental of these being: (1) pervasive horizontal gene transfer (HGT), in large part mediated by viruses and plasmids, that shapes the genomes of archaea and bacteria and calls for a radical revision (if not abandonment) of the Tree of Life concept, (2) Lamarckian-type inheritance that appears to be critical for antivirus defense and other forms of adaptation in prokaryotes, and (3) evolution of evolvability, i.e., dedicated mechanisms for evolution such as vehicles for HGT and stress-induced mutagenesis systems. In the non-cellular part of the microbial world, phylogenomics and metagenomics of viruses and related selfish genetic elements revealed enormous genetic and molecular diversity and extremely high abundance of viruses that come across as the dominant biological entities on earth. Furthermore, the perennial arms race between viruses and their hosts is one of the defining factors of evolution. Thus, microbial phylogenomics adds new dimensions to the fundamental picture of evolution even as the principle of descent with modification discovered by Darwin and the laws of population genetics remain at the core of evolutionary biology. PMID:22993722

  16. Possibility that the far ultraviolet excess in M31 is due to main sequence stars

    NASA Technical Reports Server (NTRS)

    Tinsley, B. M.

    1972-01-01

    The far ultraviolet excess in the central region of M31, observed by OAO-2, could be due to young main sequence stars. More than enough such stars are present in the model for the M31 inner disk population derived by Tinsley and Spinrad (1971) to match line- and color-indices at longer wavelengths. If the far ultraviolet radiation of typical galaxies arises from young stars, the theoretical ultraviolet background is enhanced greatly by evolutionary effects. For evolution at the rate of Tinsley and Spinrad's model for M31, or of Arnett's (1971) linear model for our galaxy, the enhancement is a factor of 2.5 to 14, depending on the Hubble constant and the spectrum at wavelengths below 1700 Å.

  17. The first complete chloroplast genome of the Genistoid legume Lupinus luteus: evidence for a novel major lineage-specific rearrangement and new insights regarding plastome evolution in the legume family.

    PubMed

    Martin, Guillaume E; Rousseau-Gueutin, Mathieu; Cordonnier, Solenn; Lima, Oscar; Michon-Coudouel, Sophie; Naquin, Delphine; de Carvalho, Julie Ferreira; Aïnouche, Malika; Salmon, Armel; Aïnouche, Abdelkader

    2014-06-01

    To date chloroplast genomes are available only for members of the non-protein amino acid-accumulating clade (NPAAA) Papilionoid lineages in the legume family (i.e. Millettioids, Robinoids and the 'inverted repeat-lacking clade', IRLC). It is thus very important to sequence plastomes from other lineages in order to better understand the unusual evolution observed in this model flowering plant family. To this end, the plastome of a lupine species, Lupinus luteus, was sequenced to represent the Genistoid lineage, a noteworthy but poorly studied legume group. The plastome of L. luteus was reconstructed using Roche-454 and Illumina next-generation sequencing. Its structure, repetitive sequences, gene content and sequence divergence were compared with those of other Fabaceae plastomes. PCR screening and sequencing were performed in other allied legumes in order to determine the origin of a large inversion identified in L. luteus. The first sequenced Genistoid plastome (L. luteus: 155 894 bp) resulted in the discovery of a 36-kb inversion, embedded within the already known 50-kb inversion in the large single-copy (LSC) region of the Papilionoideae. This inversion occurs at the base or soon after the Genistoid emergence, and most probably resulted from a flip-flop recombination between identical 29-bp inverted repeats within two trnS genes. Comparative analyses of the chloroplast gene content of L. luteus vs. Fabaceae and extra-Fabales plastomes revealed the loss of the plastid rpl22 gene, and its functional relocation to the nucleus was verified using lupine transcriptomic data. An investigation into the evolutionary rate of coding and non-coding sequences among legume plastomes resulted in the identification of remarkably variable regions. This study resulted in the discovery of a novel, major 36-kb inversion, specific to the Genistoids. Chloroplast mutational hotspots were also identified, which contain novel and potentially informative regions for molecular evolutionary studies at various taxonomic levels in the legumes. Taken together, the results provide new insights into the evolutionary landscape of the legume plastome. © The Author 2014. Published by Oxford University Press on behalf of the Annals of Botany Company.

  18. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a Python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs, which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
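
    A simplified sketch of generating a random coding sequence for a fixed amino acid sequence while steering GC content is given below. It is not the NullSeq package or its API. Synonymous codons are weighted by exp(lam * GC count), the exponential form that a maximum-entropy distribution takes under a mean constraint; here lam is set by hand rather than solved for a target GC value, and all names are hypothetical. The codon table is the standard genetic code.

```python
import math
import random

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(seq):
    """Number of G or C bases in a nucleotide string."""
    return sum(base in "GC" for base in seq)


def random_coding_sequence(protein, lam=0.0, rng=None):
    """Sample one synonymous codon per residue.

    lam = 0 picks synonymous codons uniformly; lam > 0 favors GC-rich codons
    and lam < 0 favors AT-rich ones. The protein sequence is preserved exactly.
    """
    rng = rng or random.Random()
    chosen = []
    for aa in protein:
        options = SYNONYMS[aa]
        weights = [math.exp(lam * gc_count(c)) for c in options]
        chosen.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(chosen)


def translate(dna):
    """Translate a coding sequence back to amino acids (sanity check)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

    Translating any generated sequence should return the input protein, which gives a quick check that only synonymous choices were varied.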

  19. Decontaminate feature for tracking: adaptive tracking via evolutionary feature subset

    NASA Astrophysics Data System (ADS)

    Liu, Qiaoyuan; Wang, Yuru; Yin, Minghao; Ren, Jinchang; Li, Ruizhi

    2017-11-01

    Although various visual tracking algorithms have been proposed in the last 2-3 decades, it remains a challenging problem for effective tracking with fast motion, deformation, occlusion, etc. Under complex tracking conditions, most tracking models are not discriminative and adaptive enough. When the combined feature vectors are inputted to the visual models, this may lead to redundancy causing low efficiency and ambiguity causing poor performance. An effective tracking algorithm is proposed to decontaminate features for each video sequence adaptively, where the visual modeling is treated as an optimization problem from the perspective of evolution. Every feature vector is compared to a biological individual and then decontaminated via classical evolutionary algorithms. With the optimized subsets of features, the "curse of dimensionality" has been avoided while the accuracy of the visual model has been improved. The proposed algorithm has been tested on several publicly available datasets with various tracking challenges and benchmarked with a number of state-of-the-art approaches. The comprehensive experiments have demonstrated the efficacy of the proposed methodology.

  20. MicroRNA categorization using sequence motifs and k-mers.

    PubMed

    Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

    2017-03-14

    Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
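
    As a rough illustration of the k-mer features mentioned in this abstract, the sketch below turns an RNA sequence into a fixed-length vector of k-mer frequencies that a classifier could consume. The function name `kmer_features`, the default k = 3, and the normalization by the number of windows are assumptions for illustration, not the authors' feature definition.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3, alphabet="ACGU"):
    """Return k-mer frequencies in a fixed order (all 4**k k-mers over `alphabet`).

    DNA input is accepted by mapping T to U. Windows containing symbols
    outside the alphabet still count toward the total but have no feature.
    """
    seq = seq.upper().replace("T", "U")
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(max(windows, 0)))
    total = max(windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]
```

    Vectors built this way for the pre-miRNAs of two species could serve as the positive and negative examples described in the abstract.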

  1. The Spectral Energy Distribution of the Earliest Phases of Massive Star Formation from the Spitzer and Herschel Archives

    NASA Astrophysics Data System (ADS)

    Klein, Randolf; Looney, Leslie; Henning, Thomas; Chakrabarti, Sukanya; Shenoy, Sachin

    2015-08-01

    Infrared Dark Clouds (IRDCs) are very good candidates for the earliest phases of massive star formation, but can only be found in regions with high infrared background. We have searched for early phases among cold and massive (M>100M⊙) cloud cores by selecting cores from millimeter continuum surveys (Faundez et al. 2004, Sridharan et al. 2005, Klein et al. 2005, Beltran et al. 2006) without associations at short wavelengths. We compared the millimeter continuum peak positions with IR and radio catalogs (2MASS, MSX, IRAS, and NVSS) and excluded cores that had sources associated with the cores' peaks. We compiled a list of 173 cores in over 117 regions that are candidates for very early phases of Massive Star Formation (MSF). Now with the Spitzer and Herschel archives, these cores can be characterized further. The GLIMPSE and MIPSGAL programs alone covered 86 of these regions. The Herschel Archive adds even longer wavelengths. We are compiling this data set to construct the complete spectral energy distribution (SED) in the mid- and far-infrared with good spatial resolution and broad spectral coverage. This allows us to disentangle the complex regions and model the SED of the deeply embedded protostars/clusters. We will present the IR properties of all cores and their embedded sources, attempt a characterization, and order the cores in an evolutionary sequence. The resulting properties can be compared to, e.g., IRDCs, a class of objects suggested to be the earliest stages of MSF. With the relatively large number of cores, we can try to answer questions like: How homogeneous or diverse are our regions in terms of their evolutionary stage? Where do our embedded sources fit in the evolutionary sequence of IRDCs, hot molecular cores, ultra-compact HII regions, etc.? How is the MSF shaping the environment and vice versa? Can we extrapolate to the initial conditions of MSF using our evolutionary sequence?

  2. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System.

    PubMed

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics have facilitated the mining of biological information from a genome sequence, through the detection of similarities and differences with genomes of closely or more distantly related species. By using such comparative approaches, knowledge can be transferred from the model to non-model organisms and insights can be gained in the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study was aimed at understanding the structure, organisation and expression profiles of grass pollen allergens using the genomic data from Brachypodium distachyon as it is phylogenetically related to the allergenic grasses. Combining genomic data with the anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups mapping on the five chromosomes in B. distachyon. High levels of anther-specific expression profiles were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of them. Promoter analysis of the identified Brachypodium genes revealed the presence of specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. With the identification of putative allergen-encoding genes in Brachypodium, this study has also described some important plant gene families (e.g. expansin superfamily, EF-Hand family, profilins, etc.) for the first time in the model plant Brachypodium. Altogether, the present study provides new insights into structural characterization and evolution of pollen allergens and will further serve as a base for their functional characterization in related grass species.

  3. Assessing fluctuating evolutionary pressure in yeast and mammal evolutionary rate covariation using bioinformatics of meiotic protein genetic sequences

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Holden, T.; Lieberman, D.; Cheung, T.

    2013-09-01

    The evolutionary rate co-variation in meiotic proteins has been reported for yeast and mammals using phylogenetic branch lengths which assess retention, duplication and mutation. The bioinformatics of the corresponding DNA sequences could be classified as a diagram of fractal dimension and Shannon entropy. Results from biomedical gene research provide examples on the diagram methodology. The identification of adaptive selection using an entropy marker and functional-structural diversity using fractal dimension would support a regression analysis where the coefficient of determination would serve as an evolutionary pathway marker for DNA sequences and be an important component in the astrobiology community. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, clinical trial targeted cancer gene CD47, SIRT6 in spermatogenesis, and HLA-C in mosquito bite immunology demonstrate the diagram classification methodology. Comparisons to the SEPT4-XIAP pair in stem cell apoptosis, testes-expressed taste genes TAS1R3-GNAT3 pair, and amyloid beta APLP1-APLP2 pair with the yeast-mammal DNA sequences for meiotic proteins RAD50-MRE11 pair and NCAPD2-ICK pair have accounted for the observed fluctuating evolutionary pressure systematically. Regression with high R-sq values or a triangular-like cluster pattern for concordant pairs in co-variation among the studied species could serve as evidence for the possible location of common ancestors in the entropy-fractal dimension diagram, consistent with an example of the human-chimp common ancestor study using the FOXP2 regulated genes reported in human fetal brain study. The Deinococcus radiodurans R1 Rad-A could be viewed as an outlier in the RAD50 diagram and also in the free energy versus fractal dimension regression Cook's distance, consistent with a non-Earth source for this radiation resistant bacterium. Convergent and divergent fluctuating evolutionary pressure could be studied with extension to genetic sequences in organisms in possible astrobiology conditions, with the assumption that the continuation of a book of life would require meiotic proteins everywhere in the universe.
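
    The abstract does not say how its Shannon entropy is computed, so the following is only a minimal sketch of the textbook definition applied to the base composition of a DNA sequence, H = −Σ p(b) log2 p(b), in bits per base. The function name `shannon_entropy` is hypothetical, and the authors' entropy marker may be defined differently (for example over k-mers or sliding windows).

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in `seq`."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

    A sequence using all four bases equally often reaches the maximum of 2 bits per base, while a homopolymer has zero entropy.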

  4. Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

    PubMed

    Galián, José A; Rosato, Marcela; Rosselló, Josep A

    2014-03-01

    Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.

  5. An algebraic hypothesis about the primeval genetic code architecture.

    PubMed

    Sánchez, Robersy; Grau, Ricardo

    2009-09-01

    A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

  6. Evolutionary genomics and HIV restriction factors.

    PubMed

    Pyndiah, Nitisha; Telenti, Amalio; Rausell, Antonio

    2015-03-01

    To provide updated insights into innate antiviral immunity and highlight prototypical evolutionary features of well characterized HIV restriction factors. Recently, a new HIV restriction factor, Myxovirus resistance 2, has been discovered and the region/residue responsible for its activity identified using an evolutionary approach. Furthermore, IFI16, an innate immunity protein known to sense several viruses, has been shown to contribute to the defense to HIV-1 by causing cell death upon sensing HIV-1 DNA. Restriction factors against HIV show characteristic signatures of positive selection. Different patterns of accelerated sequence evolution can distinguish antiviral strategies--offense or defence--as well as the level of specificity of the antiviral properties. Sequence analysis of primate orthologs of restriction factors serves to localize functional domains and sites responsible for antiviral action. We use recent discoveries to illustrate how evolutionary genomic analyses help identify new antiviral genes and their mechanisms of action.

  7. Toxin structures as evolutionary tools: Using conserved 3D folds to study the evolution of rapidly evolving peptides.

    PubMed

    Undheim, Eivind A B; Mobli, Mehdi; King, Glenn F

    2016-06-01

    Three-dimensional (3D) structures have been used to explore the evolution of proteins for decades, yet they have rarely been utilized to study the molecular evolution of peptides. Here, we highlight areas in which 3D structures can be particularly useful for studying the molecular evolution of peptide toxins. Although we focus our discussion on animal toxins, including one of the most widespread disulfide-rich peptide folds known, the inhibitor cystine knot, our conclusions should be widely applicable to studies of the evolution of disulfide-constrained peptides. We show that conserved 3D folds can be used to identify evolutionary links and test hypotheses regarding the evolutionary origin of peptides with extremely low sequence identity; construct accurate multiple sequence alignments; and better understand the evolutionary forces that drive the molecular evolution of peptides. © 2016 WILEY Periodicals, Inc.

  8. Heavy-element yields and abundances of asymptotic giant branch models with a Small Magellanic Cloud metallicity

    NASA Astrophysics Data System (ADS)

    Karakas, Amanda I.; Lugaro, Maria; Carlos, Marília; Cseh, Borbála; Kamath, Devika; García-Hernández, D. A.

    2018-06-01

    We present new theoretical stellar yields and surface abundances for asymptotic giant branch (AGB) models with a metallicity appropriate for stars in the Small Magellanic Cloud (SMC, Z = 0.0028, [Fe/H] ≈ -0.7). New evolutionary sequences and post-processing nucleosynthesis results are presented for initial masses between 1 and 7 M⊙, where the 7 M⊙ is a super-AGB star with an O-Ne core. Models above 1.15 M⊙ become carbon rich during the AGB, and hot bottom burning begins in models M ≥ 3.75 M⊙. We present stellar surface abundances as a function of thermal pulse number for elements between C to Bi and for a selection of isotopic ratios for elements up to Fe and Ni (e.g. 12C/13C), which can be compared to observations. The integrated stellar yields are presented for each model in the grid for hydrogen, helium, and all stable elements from C to Bi. We present evolutionary sequences of intermediate-mass models between 4 and 7 M⊙ and nucleosynthesis results for three masses (M = 3.75, 5, and 7 M⊙) including s-process elements for two widely used AGB mass-loss prescriptions. We discuss our new models in the context of evolved AGB and post-AGB stars in the SMCs, barium stars in our Galaxy, the composition of Galactic globular clusters including Mg isotopes with a similar metallicity to our models, and to pre-solar grains which may have an origin in metal-poor AGB stars.

  9. Secuencias evolutivas e isocronas para estrellas de baja masa e intermedia [Evolutionary sequences and isochrones for low- and intermediate-mass stars]

    NASA Astrophysics Data System (ADS)

    Panei, J.; Baume, G.

    2016-08-01

    We present theoretical evolutionary sequences for low- and intermediate-mass stars. The masses calculated range from 1.7 to 10 M⊙. The initial chemical composition is . In addition, we have taken into account a nuclear network with 17 isotopes and 34 nuclear reactions. With respect to mixing, we considered overshooting with a parameter . The evolutionary calculations were initialized from the Hayashi instability region, in order to calculate pre-main-sequence isochrones as well.

  10. Characterization of irritans mariner-like elements in the olive fruit fly Bactrocera oleae (Diptera: Tephritidae): evolutionary implications.

    PubMed

    Ben Lazhar-Ajroud, Wafa; Caruso, Aurore; Mezghani, Maha; Bouallegue, Maryem; Tastard, Emmanuelle; Denis, Françoise; Rouault, Jacques-Deric; Makni, Hanem; Capy, Pierre; Chénais, Benoît; Makni, Mohamed; Casse, Nathalie

    2016-08-01

    Genomic variation among species is commonly driven by transposable element (TE) invasion; thus, the pattern of TEs in a genome allows drawing an evolutionary history of the studied species. This paper reports in vitro and in silico detection and characterization of irritans mariner-like elements (MLEs) in the genome and transcriptome of Bactrocera oleae (Rossi) (Diptera: Tephritidae). Eleven irritans MLE sequences have been isolated in vitro using terminal inverted repeats (TIRs) as primers, and 215 have been extracted in silico from the sequenced genome of B. oleae. Additionally, the sequenced genomes of Bactrocera tryoni (Froggatt) and Bactrocera cucurbitae (Diptera: Tephritidae) have been explored to identify irritans MLEs. A total of 129 sequences from B. tryoni have been extracted, while the genome of B. cucurbitae appears probably devoid of irritans MLEs. All detected irritans MLEs are defective due to several mutations and are clustered together in a monophyletic group suggesting a common ancestor. The evolutionary history and dynamics of these TEs are discussed in relation with the phylogenetic distribution of their hosts. The knowledge on the structure, distribution, dynamic, and evolution of irritans MLEs in Bactrocera species contributes to the understanding of both their evolutionary history and the invasion history of their hosts. This could also be the basis for genetic control strategies using transposable elements.

  11. Genomic V exons from whole genome shotgun data in reptiles.

    PubMed

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  12. Evolutionary and biotechnology implications of plastid genome variation in the inverted-repeat-lacking clade of legumes.

    PubMed

    Sabir, Jamal; Schwarz, Erika; Ellison, Nicholas; Zhang, Jin; Baeshen, Nabih A; Mutwakil, Muhammed; Jansen, Robert; Ruhlman, Tracey

    2014-08-01

    Land plant plastid genomes (plastomes) provide a tractable model for evolutionary study in that they are relatively compact and gene dense. Among the groups that display an appropriate level of variation for structural features, the inverted-repeat-lacking clade (IRLC) of papilionoid legumes presents the potential to advance general understanding of the mechanisms of genomic evolution. Here, six complete plastome sequences are presented from economically important species of the IRLC, a lineage previously represented by only five completed plastomes. A number of characters are compared across the IRLC including gene retention and divergence, synteny, repeat structure and functional gene transfer to the nucleus. The loss of clpP intron 2 was identified in one newly sequenced member of IRLC, Glycyrrhiza glabra. Using deeply sequenced nuclear transcriptomes from two species helped clarify the nature of the functional transfer of accD to the nucleus in Trifolium, which likely occurred in the lineage leading to subgenus Trifolium. Legumes are second only to cereal crops in agricultural importance based on area harvested and total production. Genetic improvement via plastid transformation of IRLC crop species is an appealing proposition. Comparative analyses of intergenic spacer regions emphasize the need for complete genome sequences for developing transformation vectors for plastid genetic engineering of legume crops. © 2014 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  13. Short communication: phylodynamics analysis of the human immunodeficiency virus type 1 envelope gene in mother and child pairs.

    PubMed

    Santos, Luciane Amorim; Gray, Rebecca R; Monteiro-Cunha, Joana Paixão; Strazza, Evandra; Kashima, Simone; Santos, Edson de Souza; Araújo, Thessika Hialla Almeida; Gonçalves, Marilda de Souza; Salemi, Marco; Alcantara, Luiz Carlos Junior

    2015-09-01

    Characterizing the impact of HIV transmission routes on viral genetic diversity can improve the understanding of the mechanisms of virus evolution and adaptation. HIV vertical transmission can occur in utero, during delivery, or while breastfeeding. The present study investigated the phylodynamics of the HIV-1 env gene in mother-to-child transmission by analyzing one chronically infected pair from Brazil and three acutely infected pairs from Zambia, with three to five time points. Sequences from 25 clones from each sample were obtained and aligned using Clustal X. ML trees were constructed in PhyML using the best evolutionary model. Bayesian analyses testing the relaxed and strict molecular clock were performed using BEAST and a Bayesian Skyline Plot (BSP) was constructed. The genetic variability of previously described epitopes was investigated and compared between each individual time point and between mother and child sequences. The relaxed molecular clock was the best-fitted model for all datasets. The tree topologies did not show differentiation in the evolutionary dynamics of the virus circulating in the mother from the viral population in the child. In the BSP, the effective population size was more constant in time in the chronically infected patients while in the acute patients it was possible to detect bottlenecks. The genetic variability within viral epitopes recognized by the human immune system was considerably higher among the chronically infected pair in comparison with acutely infected pairs. These results contribute to a better understanding of HIV-1 evolutionary dynamics in mother-to-child transmission.

  14. Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches

    PubMed Central

    Krobath, Heinrich; Chan, Hue Sun

    2016-01-01

    Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches. PMID:27253392

  15. Genome-wide data reveal cryptic diversity and genetic introgression in an Oriental cynopterine fruit bat radiation.

    PubMed

    Chattopadhyay, Balaji; Garg, Kritika M; Kumar, A K Vinoth; Doss, D Paramanantha Swami; Rheindt, Frank E; Kandula, Sripathi; Ramakrishnan, Uma

    2016-02-18

    The Oriental fruit bat genus Cynopterus, with several geographically overlapping species, presents an interesting case study to evaluate the evolutionary significance of coexistence versus isolation. We examined the morphological and genetic variability of congeneric fruit bats Cynopterus sphinx and C. brachyotis using 405 samples from two natural contact zones and 17 allopatric locations in the Indian subcontinent; and investigated the population differentiation patterns, evolutionary history, and the possibility of cryptic diversity in this species pair. Analysis of microsatellites, cytochrome b gene sequences, and restriction digestion based genome-wide data revealed that C. sphinx and C. brachyotis do not hybridize in contact zones. However, cytochrome b gene sequences and genome-wide SNP data helped uncover a cryptic, hitherto unrecognized cynopterine lineage in northeastern India coexisting with C. sphinx. Further analyses of shared variation of SNPs using Patterson's D statistics suggest introgression between this lineage and C. sphinx. Multivariate analyses of morphology using genetically classified grouping confirmed substantial morphological overlap between C. sphinx and C. brachyotis, specifically in the high elevation contact zones in southern India. Our results uncover novel diversity and detect a pattern of genetic introgression in a cryptic radiation of bats, demonstrating the complicated nature of lineage diversification in this poorly understood taxonomic group. Our results highlight the importance of genome-wide data to study evolutionary processes of morphologically similar species pairs. Our approach represents a significant step forward in evolutionary research on young radiations of non-model species that may retain the ability of interspecific gene flow.
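
    Patterson's D, used in this abstract to test for introgression, compares counts of two site patterns across four taxa (P1, P2, P3, outgroup): D = (nABBA − nBABA) / (nABBA + nBABA). The sketch below counts these patterns from four aligned haploid sequences. It is a simplified illustration with a hypothetical function name `patterson_d`; the study itself worked with genome-wide SNP data and presumably allele frequencies, and significance testing (for example by block jackknife) is omitted.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Patterson's D from four aligned sequences of equal length.

    The outgroup allele is treated as ancestral (A), the other as derived (B).
    ABBA sites: P2 and P3 share the derived allele.
    BABA sites: P1 and P3 share the derived allele.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only biallelic sites
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)
```

    Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D near zero is the null expectation; a positive D suggests excess allele sharing between P2 and P3, and a negative D between P1 and P3.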

  16. Neutrino-heated stars and broad-line emission from active galactic nuclei

    NASA Technical Reports Server (NTRS)

    Macdonald, James; Stanev, Todor; Biermann, Peter L.

    1991-01-01

    Nonthermal radiation from active galactic nuclei indicates the presence of highly relativistic particles. The interaction of these high-energy particles with matter and photons gives rise to a flux of high-energy neutrinos. In this paper, the influence of the expected high neutrino fluxes on the structure and evolution of single, main-sequence stars is investigated. Sequences of models of neutrino-heated stars in thermal equilibrium are presented for masses 0.25, 0.5, 0.8, and 1.0 solar mass. In addition, a set of evolutionary sequences for mass 0.5 solar mass have been computed for different assumed values for the incident neutrino energy flux. It is found that winds driven by the heating due to high-energy particles and hard electromagnetic radiation of the outer layers of neutrino-bloated stars may satisfy the requirements of the model of Kazanas (1989) for the broad-line emission clouds in active galactic nuclei.

  17. Ignoring heterozygous sites biases phylogenomic estimates of divergence times: implications for the evolutionary history of Microtus voles.

    PubMed

    Lischer, Heidi E L; Excoffier, Laurent; Heckel, Gerald

    2014-04-01

    Phylogenetic reconstruction of the evolutionary history of closely related organisms may be difficult because of the presence of unsorted lineages and of a relatively high proportion of heterozygous sites that are usually not handled well by phylogenetic programs. Genomic data may provide enough fixed polymorphisms to resolve phylogenetic trees, but the diploid nature of sequence data remains analytically challenging. Here, we performed a phylogenomic reconstruction of the evolutionary history of the common vole (Microtus arvalis) with a focus on the influence of heterozygosity on the estimation of intraspecific divergence times. We used genome-wide sequence information from 15 voles distributed across the European range. We provide a novel approach to integrate heterozygous information in existing phylogenetic programs by repeated random haplotype sampling from sequences with multiple unphased heterozygous sites. We evaluated the impact of the use of full, partial, or no heterozygous information for tree reconstructions on divergence time estimates. All results consistently showed four deep and strongly supported evolutionary lineages in the vole data. These lineages undergoing divergence processes split only at the end or after the last glacial maximum based on calibration with radiocarbon-dated paleontological material. However, the incorporation of information from heterozygous sites had a significant impact on absolute and relative branch length estimations. Ignoring heterozygous information led to an overestimation of divergence times between the evolutionary lineages of M. arvalis. We conclude that the exclusion of heterozygous sites from evolutionary analyses may cause biased and misleading divergence time estimates in closely related taxa.
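
    The repeated random haplotype sampling described in this abstract can be sketched as follows. The sketch assumes, purely for illustration, that unphased heterozygous sites are encoded with two-base IUPAC ambiguity codes; the function name `sample_haplotypes` is hypothetical and the authors' actual implementation may differ. Each heterozygous site is resolved independently at random, and calling the function repeatedly yields a set of haplotype pairs over which tree estimates can be summarized.

```python
import random

# Two-base IUPAC ambiguity codes for heterozygous genotypes.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def sample_haplotypes(genotype, rng):
    """Resolve an unphased diploid sequence into one random haplotype pair.

    Homozygous sites are copied to both haplotypes; at each heterozygous
    site the two alleles are assigned to the haplotypes in random order.
    """
    hap1, hap2 = [], []
    for base in genotype.upper():
        if base in IUPAC_HET:
            first, second = rng.sample(IUPAC_HET[base], 2)
            hap1.append(first)
            hap2.append(second)
        else:
            hap1.append(base)
            hap2.append(base)
    return "".join(hap1), "".join(hap2)
```

    Because phase is unknown, any single draw is arbitrary; only the variation across many draws is informative about how heterozygous sites affect branch lengths.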

  18. Conservation of Endo16 expression in sea urchins despite evolutionary divergence in both cis and trans-acting components of transcriptional regulation

    NASA Technical Reports Server (NTRS)

    Romano, Laura A.; Wray, Gregory A.

    2003-01-01

    Evolutionary changes in transcriptional regulation undoubtedly play an important role in creating morphological diversity. However, there is little information about the evolutionary dynamics of cis-regulatory sequences. This study examines the functional consequence of evolutionary changes in the Endo16 promoter of sea urchins. The Endo16 gene encodes a large extracellular protein that is expressed in the endoderm and may play a role in cell adhesion. Its promoter has been characterized in exceptional detail in the purple sea urchin, Strongylocentrotus purpuratus. We have characterized the structure and function of the Endo16 promoter from a second sea urchin species, Lytechinus variegatus. The Endo16 promoter sequences have evolved in a strongly mosaic manner since these species diverged approximately 35 million years ago: the most proximal region (module A) is conserved, but the remaining modules (B-G) are unalignable. Despite extensive divergence in promoter sequences, the pattern of Endo16 transcription is largely conserved during embryonic and larval development. Transient expression assays demonstrate that 2.2 kb of upstream sequence in either species is sufficient to drive GFP reporter expression that correctly mimics this pattern of Endo16 transcription. Reciprocal cross-species transient expression assays imply that changes have also evolved in the set of transcription factors that interact with the Endo16 promoter. Taken together, these results suggest that stabilizing selection on the transcriptional output may have operated to maintain a similar pattern of Endo16 expression in S. purpuratus and L. variegatus, despite dramatic divergence in promoter sequence and mechanisms of transcriptional regulation.

  19. MIPS plant genome information resources.

    PubMed

    Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X

    2007-01-01

    The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

  20. Comment on "Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage".

    PubMed

    Nakagome, Shigeki; Mano, Shuhei; Hasegawa, Masami

    2013-03-29

    Based on nuclear and mitochondrial DNA, Hailer et al. (Reports, 20 April 2012, p. 344) suggested early divergence of polar bears from a common ancestor with brown bears and subsequent introgression. Our population genetic analysis that traces each of the genealogies in the independent nuclear loci does not support the evolutionary model proposed by the authors.

  1. Neutrality and evolvability of designed protein sequences

    NASA Astrophysics Data System (ADS)

    Bhattacherjee, Arnab; Biswas, Parbati

    2010-07-01

    The effect of foldability on a protein’s evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high-temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. The robustness, a protein’s ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than the Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.

  2. Birth order in small multihospital systems.

    PubMed

    Luke, R D; Ozcan, Y A; Begun, J W

    1990-06-01

    The strategic behaviors of small multihospital systems have received little attention in the literature despite the fact that small systems are the predominant scale among multihospital systems. This study examines one important aspect of small-system strategic behaviors: the birth-order or evolutionary patterns of hospital acquisition. The evolutionary patterns of acquisition are compared across three strategic model types studied elsewhere: local market, investment, and historical. Using data obtained from a variety of sources, local market model systems are found, in the sequence of acquisition, to be significantly different from the other two model types in terms of relative distances of acquisitions from the initiating or parent hospital, the sizes of acquisition hospitals, the complexity of those hospitals, and the likelihood that the acquisitions are located in rural areas. Differences between parents and acquisitions are also significant, as hypothesized, for the market model system types, although they are not generally significant for the other two model types. The findings suggest that the market model represents an important strategic form that may have important implications for the restructuring of hospital markets.

  3. Deciphering evolutionary strata on plant sex chromosomes and fungal mating-type chromosomes through compositional segmentation.

    PubMed

    Pandey, Ravi S; Azad, Rajeev K

    2016-03-01

    Sex chromosomes have evolved from a pair of homologous autosomes which differentiated into sex determination systems, such as the XY or ZW system, as a consequence of successive recombination suppression between the gametologous chromosomes. Identifying the regions of recombination suppression, namely, the "evolutionary strata", is central to understanding the history and dynamics of sex chromosome evolution. Evolution of sex chromosomes as a consequence of serial recombination suppressions is well-studied for mammals and birds, but not for plants, although 48 dioecious plants have already been reported. Only two plants, Silene latifolia and papaya, have been studied until now for the presence of evolutionary strata on their X chromosomes, made possible by the sequencing of sex-linked genes on both the X and Y chromosomes, which is a requirement of all current methods that determine stratum structure based on the comparison of gametologous sex chromosomes. To circumvent this limitation and detect strata even if only the sequence of the sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available, we have developed an integrated segmentation and clustering method. In application to gene sequences on the papaya X chromosome and protein-coding sequences on the S. latifolia X chromosome, our method could decipher all known evolutionary strata, as reported by previous studies. Our method, after validation on known strata on the papaya and S. latifolia X chromosomes, was applied to chromosome 19 of Populus trichocarpa, an incipient sex chromosome, deciphering two previously unknown evolutionary strata. In addition, we applied this approach to the recently sequenced sex chromosome V of the brown alga Ectocarpus sp., which has a haploid sex determination system (UV system), recovering the sex-determining and pseudoautosomal regions, and then to the mating-type chromosomes of the anther-smut fungus Microbotryum lychnidis-dioicae, predicting five strata in the non-recombining region of both chromosomes.
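
    The abstract does not say which statistic drives the segmentation step. One common choice in compositional segmentation is to split a sequence where the Jensen-Shannon divergence between the nucleotide compositions of the two sides is largest. The sketch below assumes that choice and finds a single best split; the function names and the `min_len` parameter are hypothetical, and the clustering step is not shown.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a composition given as positive symbol counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def best_split(seq, min_len=10):
    """Return (position, divergence) for the split seq[:pos] | seq[pos:] that
    maximizes the length-weighted Jensen-Shannon divergence.

    Returns (None, -1.0) if seq is shorter than 2 * min_len.
    """
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter(seq[:min_len])  # composition of seq[:i], updated in the loop
    best_pos, best_jsd = None, -1.0
    for i in range(min_len, n - min_len + 1):
        right = total - left  # composition of seq[i:]
        jsd = (h_all
               - (i / n) * entropy(left, i)
               - ((n - i) / n) * entropy(right, n - i))
        if jsd > best_jsd:
            best_pos, best_jsd = i, jsd
        left[seq[i]] += 1  # extend the left side by one symbol
    return best_pos, best_jsd
```

    Applying `best_split` recursively to each side, with a significance threshold on the divergence, would give a full segmentation. The threshold is left out here because the abstract does not specify one.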

  4. JCoDA: a tool for detecting evolutionary selection.

    PubMed

    Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

    2010-05-27

    The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
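
    The sliding-window idea can be illustrated with a short sketch. It does not reproduce JCoDA or call PAML; it assumes per-codon dN and dS estimates are already available as plain lists (hypothetical inputs) and only shows how a windowed ratio might be summarized. The function name and parameters are illustrative.

```python
def sliding_dn_ds(dn, ds, window, step=1):
    """Return a list of (start_codon, dN/dS) pairs, one per window.

    dn and ds are equal-length lists of per-codon estimates (hypothetical
    inputs). Windows with no synonymous change get None instead of a ratio.
    """
    if len(dn) != len(ds):
        raise ValueError("dn and ds must have the same length")
    out = []
    for start in range(0, len(dn) - window + 1, step):
        dn_sum = sum(dn[start:start + window])
        ds_sum = sum(ds[start:start + window])
        out.append((start, dn_sum / ds_sum if ds_sum > 0 else None))
    return out
```

    Windows with a ratio above 1 are the usual candidates for positive selection, and ratios well below 1 point to purifying selection. A window size is a trade-off: smaller windows localize the signal but give noisier ratios.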

  5. JCoDA: a tool for detecting evolutionary selection

    PubMed Central

    2010-01-01

    Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581

  6. Molecular Phylogenetics and Temporal Diversification in the Genus Aeromonas Based on the Sequences of Five Housekeeping Genes

    PubMed Central

    Lorén, J. Gaspar; Farfán, Maribel; Fusté, M. Carmen

    2014-01-01

    Several approaches have been developed to estimate both the relative and absolute rates of speciation and extinction within clades based on molecular phylogenetic reconstructions of evolutionary relationships, according to an underlying model of diversification. However, the macroevolutionary models established for eukaryotes have scarcely been used with prokaryotes. We have investigated the rate and pattern of cladogenesis in the genus Aeromonas (γ-Proteobacteria, Proteobacteria, Bacteria) using the sequences of five housekeeping genes and an uncorrelated relaxed-clock approach. To our knowledge, until now this analysis has never been applied to all the species described in a bacterial genus and thus opens up the possibility of establishing models of speciation from sequence data commonly used in phylogenetic studies of prokaryotes. Our results suggest that the genus Aeromonas began to diverge between 248 and 266 million years ago, exhibiting a constant divergence rate through the Phanerozoic, which could be described as a pure birth process. PMID:24586399
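
    A pure birth process has a single parameter: each lineage splits at a constant rate, so the waiting time while k lineages exist is exponential with rate k times the birth rate. The sketch below simulates those waiting times and recovers the rate by maximum likelihood. It is a generic illustration in arbitrary time units, not the relaxed-clock analysis used in the study, and the function names are hypothetical.

```python
import random


def simulate_pure_birth(rate, n_tips, seed=0):
    """Simulate a pure birth process growing from 2 to n_tips lineages.

    Returns a list of (k, waiting_time): the time spent with k lineages
    before the next split.
    """
    rng = random.Random(seed)
    return [(k, rng.expovariate(k * rate)) for k in range(2, n_tips)]


def estimate_birth_rate(intervals):
    """Maximum-likelihood birth rate: number of splits / summed lineage-time."""
    events = len(intervals)
    lineage_time = sum(k * t for k, t in intervals)
    return events / lineage_time
```

    With many tips the estimate converges on the simulated rate. With the few dozen species typical of a bacterial genus, the estimate carries considerable uncertainty.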

  7. In search of actionable targets for agrigenomics and microalgal biofuel production: sequence-structural diversity studies on algal and higher plants with a focus on GPAT protein.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar

    2013-04-01

    The triacylglycerol (TAG) pathway provides several targets for genetic engineering to optimize microalgal lipid productivity. GPAT (glycerol-3-phosphate acyltransferase) is a crucial enzyme that catalyzes the initial step of TAG biosynthesis. Despite many recent biochemical studies, a comprehensive sequence-structure analysis of GPAT across diverse lipid-yielding organisms is lacking. Hence, we performed a comparative genomic analysis of plastid-located GPAT proteins from 7 microalgae and 3 higher plants species. The close evolutionary relationship observed between red algae/diatoms and green algae/plant lineages in the phylogenetic tree were further corroborated by motif and gene structure analysis. The predicted molecular weight, amino acid composition, Instability Index, and hydropathicity profile gave an overall representation of the biochemical features of GPAT protein across the species under study. Furthermore, homology models of GPAT from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Glycine max provided deep insights into the protein architecture and substrate binding sites. Despite low sequence identity found between algal and plant GPATs, the developed models exhibited strikingly conserved topology consisting of 14α helices and 9β sheets arranged in two domains. However, subtle variations in amino acids of fatty acyl binding site were identified that might influence the substrate selectivity of GPAT. Together, the results will provide useful resources to understand the functional and evolutionary relationship of GPAT and potentially benefit in development of engineered enzyme for augmenting algal biofuel production.

  8. The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences

    PubMed Central

    Xue, Cheng; Raveendran, Muthuswamy; Harris, R. Alan; Fawcett, Gloria L.; Liu, Xiaoming; White, Simon; Dahdouli, Mahmoud; Rio Deiros, David; Below, Jennifer E.; Salerno, William; Cox, Laura; Fan, Guoping; Ferguson, Betsy; Horvath, Julie; Johnson, Zach; Kanthaswamy, Sree; Kubisch, H. Michael; Liu, Dahai; Platt, Michael; Smith, David G.; Sun, Binghua; Vallender, Eric J.; Wang, Feng; Wiseman, Roger W.; Chen, Rui; Muzny, Donna M.; Gibbs, Richard A.; Yu, Fuli; Rogers, Jeffrey

    2016-01-01

    Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease. PMID:27934697

  9. Prevalent Role of Gene Features in Determining Evolutionary Fates of Whole-Genome Duplication Duplicated Genes in Flowering Plants

    PubMed Central

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-01-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs. PMID:23396833

  10. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  11. Cry-Bt identifier: a biological database for PCR detection of Cry genes present in transgenic plants.

    PubMed

    Singh, Vinay Kumar; Ambwani, Sonu; Marla, Soma; Kumar, Anil

    2009-10-23

    We describe the development of a user-friendly tool that assists in the retrieval of information relating to Cry genes in transgenic crops. The tool also helps in the detection of transformed Cry genes from Bacillus thuringiensis present in transgenic plants by providing suitably designed primers for PCR identification of these genes. The tool, which is based on a relational database model, enables easy retrieval of information from the database with simple user queries. It also enables users to access related information about Cry genes present in various databases by interacting with different sources (nucleotide sequences, protein sequences, sequence comparison tools, published literature, conserved domains, evolutionary and structural data). http://insilicogenomics.in/Cry-btIdentifier/welcome.html.

  12. Resolving the Origin of Rabbit Hemorrhagic Disease Virus: Insights from an Investigation of the Viral Stocks Released in Australia

    PubMed Central

    Eden, John-Sebastian; Read, Andrew J.; Duckworth, Janine A.; Strive, Tanja

    2015-01-01

    To resolve the evolutionary history of rabbit hemorrhagic disease virus (RHDV), we performed a genomic analysis of the viral stocks imported and released as a biocontrol measure in Australia, as well as a global phylogenetic analysis. Importantly, conflicts were identified between the sequences determined here and those previously published that may have affected evolutionary rate estimates. By removing likely erroneous sequences, we show that RHDV emerged only shortly before its initial description in China. PMID:26378178

  13. Nucleotide variability at its limit? Insights into the number and evolutionary dynamics of the sex-determining specificities of the honey bee Apis mellifera.

    PubMed

    Lechner, Sarah; Ferretti, Luca; Schöning, Caspar; Kinuthia, Wanja; Willemsen, David; Hasselmann, Martin

    2014-02-01

    Deciphering the evolutionary processes driving nucleotide variation in multiallelic genes is limited by the number of genetic systems in which such genes occur. The complementary sex determiner (csd) gene in the honey bee Apis mellifera is an informative example for studying allelic diversity and the underlying evolutionary forces in a well-described model of balancing selection. The gene acts as the primary signal of sex determination: diploid individuals heterozygous for csd develop into females, whereas csd homozygotes are diploid males that have zero fitness. Examining 77 of the functional heterozygous csd allele pairs, we established combinatorial criteria that provide insights into the minimum number of amino acid differences among those pairs. Given a data set of 244 csd sequences, we show that the total number of csd alleles found in A. mellifera ranges from 53 (locally) to 87 (worldwide), which is much higher than was previously reported (20). Using a coupon-collector model, we extrapolate the presence of a total of 116-145 csd alleles worldwide. The hypervariable region (HVR) is of particular importance in determining csd allele specificity, and we provide for this region evidence of high evolutionary rate for length differences exceeding those of microsatellites. The proportion of amino acids driven by positive selection and the rate of nonsynonymous substitutions in the HVR-flanking regions reach values close to 1 but differ with respect to the HVR length. Using a model of csd coalescence, we identified the high originating rate of csd specificities as a major evolutionary force, leading to an origin of a novel csd allele every 400,000 years. The csd polymorphism frequencies in natural populations indicate an excess of new mutations, whereas signs of ancestral transspecies polymorphism can still be detected. This study provides a comprehensive view of the enormous diversity and the evolutionary forces shaping a multiallelic gene.
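
    The coupon-collector extrapolation can be illustrated in its simplest form. The sketch below assumes all allele types are equally frequent, which the abstract does not state and which is unlikely to match the authors' model, so its output should not be expected to reproduce the published range. Given the number of sequences sampled and the number of distinct alleles observed, it solves for the total number of types that would make that observation the expected one. The function names are hypothetical.

```python
def expected_distinct(n_types, n_draws):
    """Expected number of distinct types seen in n_draws draws from
    n_types equally frequent types."""
    return n_types * (1.0 - (1.0 - 1.0 / n_types) ** n_draws)


def estimate_total_types(observed, n_draws, upper=1e6, tol=1e-6):
    """Solve expected_distinct(N, n_draws) == observed for N by bisection.

    Assumes the answer lies below `upper`; observed must be smaller than
    n_draws, otherwise the sample gives no upper bound on N.
    """
    if not 0 < observed < n_draws:
        raise ValueError("need 0 < observed < n_draws")
    lo, hi = float(observed), float(upper)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, n_draws) < observed:
            lo = mid  # too few types: expected count falls short
        else:
            hi = mid
    return (lo + hi) / 2.0
```

    A call such as `estimate_total_types(87, 244)` uses the worldwide counts from the abstract. Unequal allele frequencies would generally push the estimate upward, since rare alleles are easier to miss.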

  14. Nucleotide Variability at Its Limit? Insights into the Number and Evolutionary Dynamics of the Sex-Determining Specificities of the Honey Bee Apis mellifera

    PubMed Central

    Lechner, Sarah; Ferretti, Luca; Schöning, Caspar; Kinuthia, Wanja; Willemsen, David; Hasselmann, Martin

    2014-01-01

    Deciphering the evolutionary processes driving nucleotide variation in multiallelic genes is limited by the number of genetic systems in which such genes occur. The complementary sex determiner (csd) gene in the honey bee Apis mellifera is an informative example for studying allelic diversity and the underlying evolutionary forces in a well-described model of balancing selection. The gene acts as the primary signal of sex determination: diploid individuals heterozygous for csd develop into females, whereas csd homozygotes are diploid males that have zero fitness. Examining 77 of the functional heterozygous csd allele pairs, we established combinatorial criteria that provide insights into the minimum number of amino acid differences among those pairs. Given a data set of 244 csd sequences, we show that the total number of csd alleles found in A. mellifera ranges from 53 (locally) to 87 (worldwide), which is much higher than was previously reported (20). Using a coupon-collector model, we extrapolate the presence of a total of 116–145 csd alleles worldwide. The hypervariable region (HVR) is of particular importance in determining csd allele specificity, and we provide for this region evidence of high evolutionary rate for length differences exceeding those of microsatellites. The proportion of amino acids driven by positive selection and the rate of nonsynonymous substitutions in the HVR-flanking regions reach values close to 1 but differ with respect to the HVR length. Using a model of csd coalescence, we identified the high originating rate of csd specificities as a major evolutionary force, leading to an origin of a novel csd allele every 400,000 years. The csd polymorphism frequencies in natural populations indicate an excess of new mutations, whereas signs of ancestral transspecies polymorphism can still be detected. This study provides a comprehensive view of the enormous diversity and the evolutionary forces shaping a multiallelic gene. PMID:24170493

  15. Diversity and evolutionary patterns of immune genes in free-ranging Namibian leopards (Panthera pardus pardus).

    PubMed

    Castro-Prieto, Aines; Wachter, Bettina; Melzheimer, Joerg; Thalwitzer, Susanne; Sommer, Simone

    2011-01-01

    The genes of the major histocompatibility complex (MHC) are a key component of the mammalian immune system and have become important molecular markers for fitness-related genetic variation in wildlife populations. Currently, no information about the MHC sequence variation and constitution in African leopards exists. In this study, we isolated and characterized genetic variation at the adaptively most important region of MHC class I and MHC class II-DRB genes in 25 free-ranging African leopards from Namibia and investigated the mechanisms that generate and maintain MHC polymorphism in the species. Using single-stranded conformation polymorphism analysis and direct sequencing, we detected 6 MHC class I and 6 MHC class II-DRB sequences, which likely correspond to at least 3 MHC class I and 3 MHC class II-DRB loci. Amino acid sequence variation in both MHC classes was higher or similar in comparison to other reported felids. We found signatures of positive selection shaping the diversity of MHC class I and MHC class II-DRB loci during the evolutionary history of the species. A comparison of MHC class I and MHC class II-DRB sequences of the leopard to those of other felids revealed a trans-species mode of evolution. In addition, the evolutionary relationships of MHC class II-DRB sequences between African and Asian leopard subspecies are discussed.

  16. BANYAN. IV. Fundamental parameters of low-mass star candidates in nearby young stellar kinematic groups—isochronal age determination using magnetic evolutionary models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malo, Lison; Doyon, René; Albert, Loïc

    2014-09-01

    Based on high-resolution optical spectra obtained with ESPaDOnS at the Canada-France-Hawaii Telescope, we determine fundamental parameters (T_eff, R, L_bol, log g, and metallicity) for 59 candidate members of nearby young kinematic groups. The candidates were identified through the BANYAN Bayesian inference method of Malo et al., which takes into account the position, proper motion, magnitude, color, radial velocity, and parallax (when available) to establish a membership probability. The derived parameters are compared to Dartmouth magnetic evolutionary models and field stars with the goal of constraining the age of our candidates. We find that, in general, low-mass stars in our sample are more luminous and have inflated radii compared to older stars, a trend expected for pre-main-sequence stars. The Dartmouth magnetic evolutionary models show a good fit to observations of field K and M stars, assuming a magnetic field strength of a few kG, as typically observed for cool stars. Using the low-mass members of the β Pictoris moving group, we have re-examined the age inconsistency problem between lithium depletion age and isochronal age (Hertzsprung-Russell diagram). We find that the inclusion of the magnetic field in evolutionary models increases the isochronal age estimates for the K5V-M5V stars. Using these models and field strengths, we derive an average isochronal age between 15 and 28 Myr and we confirm a clear lithium depletion boundary from which an age of 26 ± 3 Myr is derived, consistent with previous age estimates based on this method.

  17. MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks

    USDA-ARS's Scientific Manuscript database

    Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary h...

  18. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

    PubMed Central

    Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540

  19. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

    PubMed

    Grunert, Steffen; Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.

  20. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

    PubMed

    Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

    1998-06-01

    An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.

  1. Ploidy, sex and crossing over in an evolutionary aging model

    NASA Astrophysics Data System (ADS)

    Lobo, Matheus P.; Onody, Roberto N.

    2006-02-01

    Nowadays, many forms of reproduction coexist in nature: asexual, sexual, apomictic and meiotic parthenogenesis, hermaphroditism and parasex. The mechanisms of their evolution and what made them successful reproductive alternatives are very challenging and debated questions. Here, using a simple evolutionary aging model, we give a possible scenario. By studying the performance of populations where individuals may have diverse characteristics (different ploidies, sex with or without crossing over, as well as the absence of sex), we find an evolution sequence that may explain why there are actually two major or leading groups: sexual and asexual. We also investigate the dependence of these characteristics on different conditions of fertility and deleterious mutations. Finally, if the primeval organisms on Earth were, in fact, asexual individuals, we conjecture that the sexual form of reproduction could have more easily been set and found its niche during a period of low-intensity mutations.

  2. Standing variation in spatially growing populations

    NASA Astrophysics Data System (ADS)

    Fusco, Diana; Gralka, Matti; Kayser, Jona; Hallatschek, Oskar

    Patterns of genetic diversity not only reflect the evolutionary history of a species but they can also determine the evolutionary response to environmental change. For instance, the standing genetic diversity of a microbial population can be key to rescue in the face of an antibiotic attack. While genetic diversity is in general shaped by both demography and evolution, very little is understood when both factors matter, as is the case, for example, for biofilms with pronounced spatial organization. Here, we quantitatively explore patterns of genetic diversity by using microbial colonies and well-mixed test tube populations as antipodal model systems with extreme and very little spatial structure, respectively. We find that Eden model simulations and KPZ theory can remarkably reproduce the genetic diversity in microbial colonies obtained via population sequencing. The excellent agreement allows us to draw conclusions on the resilience of spatially-organized populations and to uncover new strategies to contain antibiotic resistance.

  3. A Time-Calibrated Road Map of Brassicaceae Species Radiation and Evolutionary History

    PubMed Central

    Hohmann, Nora; Wolf, Eva M.

    2015-01-01

    The Brassicaceae include several major crop plants and numerous important model species in comparative evolutionary research such as Arabidopsis, Brassica, Boechera, Thellungiella, and Arabis species. As any evolutionary hypothesis needs to be placed in a temporal context, reliably dated major splits within the evolution of Brassicaceae are essential. We present a comprehensive time-calibrated framework with important divergence time estimates based on whole-chloroplast sequence data for 29 Brassicaceae species. Diversification of the Brassicaceae crown group started at the Eocene-to-Oligocene transition. Subsequent major evolutionary splits are dated to ∼20 million years ago, coinciding with the Oligocene-to-Miocene transition, with increasing drought and aridity and transient glaciation events. The age of the Arabidopsis thaliana crown group is 6 million years ago, at the Miocene and Pliocene border. The overall species richness of the family is well explained by high levels of neopolyploidy (43% in total), but this trend is neither directly associated with an increase in genome size nor is there a general lineage-specific constraint. Our results highlight polyploidization as an important source for generating new evolutionary lineages adapted to changing environments. We conclude that species radiation, paralleled by high levels of neopolyploidization, follows genome size decrease, stabilization, and genetic diploidization. PMID:26410304

  4. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
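
    The idea of locating a low-scoring region of unknown length in an ordered sequence of numbers can be illustrated with a short sketch. This is not the authors' method, which the abstract does not specify: it swaps in a minimum-sum contiguous segment search (a Kadane-style scan) on mean-centred scores and a permutation test for the significance score. The function names, the mean-centring step and the permutation count are all assumptions made for illustration.

```python
import random


def min_sum_segment(values):
    """Return (start, end, total) of the contiguous run with the smallest sum.

    `end` is exclusive. Kadane-style single pass over the values.
    """
    best_total = float("inf")
    best_start = best_end = 0
    run_total = 0.0
    run_start = 0
    for i, v in enumerate(values):
        # A run whose sum is not negative cannot help a later minimum: restart.
        if run_total >= 0:
            run_total = 0.0
            run_start = i
        run_total += v
        if run_total < best_total:
            best_total = run_total
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_total


def low_score_region(scores, n_perm=1000, seed=0):
    """Locate the lowest-scoring region and attach a permutation p-value.

    Assumption (not from the source): scores are centred on their mean, and
    significance is judged by shuffling positions, which destroys any
    spatial clustering of low values.
    """
    centre = sum(scores) / len(scores)
    centred = [s - centre for s in scores]
    start, end, total = min_sum_segment(centred)

    rng = random.Random(seed)
    shuffled = centred[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_sum_segment(shuffled)[2] <= total:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return start, end, p_value


if __name__ == "__main__":
    # Hypothetical per-position scores with a dip at positions 20..24.
    demo = [0.8, 1.2] * 10 + [0.1] * 5 + [1.2, 0.8] * 10
    print(low_score_region(demo, n_perm=200, seed=1))
```

    Like the approach described in the abstract, the scan fixes no region length in advance and reports an explicit location; unlike it, the permutation test here is a generic stand-in and is slower than an analytic score would be.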

  5. The effect of orthology and coregulation on detecting regulatory motifs.

    PubMed

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-02-03

    Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.

  6. The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

    PubMed Central

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-01-01

    Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085

  7. Model for Codon Position Bias in RNA Editing

    NASA Astrophysics Data System (ADS)

    Liu, Tsunglin; Bundschuh, Ralf

    2005-08-01

    RNA editing can be crucial for the expression of genetic information via inserting, deleting, or substituting a few nucleotides at specific positions in an RNA sequence. Within coding regions in an RNA sequence, editing usually occurs with a certain bias in choosing the positions of the editing sites. In the mitochondrial genes of Physarum polycephalum, many more editing events have been observed at the third codon position than at the first and second, while in some plant mitochondria the second codon position dominates. Here we propose an evolutionary model that explains this bias as the basis of selection at the protein level. The model predicts a distribution of the three positions rather close to the experimental observation in Physarum. This suggests that the codon position bias in Physarum is mainly a consequence of selection at the protein level.

  8. A model for codon position bias in RNA editing

    NASA Astrophysics Data System (ADS)

    Bundschuh, Ralf; Liu, Tsunglin

    2006-03-01

    RNA editing can be crucial for the expression of genetic information via inserting, deleting, or substituting a few nucleotides at specific positions in an RNA sequence. Within coding regions in an RNA sequence, editing usually occurs with a certain bias in choosing the positions of the editing sites. In the mitochondrial genes of Physarum polycephalum, many more editing events have been observed at the third codon position than at the first and second, while in some plant mitochondria the second codon position dominates. Here we propose an evolutionary model that explains this bias as the basis of selection at the protein level. The model predicts a distribution of the three positions rather close to the experimental observation in Physarum. This suggests that the codon position bias in Physarum is mainly a consequence of selection at the protein level.

  9. Novel variable number of tandem repeats of gibbon MAOA gene and its evolutionary significance.

    PubMed

    Choi, Yuri; Jung, Yi-Deun; Ayarpadikannan, Selvam; Koga, Akihiko; Imai, Hiroo; Hirai, Hirohisa; Roos, Christian; Kim, Heui-Soo

    2014-08-01

    Variable number of tandem repeats (VNTRs) are scattered throughout the primate genome, and genetic variation in these VNTRs has accumulated during primate radiation. Here, we analyzed VNTRs upstream of the monoamine oxidase A (MAOA) gene in 11 different gibbon species. An abundance of truncated VNTR sequences and copy number differences were observed compared to those of human VNTR sequences. To better understand the biological role of these VNTRs, a luciferase activity assay was conducted and results indicated that selected VNTR sequences of the MAOA gene from human and three different gibbon species (Hylobates klossii, Hylobates lar, and Nomascus concolor) showed silencing ability. Together, these data could be useful for understanding the evolutionary history and functional significance of MAOA VNTR sequences in gibbon species.

  10. MESA models of the evolutionary state of the interacting binary epsilon Aurigae

    NASA Astrophysics Data System (ADS)

    Gibson, Justus L.; Stencel, Robert E.

    2018-06-01

    Using MESA code (Modules for Experiments in Stellar Astrophysics, version 9575), an evaluation was made of the evolutionary state of the epsilon Aurigae binary system (HD 31964, F0Iap + disc). We sought to satisfy several observational constraints: (1) requiring evolutionary tracks to pass close to the current temperature and luminosity of the primary star; (2) obtaining a period near the observed value of 27.1 years; (3) matching a mass function of 3.0; (4) concurrent Roche lobe overflow and mass transfer; (5) an isotopic ratio 12C/13C = 5; and (6) matching the interferometrically determined angular diameter. A MESA model starting with binary masses of 9.85 + 4.5 M⊙, with a 100 d initial period, produces a 1.2 + 10.6 M⊙ result having a 547 d period, and a single digit 12C/13C ratio. These values were reached near an age of 20 Myr, when the donor star comes close to the observed luminosity and temperature for epsilon Aurigae A, as a post-RGB/pre-AGB star. Contemporaneously, the accretor then appears as an upper main-sequence, early B-type star. This benchmark model can provide a basis for further exploration of this interacting binary, and other long-period binary stars.

  11. Neutral forces acting on intragenomic variability shape the Escherichia coli regulatory network topology.

    PubMed

    Ruths, Troy; Nakhleh, Luay

    2013-05-07

    Cis-regulatory networks (CRNs) play a central role in cellular decision making. Like every other biological system, CRNs undergo evolution, which shapes their properties by a combination of adaptive and nonadaptive evolutionary forces. Teasing apart these forces is an important step toward functional analyses of the different components of CRNs, designing regulatory perturbation experiments, and constructing synthetic networks. Although tests of neutrality and selection based on molecular sequence data exist, no such tests are currently available based on CRNs. In this work, we present a unique genotype model of CRNs that is grounded in a genomic context and demonstrate its use in identifying portions of the CRN with properties explainable by neutral evolutionary forces at the system, subsystem, and operon levels. We leverage our model against experimentally derived data from Escherichia coli. The results of this analysis show statistically significant and substantial neutral trends in properties previously identified as adaptive in origin--degree distribution, clustering coefficient, and motifs--within the E. coli CRN. Our model captures the tightly coupled genome-interactome of an organism and enables analyses of how evolutionary events acting at the genome level, such as mutation, and at the population level, such as genetic drift, give rise to neutral patterns that we can quantify in CRNs.

  12. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle.

    PubMed

    Zhan, Xiangjiang; Pan, Shengkai; Wang, Junyi; Dixon, Andrew; He, Jing; Muller, Margit G; Ni, Peixiang; Hu, Li; Liu, Yuan; Hou, Haolong; Chen, Yuanping; Xia, Jinquan; Luo, Qiong; Xu, Pengwei; Chen, Ying; Liao, Shengguang; Cao, Changchang; Gao, Shukun; Wang, Zhaobao; Yue, Zhen; Li, Guoqing; Yin, Ye; Fox, Nick C; Wang, Jun; Bruford, Michael W

    2013-05-01

    As top predators, falcons possess unique morphological, physiological and behavioral adaptations that allow them to be successful hunters: for example, the peregrine is renowned as the world's fastest animal. To examine the evolutionary basis of predatory adaptations, we sequenced the genomes of both the peregrine (Falco peregrinus) and saker falcon (Falco cherrug), and we present parallel, genome-wide evidence for evolutionary innovation and selection for a predatory lifestyle. The genomes, assembled using Illumina deep sequencing with greater than 100-fold coverage, are both approximately 1.2 Gb in length, with transcriptome-assisted prediction of approximately 16,200 genes for both species. Analysis of 8,424 orthologs in both falcons, chicken, zebra finch and turkey identified consistent evidence for genome-wide rapid evolution in these raptors. SNP-based inference showed contrasting recent demographic trajectories for the two falcons, and gene-based analysis highlighted falcon-specific evolutionary novelties for beak development and olfaction and specifically for homeostasis-related genes in the arid environment-adapted saker.

  13. DiscML: an R package for estimating evolutionary rates of discrete characters using maximum likelihood.

    PubMed

    Kim, Tane; Hao, Weilong

    2014-09-27

    The study of discrete characters is crucial for the understanding of evolutionary processes. Even though great advances have been made in the analysis of nucleotide sequences, computer programs for non-DNA discrete characters are often dedicated to specific analyses and lack flexibility. Discrete characters often have different transition rate matrices, variable rates among sites and sometimes contain unobservable states. To obtain the ability to accurately estimate a variety of discrete characters, programs with sophisticated methodologies and flexible settings are desired. DiscML performs maximum likelihood estimation for evolutionary rates of discrete characters on a provided phylogeny with the options that correct for unobservable data, rate variations, and unknown prior root probabilities from the empirical data. It gives users options to customize the instantaneous transition rate matrices, or to choose pre-determined matrices from models such as birth-and-death (BD), birth-death-and-innovation (BDI), equal rates (ER), symmetric (SYM), general time-reversible (GTR) and all rates different (ARD). Moreover, we show application examples of DiscML on gene family data and on intron presence/absence data. DiscML was developed as a unified R program for estimating evolutionary rates of discrete characters with no restriction on the number of character states, and with flexibility to use different transition models. DiscML is ideal for the analyses of binary (1s/0s) patterns, multi-gene families, and multistate discrete morphological characteristics.
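
    The equal rates (ER) model named in this abstract has a closed-form transition matrix, which makes a compact worked example possible. The sketch below is illustrative only and is not DiscML's implementation or API: the function names are hypothetical, and the rate estimate covers just the simplest case of two taxa compared across many sites, assuming a stationary (uniform) root distribution and a known total path length t between the two tips.

```python
import math


def er_transition_matrix(k, rate, t):
    """Transition probabilities P(t) = exp(Q t) for a k-state equal-rates model.

    Q has `rate` in every off-diagonal entry and -(k - 1) * rate on the
    diagonal, which gives the closed form used here.
    """
    decay = math.exp(-k * rate * t)
    same = 1.0 / k + (k - 1.0) / k * decay   # probability of staying in state i
    diff = (1.0 - decay) / k                 # probability of moving i -> j
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def er_rate_mle(n_same, n_diff, k, t):
    """Maximum likelihood rate from counts of matching and differing sites.

    Assumes two taxa separated by total path length t and a uniform root
    distribution, so P(sites differ) = (k - 1) / k * (1 - exp(-k * rate * t)).
    The binomial likelihood is maximised where this equals the observed
    proportion of differing sites.
    """
    p_diff = n_diff / (n_same + n_diff)
    saturation = (k - 1.0) / k
    if p_diff >= saturation:
        raise ValueError("differences are saturated; the rate estimate is unbounded")
    return -math.log(1.0 - p_diff / saturation) / (k * t)


if __name__ == "__main__":
    # Hypothetical binary presence/absence data: 80 matching sites, 20 differing.
    print(er_transition_matrix(2, 0.5, 1.0))
    print(er_rate_mle(80, 20, k=2, t=1.0))
```

    For k = 4 the same formula reduces to the familiar Jukes-Cantor distance correction; estimation on a full phylogeny, with unobservable states or rate variation among sites, requires the kind of machinery the package itself provides.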

  14. The evolutionary fate of the chloroplast and nuclear rps16 genes as revealed through the sequencing and comparative analyses of four novel legume chloroplast genomes from Lupinus

    PubMed Central

    Keller, J.; Rousseau-Gueutin, M.; Martin, G.E.; Morice, J.; Boutte, J.; Coissac, E.; Ourari, M.; Aïnouche, M.; Salmon, A.; Cabello-Hurtado, F.

    2017-01-01

    Abstract The Fabaceae family is considered as a model system for understanding chloroplast genome evolution due to the presence of extensive structural rearrangements, gene losses and localized hypermutable regions. Here, we provide sequences of four chloroplast genomes from the Lupinus genus, belonging to the underinvestigated Genistoid clade. Notably, we found in Lupinus species the functional loss of the essential rps16 gene, which was most likely replaced by the nuclear rps16 gene that encodes chloroplast and mitochondrion targeted RPS16 proteins. To study the evolutionary fate of the rps16 gene, we explored all available plant chloroplast, mitochondrial and nuclear genomes. Whereas no plant mitochondrial genomes carry an rps16 gene, many plants still have a functional nuclear and chloroplast rps16 gene. Ka/Ks ratios revealed that both chloroplast and nuclear rps16 copies were under purifying selection. However, due to the dual targeting of the nuclear rps16 gene product and the absence of a mitochondrial copy, the chloroplast gene may be lost. We also performed comparative analyses of lupine plastomes (SNPs, indels and repeat elements), identified the most variable regions and examined their phylogenetic utility. The markers identified here will help to reveal the evolutionary history of lupines, Genistoids and closely related clades. PMID:28338826

  15. Peeling Back the Evolutionary Layers of Molecular Mechanisms Responsive to Exercise-Stress in the Skeletal Muscle of the Racing Horse

    PubMed Central

    Kim, Hyeongmin; Lee, Taeheon; Park, WonCheoul; Lee, Jin Woo; Kim, Jaemin; Lee, Bo-Young; Ahn, Hyeonju; Moon, Sunjin; Cho, Seoae; Do, Kyoung-Tag; Kim, Heui-Soo; Lee, Hak-Kyo; Lee, Chang-Kyu; Kong, Hong-Sik; Yang, Young-Mok; Park, Jongsun; Kim, Hak-Min; Kim, Byung Chul; Hwang, Seungwoo; Bhak, Jong; Burt, Dave; Park, Kyoung-Do; Cho, Byung-Wook; Kim, Heebal

    2013-01-01

    The modern horse (Equus caballus) is the product of over 50 million yrs of evolution. The athletic abilities of the horse have been enhanced during the past 6000 yrs under domestication. Therefore, the horse serves as a valuable model to understand the physiology and molecular mechanisms of adaptive responses to exercise. The structure and function of skeletal muscle show remarkable plasticity to the physical and metabolic challenges following exercise. Here, we reveal an evolutionary layer of responsiveness to exercise-stress in the skeletal muscle of the racing horse. We analysed differentially expressed genes and their co-expression networks in a large-scale RNA-sequence dataset comparing expression before and after exercise. By estimating genome-wide dN/dS ratios using six mammalian genomes, and FST and iHS using re-sequencing data derived from 20 horses, we were able to peel back the evolutionary layers of adaptations to exercise-stress in the horse. We found that the oldest and thickest layer (dN/dS) consists of system-wide tissue and organ adaptations. We further find that, during the period of horse domestication, the older layer (FST) is mainly responsible for adaptations to inflammation and energy metabolism, and the most recent layer (iHS) for neurological system process, cell adhesion, and proteolysis. PMID:23580538

  16. On the Evolution of O(He)-Type Stars

    NASA Technical Reports Server (NTRS)

    Kruk, Jeffrey W.; Reindl, N.; Rauch, T.; Werner, K.

    2012-01-01

    O(He) stars represent a small group of four very hot post-AGB stars whose atmospheres are composed of almost pure helium. Their evolution deviates from the hydrogen-deficient post-AGB evolutionary sequence of carbon-dominated stars such as PG 1159 or Wolf-Rayet stars. While (very) late thermal pulse evolutionary models can explain the observed He/C/O abundances in these objects, they do not reproduce He-dominated surface abundances. Currently it seems most likely that the O(He) stars originate from a double helium white dwarf merger and so they could be the successors of the luminous helium-rich sdO-stars. Another possibility is that O(He)-stars could be successors of RCB or EHe stars.

  17. Minimal-assumption inference from population-genomic data

    NASA Astrophysics Data System (ADS)

    Weissman, Daniel; Hallatschek, Oskar

    Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.

  18. Genome-wide comparative analysis of four Indian Drosophila species.

    PubMed

    Mohanty, Sujata; Khanna, Radhika

    2017-12-01

    Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.

  19. Mitogenome Sequencing in the Genus Camelus Reveals Evidence for Purifying Selection and Long-term Divergence between Wild and Domestic Bactrian Camels.

    PubMed

    Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A

    2017-08-30

    The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as the driving force shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.

  20. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes.

    PubMed

    Zheng, Jinshui; Peng, Donghai; Chen, Ling; Liu, Hualin; Chen, Feng; Xu, Mengci; Ju, Shouyong; Ruan, Lifang; Sun, Ming

    2016-07-27

    Plant-parasitic nematodes were found in 4 of the 12 clades of phylum Nematoda. These nematodes in different clades may have originated independently from their free-living fungivorous ancestors. However, the exact evolutionary process of these parasites is unclear. Here, we sequenced the genome of a migratory plant nematode, Ditylenchus destructor. We performed comparative genomics among the free-living nematode Caenorhabditis elegans and all the plant nematodes with genome sequences available. We found that, compared with C. elegans, the core developmental control processes underwent heavy reduction, though most signal transduction pathways were conserved. We also found D. destructor contained more homologs of the key genes in the above processes than the other plant nematodes. We suggest that Ditylenchus spp. may be an intermediate evolutionary history stage from free-living nematodes that feed on fungi to obligate plant-parasitic nematodes. Based on the facts that D. destructor can feed on fungi and has a relatively short life cycle, and that it has similar features to both C. elegans and sedentary plant-parasitic nematodes from clade 12, we propose it as a new model to study the biology, biocontrol of plant nematodes and the interaction between nematodes and plants. © 2016 The Author(s).

  1. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    PubMed Central

    2011-01-01

    Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism.
Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930

  2. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    PubMed

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives.
This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.

  3. The Evolution of Ion Pumps.

    ERIC Educational Resources Information Center

    Maloney, Peter C.; Wilson, T. Hastings

    1985-01-01

    Constructs an evolutionary sequence to account for the diversity of ion pumps found today. Explanations include primary ion pumps in bacteria, features and distribution of ATP-driven pumps, preference for cation transport, and proton pump reversal. The integrated evolutionary hypothesis should encourage new experimental approaches. (DH)

  4. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
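
    As a rough illustration of steepest-ascent hill climbing in the 2D HP lattice model, the sketch below scores a conformation by counting non-consecutive H-H contacts and repeatedly moves to the best-scoring neighbor. The direction-string encoding and the single-step move set are assumptions made for this example; they are not the problem-specific operators of the cited paper.

```python
# Sketch only: steepest-ascent hill climbing on the 2D HP lattice model.
# The encoding (absolute directions) and the move set (re-point one step)
# are illustrative assumptions, not the operators of the cited paper.

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(directions):
    """Map a direction string to lattice coordinates; None if the walk self-intersects."""
    positions = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        nxt = (positions[-1][0] + dx, positions[-1][1] + dy)
        if nxt in positions:
            return None
        positions.append(nxt)
    return positions


def energy(sequence, directions):
    """HP energy: -1 per pair of H residues adjacent on the lattice but not along the chain."""
    positions = fold(directions)
    if positions is None:
        return None
    index = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is counted once
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


def hill_climb(sequence, directions):
    """Move to the lowest-energy neighbor until no neighbor improves the energy."""
    current = energy(sequence, directions)
    if current is None:
        raise ValueError("starting conformation is not self-avoiding")
    while True:
        best_dirs, best_e = directions, current
        for k in range(len(directions)):
            for d in MOVES:
                if d == directions[k]:
                    continue
                candidate = directions[:k] + d + directions[k + 1:]
                e = energy(sequence, candidate)
                if e is not None and e < best_e:
                    best_dirs, best_e = candidate, e
        if best_e == current:  # local optimum under this move set
            return directions, current
        directions, current = best_dirs, best_e
```

    With this move set, the fully extended start `hill_climb("HPPH", "RRR")` has no improving neighbor and stays at energy 0, whereas the start `"RRU"` reaches a conformation with one H-H contact. Getting trapped like this is the kind of local optimum that an explicit diversification stage is meant to escape.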

  5. Analyses of the radiation of birnaviruses from diverse host phyla and of their evolutionary affinities with other double-stranded RNA and positive strand RNA viruses using robust structure-based multiple sequence alignments and advanced phylogenetic methods

    PubMed Central

    2013-01-01

    Background Birnaviruses form a distinct family of double-stranded RNA viruses infecting animals as different as vertebrates, mollusks, insects and rotifers. With such a wide host range, they constitute a good model for studying the adaptation to the host. Additionally, several lines of evidence link birnaviruses to positive strand RNA viruses and suggest that phylogenetic analyses may provide clues about transition. Results We characterized the genome of a birnavirus from the rotifer Branchionus plicalitis. We used X-ray structures of RNA-dependent RNA polymerases and capsid proteins to obtain multiple structure alignments that allowed us to obtain reliable multiple sequence alignments and we employed “advanced” phylogenetic methods to study the evolutionary relationships between some positive strand and double-stranded RNA viruses. We showed that the rotifer birnavirus genome exhibited an organization remarkably similar to other birnaviruses. As this host was phylogenetically very distant from the other known species targeted by birnaviruses, we revisited the evolutionary pathways within the Birnaviridae family using phylogenetic reconstruction methods. We also applied a number of phylogenetic approaches based on structurally conserved domains/regions of the capsid and RNA-dependent RNA polymerase proteins to study the evolutionary relationships between birnaviruses, other double-stranded RNA viruses and positive strand RNA viruses. Conclusions We show that there is a good correlation between the phylogeny of the birnaviruses and that of their hosts at the phylum level using the RNA-dependent RNA polymerase (genomic segment B) on the one hand and a concatenation of the capsid protein, protease and ribonucleoprotein (genomic segment A) on the other hand. This correlation tends to vanish within phyla. 
The use of advanced phylogenetic methods and robust structure-based multiple sequence alignments allowed us to obtain a more accurate picture (in terms of probability of the tree topologies) of the evolutionary affinities between double-stranded RNA and positive strand RNA viruses. In particular, we were able to show that there exists a good statistical support for the claims that dsRNA viruses are not monophyletic and that viruses with permuted RdRps belong to a common evolution lineage as previously proposed by other groups. We also propose a tree topology with a good statistical support describing the evolutionary relationships between the Picornaviridae, Caliciviridae, Flaviviridae families and a group including the Alphatetraviridae, Nodaviridae, Permutotretraviridae, Birnaviridae, and Cystoviridae families. PMID:23865988

  6. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
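
    The relative sequence capacity mentioned above is the number of foldable sequences divided by the total number of possible sequences, which is 20^L for a chain of L residues. The sketch below does this arithmetic in log10 space to avoid overflow. The 35-residue chain length and the use of the Avogadro constant as a stand-in lower bound on capacity are illustrative assumptions, not values reported in the abstract.

```python
import math

AVOGADRO = 6.02214076e23  # used here only as an illustrative lower bound on capacity


def log10_sequence_space(length, alphabet_size=20):
    """log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet_size)


def log10_relative_capacity(log10_capacity, length, alphabet_size=20):
    """log10 of (number of foldable sequences / number of possible sequences)."""
    return log10_capacity - log10_sequence_space(length, alphabet_size)


# Hypothetical example: a 35-residue domain whose capacity just equals the Avogadro constant.
length = 35  # assumed chain length, for illustration only
rel = log10_relative_capacity(math.log10(AVOGADRO), length)
print(f"log10(sequence space)    = {log10_sequence_space(length):.2f}")
print(f"log10(relative capacity) = {rel:.2f}")
```

    For a fixed absolute capacity, the relative capacity falls by log10(20), about 1.3 orders of magnitude, per added residue. This is consistent with the anti-correlation with protein length described above.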

  7. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. Results Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 (for F. esculentum) and 229 (F. tataricum) thousands of reads with average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousands of contigs for each species, with 7.5-8.2× coverage. Comparative analysis of two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationships of the Caryophyllales and asterids now gained high support from nuclear gene sequences. Conclusions 454 transcriptome sequencing and de novo assembly was performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  8. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    Treesearch

    Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to...

  9. Evolutionary optimization of biopolymers and sequence structure maps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reidys, C.M.; Kopp, S.; Schuster, P.

    1996-06-01

    Searching for biopolymers having a predefined function is a core problem of biotechnology, biochemistry and pharmacy. On the level of RNA sequences and their corresponding secondary structures we show that this problem can be analyzed mathematically. The strategy will be to study the properties of the RNA sequence to secondary structure mapping that is essential for the understanding of the search process. We show that to each secondary structure s there exists a neutral network consisting of all sequences folding into s. This network can be modeled as a random graph and has the following generic properties: it is dense and has a giant component within the graph of compatible sequences. The neutral network percolates sequence space and any two neutral nets come close in terms of Hamming distance. We investigate the distribution of the orders of neutral nets and show that above a certain threshold the topology of neutral nets allows to find practically all frequent secondary structures.
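
    To make the notions of Hamming distance and neutral neighbors concrete, the sketch below enumerates the one-mutant neighbors of an RNA sequence and measures the fraction that keep the same structure. The `toy_fold` function is a hypothetical stand-in for a real secondary-structure predictor and is included only so the example runs; it is not the mapping studied in the abstract.

```python
ALPHABET = "ACGU"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def one_mutant_neighbors(seq):
    """Yield every sequence at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def neutral_fraction(seq, fold):
    """Fraction of one-mutant neighbors that fold into the same structure as seq."""
    target = fold(seq)
    neighbors = list(one_mutant_neighbors(seq))
    return sum(fold(n) == target for n in neighbors) / len(neighbors)


def toy_fold(seq):
    """Hypothetical fold: 'paired' if the two end bases can pair, else 'open'."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    return "paired" if (seq[0], seq[-1]) in pairs else "open"
```

    A sequence whose neighbors are mostly neutral sits in a well-connected part of its neutral network, which is what allows a search to drift through sequence space without losing the structure.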

  10. An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

    PubMed

    Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

    2009-01-21

    The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. 
The EST collection will provide much needed genomic resources for this model species and facilitate comparative evolutionary studies within the montium subgroup of the D. melanogaster lineage.

  11. Evolutionary Roots and Diversification of the Genus Aeromonas.

    PubMed

    Sanglas, Ariadna; Albarral, Vicenta; Farfán, Maribel; Lorén, J G; Fusté, M C

    2017-01-01

    Despite the importance of diversification rates in the study of prokaryote evolution, they have not been quantitatively assessed for the majority of microorganism taxa. The investigation of evolutionary patterns in prokaryotes constitutes a challenge due to a very scarce fossil record, limited morphological differentiation and frequently complex taxonomic relationships, which make even species recognition difficult. Although the speciation models and speciation rates in eukaryotes have traditionally been established by analyzing the fossil record data, this is frequently incomplete, and not always available. More recently, several methods based on molecular sequence data have been developed to estimate speciation and extinction rates from phylogenies reconstructed from contemporary taxa. In this work, we determined the divergence time and temporal diversification of the genus Aeromonas by applying these methods widely used with eukaryotic taxa. Our analysis involved 150 Aeromonas strains using the concatenated sequences of two housekeeping genes (approximately 2,000 bp). Dating and diversification model analyses were performed using two different approaches: obtaining the consensus sequence from the concatenated sequences corresponding to all the strains belonging to the same species, or generating the species tree from multiple alignments of each gene. We used BEAST to perform a Bayesian analysis to estimate both the phylogeny and the divergence times. A global molecular clock cannot be assumed for any gene. From the chronograms obtained, we carried out a diversification analysis using several approaches. The results suggest that the genus Aeromonas began to diverge approximately 250 million years (Ma) ago.
All methods used to determine Aeromonas diversification gave similar results, suggesting that the speciation process in this bacterial genus followed a rate-constant (Yule) diversification model, although there is a small probability that a slight deceleration occurred in recent times. We also determined the constant of diversification (λ) values, which in all cases were very similar, about 0.01 species/Ma, a value clearly lower than those described for different eukaryotes.
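
    Under a rate-constant (Yule, pure-birth) model with rate λ, the expected number of lineages grows as n0·exp(λt). The sketch below evaluates that expectation and the simple log-ratio rate estimator, (ln N − ln 2)/t for a crown age or ln N / t for a stem age. It is a back-of-the-envelope illustration, not the chronogram-based diversification analysis used in the study; only λ = 0.01 species/Ma and t = 250 Ma come from the abstract, and the function names are assumptions.

```python
import math


def expected_lineages(rate, time, n0=1):
    """Expected number of lineages after `time` under a pure-birth process with constant rate."""
    return n0 * math.exp(rate * time)


def yule_rate(n_species, age, crown=True):
    """Log-ratio estimate of the pure-birth rate.

    A crown age starts the clock with two lineages, a stem age with one.
    """
    start = 2 if crown else 1
    if n_species < start or age <= 0:
        raise ValueError("need at least the starting number of lineages and a positive age")
    return (math.log(n_species) - math.log(start)) / age


# Values taken from the abstract: rate in species/Ma, time in Ma.
print(expected_lineages(rate=0.01, time=250))        # from a single stem lineage
print(expected_lineages(rate=0.01, time=250, n0=2))  # from a two-lineage crown
```

    The estimator is the inverse of the expectation: feeding the expected lineage count back into `yule_rate` with the same age recovers the rate.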

  12. Evolutionary Roots and Diversification of the Genus Aeromonas

    PubMed Central

    Sanglas, Ariadna; Albarral, Vicenta; Farfán, Maribel; Lorén, J. G.; Fusté, M. C.

    2017-01-01

    Despite the importance of diversification rates in the study of prokaryote evolution, they have not been quantitatively assessed for the majority of microorganism taxa. The investigation of evolutionary patterns in prokaryotes constitutes a challenge due to a very scarce fossil record, limited morphological differentiation and frequently complex taxonomic relationships, which make even species recognition difficult. Although the speciation models and speciation rates in eukaryotes have traditionally been established by analyzing the fossil record data, this is frequently incomplete, and not always available. More recently, several methods based on molecular sequence data have been developed to estimate speciation and extinction rates from phylogenies reconstructed from contemporary taxa. In this work, we determined the divergence time and temporal diversification of the genus Aeromonas by applying these methods widely used with eukaryotic taxa. Our analysis involved 150 Aeromonas strains using the concatenated sequences of two housekeeping genes (approximately 2,000 bp). Dating and diversification model analyses were performed using two different approaches: obtaining the consensus sequence from the concatenated sequences corresponding to all the strains belonging to the same species, or generating the species tree from multiple alignments of each gene. We used BEAST to perform a Bayesian analysis to estimate both the phylogeny and the divergence times. A global molecular clock cannot be assumed for any gene. From the chronograms obtained, we carried out a diversification analysis using several approaches. The results suggest that the genus Aeromonas began to diverge approximately 250 million years (Ma) ago.
All methods used to determine Aeromonas diversification gave similar results, suggesting that the speciation process in this bacterial genus followed a rate-constant (Yule) diversification model, although there is a small probability that a slight deceleration occurred in recent times. We also determined the constant of diversification (λ) values, which in all cases were very similar, about 0.01 species/Ma, a value clearly lower than those described for different eukaryotes. PMID:28228750

  13. Quorum Sensing: A transcriptional Regulatory System Involved in the Pathogenicity of Burkholderia mallei

    DTIC Science & Technology

    2004-11-01

    contributes to the pathogenicity of B. mallei in vivo. 15. SUBJECT TERMS Burkholderia mallei, glanders, pseudomallei, quorum sensing, transcription...and D. M. Waag. 2000. Mouse model of sublethal and lethal intraperitoneal glanders (Burkholderia mallei). Vet. Pathol. 37:626–636. 10. Fuqua, C., M...sequence typing and evolutionary relationships among the causative agents of melioidosis and glanders, Burkholderia pseudomallei and Burkholderia mallei

  14. The Rise and Fall of an Evolutionary Innovation: Contrasting Strategies of Venom Evolution in Ancient and Young Animals

    PubMed Central

    Sunagar, Kartik; Moran, Yehu

    2015-01-01

    Animal venoms are theorized to evolve under the significant influence of positive Darwinian selection in a chemical arms race scenario, where the evolution of venom resistance in prey and the invention of potent venom in the secreting animal exert reciprocal selection pressures. Venom research to date has mainly focused on evolutionarily younger lineages, such as snakes and cone snails, while mostly neglecting ancient clades (e.g., cnidarians, coleoids, spiders and centipedes). By examining genome, venom-gland transcriptome and sequences from the public repositories, we report the molecular evolutionary regimes of several centipede and spider toxin families, which surprisingly accumulated low-levels of sequence variations, despite their long evolutionary histories. Molecular evolutionary assessment of over 3500 nucleotide sequences from 85 toxin families spanning the breadth of the animal kingdom has unraveled a contrasting evolutionary strategy employed by ancient and evolutionarily young clades. We show that the venoms of ancient lineages remarkably evolve under the heavy constraints of negative selection, while toxin families in lineages that originated relatively recently rapidly diversify under the influence of positive selection. We propose that animal venoms mostly employ a ‘two-speed’ mode of evolution, where the major influence of diversifying selection accompanies the earlier stages of ecological specialization (e.g., diet and range expansion) in the evolutionary history of the species–the period of expansion, resulting in the rapid diversification of the venom arsenal, followed by longer periods of purifying selection that preserve the potent toxin pharmacopeia–the period of purification and fixation. However, species in the period of purification may re-enter the period of expansion upon experiencing a major shift in ecology or environment. 
Thus, we highlight for the first time the significant roles of purifying and episodic selections in shaping animal venoms. PMID:26492532

  15. Free Energy Landscape - Settlements of Key Residues.

    NASA Astrophysics Data System (ADS)

    Aroutiounian, Svetlana

    2007-03-01

    The free energy landscape (FEL) perspective in studies of protein folding transitions reflects the notion that, since there are ~10^N conformations to scan in search of the lowest free energy state, a random search is beyond the biological timescale. Protein folding must follow certain FEL pathways, and the folding kinetics of evolutionarily selected proteins dominates kinetic traps. A good model for the functional robustness of natural proteins: the coarse-grained model protein is not very accurate but brings simulations closer to the biological realm; a Go-like potential secures the funnel shape of the FEL; biochemical contacts signify the funnel bottleneck. A Boltzmann-weighted ensemble of protein conformations and the histogram method are used to obtain the approximate probability distribution from MC sampling of protein conformational space. The FEL is $F(\mathrm{rmsd}) = -\frac{1}{\beta}\ln[\mathrm{Hist}(\mathrm{rmsd})]$, where $\beta = 1/(k_B T)$ and rmsd is the root-mean-square deviation from the native conformation. Sperm whale myoglobin has rich dynamic behavior, is small, yet large on the computational scale, has a symmetry in architecture and an unusual sextet of residue pairs. Main idea: there is a mathematical relation between the protein FEL and a set of key residues providing stability to the folding transition. Is the set evolutionarily conserved also for functional reasons? Hypothesis: the primary sequence determines the positions of the key residues conserved as stabilizers, and the FEL is the battlefield for folding stability. Preliminary results: the primary sequence, not the architecture, is indeed the rule settler.
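
    The histogram relation above can be sketched in a few lines: bin the sampled rmsd values, convert the bin probabilities to free energies, and shift the profile so that its minimum is zero. The sketch takes β as the inverse temperature 1/(k_B T), so free energies come out in units of k_B T when `kT=1.0`. The function name, the fixed bin width and the sample values are assumptions for illustration, and the samples are taken to be already Boltzmann-distributed.

```python
import math


def free_energy_profile(rmsd_samples, bin_width, kT=1.0):
    """Return {bin_left_edge: F} with F = -kT * ln(P(bin)), shifted so min(F) = 0.

    Empty bins are omitted, since their free energy is formally infinite.
    """
    if not rmsd_samples:
        raise ValueError("need at least one sample")
    counts = {}
    for r in rmsd_samples:
        b = int(r // bin_width)  # index of the bin containing r
        counts[b] = counts.get(b, 0) + 1
    total = len(rmsd_samples)
    profile = {b * bin_width: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(profile.values())
    return {left: f - f_min for left, f in sorted(profile.items())}


# Hypothetical samples: 80% of conformations near the native state, 20% farther away.
samples = [0.1] * 8 + [1.2] * 2
print(free_energy_profile(samples, bin_width=1.0))
```

    In this toy case the more populated bin defines the zero of free energy and the other bin lies ln(4) k_B T above it, because it is four times less populated.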

  16. CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence

    PubMed Central

    Nepal, Madhav P; Benson, Benjamin V

    2015-01-01

    Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease-resistant genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent to their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implication in soybean crop improvement in future. PMID:25922568

  17. CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence.

    PubMed

    Nepal, Madhav P; Benson, Benjamin V

    2015-01-01

    Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease-resistant genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent to their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implication in soybean crop improvement in future.

  18. Studying the evolutionary relationships and phylogenetic trees of 21 groups of tRNA sequences based on complex networks.

    PubMed

    Wei, Fangping; Chen, Bowen

    2012-03-01

    To find out the evolutionary relationships among different tRNA sequences of 21 amino acids, 22 networks are constructed. One is constructed from whole tRNAs, and the other 21 networks are constructed from the tRNAs which carry the same amino acids. A new method is proposed such that the alignment scores of any two amino acids groups are determined by the average degree and the average clustering coefficient of their networks. The anticodon feature of isolated tRNA and the phylogenetic trees of 21 group networks are discussed. We find that some isolated tRNA sequences in 21 networks still connect with other tRNAs outside their group, which reflects the fact that those tRNAs might evolve by intercrossing among these 21 groups. We also find that most anticodons among the same cluster are only one base different in the same sites when S ≥ 70, and they stay in the same rank in the ladder of evolutionary relationships. Those observations seem to agree on that some tRNAs might mutate from the same ancestor sequences based on point mutation mechanisms.
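
    The two network statistics named above, average degree and average clustering coefficient, can be computed directly from an edge list. The sketch below is a minimal illustration. The edge list is hypothetical, and how edges are derived from tRNA alignment scores (for example, by linking sequences whose score meets a threshold S) is an assumption, not a detail given in the abstract.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical network: a triangle of tRNAs with one extra sequence attached.
graph = build_graph([("t1", "t2"), ("t2", "t3"), ("t1", "t3"), ("t3", "t4")])
print(average_degree(graph), average_clustering(graph))
```

    Nodes with fewer than two neighbors are given a clustering coefficient of 0 here, which is one common convention; other conventions exclude such nodes from the average.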

  19. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of a complexity unprecedented to date, indeed that could not be distinguished as to its exact dosage. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication in the cotton lineage of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced gene copy numbers of some homologous groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, both of fundamental value and serving as an important platform to support further studies in and beyond cotton and genomics communities. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.

  20. Phylogenetic Factor Analysis.

    PubMed

    Tolkoff, Max R; Alfaro, Michael E; Baele, Guy; Lemey, Philippe; Suchard, Marc A

    2018-05-01

    Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over 3-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.

  1. Determinants of the rate of protein sequence evolution

    PubMed Central

    Zhang, Jianzhi; Yang, Jian-Rong

    2015-01-01

    The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what constitutes functional constraint has remained unclear. The increasing availability of genomic data has allowed for much needed empirical examinations on the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses have identified multiple mechanisms behind these observations and demonstrated a prominent role that selection against errors in molecular and cellular processes plays in protein evolution. PMID:26055156

  2. Observation of quantum criticality with ultracold atoms in optical lattices

    NASA Astrophysics Data System (ADS)

    Zhang, Xibo

    As biological problems are becoming more complex and data growing at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two fields, comparative genomics and epigenomics, and employs a variety of computational techniques to address the problems. One fundamental question in the studies of chromosome evolution is whether the rearrangement breakpoints are happening at random positions or along certain hotspots. We investigate the breakpoint reuse phenomenon, and present analyses that support the more recently proposed fragile breakage model as opposed to the conventional random breakage models for chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architectures, comparative genomics, and evolutionary genomics. The previous synteny block reconstruction algorithms could not be scaled to a large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm based on the A-Bruijn graph framework that overcomes these shortcomings. In epigenome sequencing, a sample may contain a mixture of epigenomes and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes, and metagenomic sequencing, share a similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both combinatorial and statistical angles. First, we describe a theoretical framework. A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach, and apply the model to the detection of allele-specific methylations.

  3. Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty

    PubMed Central

    Longo, Mark S; Carone, Dawn M; Green, Eric D; O'Neill, Michael J; O'Neill, Rachel J

    2009-01-01

    Background Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined. Results To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs. We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements. Conclusion Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability. PMID:19630942

  4. Upon accounting for the impact of isoenzyme loss, gene deletion costs anticorrelate with their evolutionary rates

    DOE PAGES

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu; ...

    2017-01-20

    Here, system-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  5. Upon accounting for the impact of isoenzyme loss, gene deletion costs anticorrelate with their evolutionary rates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu

    Here, system-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  6. SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes.

    PubMed

    Bielejec, Filip; Baele, Guy; Vrancken, Bram; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2016-08-01

    Model-based phylogenetic reconstructions increasingly consider spatial or phenotypic traits in conjunction with sequence data to study evolutionary processes. Alongside parameter estimation, visualization of ancestral reconstructions represents an integral part of these analyses. Here, we present a complete overhaul of the spatial phylogenetic reconstruction of evolutionary dynamics software, now called SpreaD3 to emphasize the use of data-driven documents, as an analysis and visualization package that primarily complements Bayesian inference in BEAST (http://beast.bio.ed.ac.uk, last accessed 9 May 2016). The integration of JavaScript D3 libraries (www.d3.org, last accessed 9 May 2016) offers novel interactive web-based visualization capacities that are not restricted to spatial traits and extend to any discrete or continuously valued trait for any organism of interest. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Industrial Relevance of Chromosomal Copy Number Variation in Saccharomyces Yeasts.

    PubMed

    Gorter de Vries, Arthur R; Pronk, Jack T; Daran, Jean-Marc G

    2017-06-01

    Chromosomal copy number variation (CCNV) plays a key role in evolution and health of eukaryotes. The unicellular yeast Saccharomyces cerevisiae is an important model for studying the generation, physiological impact, and evolutionary significance of CCNV. Fundamental studies of this yeast have contributed to an extensive set of methods for analyzing and introducing CCNV. Moreover, these studies provided insight into the balance between negative and positive impacts of CCNV in evolutionary contexts. A growing body of evidence indicates that CCNV not only frequently occurs in industrial strains of Saccharomyces yeasts but also is a key contributor to the diversity of industrially relevant traits. This notion is further supported by the frequent involvement of CCNV in industrially relevant traits acquired during evolutionary engineering. This review describes recent developments in genome sequencing and genome editing techniques and discusses how these offer opportunities to unravel contributions of CCNV in industrial Saccharomyces strains as well as to rationally engineer yeast chromosomal copy numbers and karyotypes. Copyright © 2017 Gorter de Vries et al.

  8. Star formation history: Modeling of visual binaries

    NASA Astrophysics Data System (ADS)

    Gebrehiwot, Y. M.; Tessema, S. B.; Malkov, O. Yu.; Kovaleva, D. A.; Sytov, A. Yu.; Tutukov, A. V.

    2018-05-01

    Most stars form in binary or multiple systems. Their evolution is defined by masses of components, orbital separation and eccentricity. In order to understand star formation and evolutionary processes, it is vital to find distributions of physical parameters of binaries. We have carried out Monte Carlo simulations in which we simulate different pairing scenarios: random pairing, primary-constrained pairing, split-core pairing, and total and primary pairing in order to get distributions of binaries over physical parameters at birth. Next, for comparison with observations, we account for stellar evolution and selection effects. Brightness, radius, temperature, and other parameters of components are assigned or calculated according to approximate relations for stars in different evolutionary stages (main-sequence stars, red giants, white dwarfs, relativistic objects). Evolutionary stage is defined as a function of system age and component masses. We compare our results with the observed IMF, binarity rate, and binary mass-ratio distributions for field visual binaries to find initial distributions and pairing scenarios that produce observed distributions.
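
    The random-pairing scenario mentioned in this abstract can be sketched as a small Monte Carlo experiment in Python. This is only an illustration under stated assumptions: the abstract does not give the initial mass function, mass limits, or sample sizes, so the Salpeter-like power-law slope (alpha = 2.35), the mass range of 0.1 to 100 solar masses, and the function names below are all hypothetical.

```python
import random


def sample_power_law_mass(rng, alpha=2.35, m_min=0.1, m_max=100.0):
    """Draw one mass from dN/dm ~ m**(-alpha) by inverse-transform sampling.

    alpha, m_min and m_max are assumed values, not taken from the abstract.
    """
    u = rng.random()
    e = 1.0 - alpha
    return (m_min ** e + u * (m_max ** e - m_min ** e)) ** (1.0 / e)


def random_pairing(n_pairs, seed=0):
    """Pair two independently drawn masses; return mass ratios q = m2 / m1 <= 1."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        a = sample_power_law_mass(rng)
        b = sample_power_law_mass(rng)
        primary, secondary = max(a, b), min(a, b)
        ratios.append(secondary / primary)
    return ratios


ratios = random_pairing(10000)
# Fraction of simulated binaries with nearly equal components (q > 0.9).
print(sum(q > 0.9 for q in ratios) / len(ratios))
```

    Other pairing scenarios would replace only the body of the loop, for example by drawing the primary mass first and then a mass ratio, while the comparison against an observed mass-ratio distribution stays the same.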

  9. Bridging the physical scales in evolutionary biology: From protein sequence space to fitness of organisms and populations

    PubMed Central

    Bershtein, Shimon; Serohijos, Adrian W.R.; Shakhnovich, Eugene I.

    2016-01-01

    Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology. PMID:27810574

  10. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations.

    PubMed

    Bershtein, Shimon; Serohijos, Adrian Wr; Shakhnovich, Eugene I

    2017-02-01

    Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Industrial Relevance of Chromosomal Copy Number Variation in Saccharomyces Yeasts

    PubMed Central

    Gorter de Vries, Arthur R.; Pronk, Jack T.

    2017-01-01

    ABSTRACT Chromosomal copy number variation (CCNV) plays a key role in evolution and health of eukaryotes. The unicellular yeast Saccharomyces cerevisiae is an important model for studying the generation, physiological impact, and evolutionary significance of CCNV. Fundamental studies of this yeast have contributed to an extensive set of methods for analyzing and introducing CCNV. Moreover, these studies provided insight into the balance between negative and positive impacts of CCNV in evolutionary contexts. A growing body of evidence indicates that CCNV not only frequently occurs in industrial strains of Saccharomyces yeasts but also is a key contributor to the diversity of industrially relevant traits. This notion is further supported by the frequent involvement of CCNV in industrially relevant traits acquired during evolutionary engineering. This review describes recent developments in genome sequencing and genome editing techniques and discusses how these offer opportunities to unravel contributions of CCNV in industrial Saccharomyces strains as well as to rationally engineer yeast chromosomal copy numbers and karyotypes. PMID:28341679

  12. Evolutionary neural networks for anomaly detection based on the behavior of a program.

    PubMed

    Han, Sang-Jun; Cho, Sung-Bae

    2006-06-01

    The process of learning the behavior of a given program by using machine-learning techniques (based on system-call audit data) is effective to detect intrusions. Rule learning, neural networks, statistics, and hidden Markov models (HMMs) are some of the kinds of representative methods for intrusion detection. Among them, neural networks are known for good performance in learning system-call sequences. In order to apply this knowledge to real-world problems successfully, it is important to determine the structures and weights of these call sequences. However, finding the appropriate structures requires very long time periods because there are no suitable analytical solutions. In this paper, a novel intrusion-detection technique based on evolutionary neural networks (ENNs) is proposed. One advantage of using ENNs is that it takes less time to obtain superior neural networks than when using conventional approaches. This is because they discover the structures and weights of the neural networks simultaneously. Experimental results with the 1999 Defense Advanced Research Projects Agency (DARPA) Intrusion Detection Evaluation (IDEVAL) data confirm that ENNs are promising tools for intrusion detection.

  13. Molecular hyperdiversity and evolution in very large populations.

    PubMed

    Cutter, Asher D; Jovelin, Richard; Dey, Alivia

    2013-04-01

    The genomic density of sequence polymorphisms critically affects the sensitivity of inferences about ongoing sequence evolution, function and demographic history. Most animal and plant genomes have relatively low densities of polymorphisms, but some species are hyperdiverse with neutral nucleotide heterozygosity exceeding 5%. Eukaryotes with extremely large populations, mimicking bacterial and viral populations, present novel opportunities for studying molecular evolution in sexually reproducing taxa with complex development. In particular, hyperdiverse species can help answer controversial questions about the evolution of genome complexity, the limits of natural selection, modes of adaptation and subtleties of the mutation process. However, such systems have some inherent complications and here we identify topics in need of theoretical developments. Close relatives of the model organisms Caenorhabditis elegans and Drosophila melanogaster provide known examples of hyperdiverse eukaryotes, encouraging functional dissection of resulting molecular evolutionary patterns. We recommend how best to exploit hyperdiverse populations for analysis, for example, in quantifying the impact of noncrossover recombination in genomes and for determining the identity and micro-evolutionary selective pressures on noncoding regulatory elements. © 2013 Blackwell Publishing Ltd.

  14. Evolutionary Dynamics of the Gametologous CTNNB1 Gene on the Z and W Chromosomes of Snakes.

    PubMed

    Laopichienpong, Nararat; Muangmai, Narongrit; Chanhome, Lawan; Suntrarachun, Sunutcha; Twilprawat, Panupon; Peyachoknagul, Surin; Srikulnath, Kornsorn

    2017-03-01

    Snakes exhibit genotypic sex determination with female heterogamety (ZZ males and ZW females), and the state of sex chromosome differentiation also varies among lineages. To investigate the evolutionary history of homologous genes located in the nonrecombining region of differentiated sex chromosomes in snakes, partial sequences of the gametologous CTNNB1 gene were analyzed for 12 species belonging to henophid (Cylindrophiidae, Xenopeltidae, and Pythonidae) and caenophid snakes (Viperidae, Elapidae, and Colubridae). Nonsynonymous/synonymous substitution ratios (Ka/Ks) in coding sequences were low (Ka/Ks < 1) between CTNNB1Z and CTNNB1W, suggesting that these 2 genes may have similar functional properties. However, frequencies of intron sequence substitutions and insertion–deletions were higher in CTNNB1Z than CTNNB1W, suggesting that Z-linked sequences evolved faster than W-linked sequences. Molecular phylogeny based on both intron and exon sequences showed the presence of 2 major clades: 1) Z-linked sequences of Caenophidia and 2) W-linked sequences of Caenophidia clustered with Z-linked sequences of Henophidia, which suggests that the sequence divergence between CTNNB1Z and CTNNB1W in Caenophidia may have occurred by the cessation of recombination after the split from Henophidia.

  15. Genomic sequencing of Pleistocene cave bears

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  16. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  17. Novel antigenic shift in HA sequences of H1N1 viruses detected by big data analysis.

    PubMed

    Zhang, Ruiying; Xu, Chongfeng; Duan, Ziyuan

    2017-07-01

    The influenza virus H1N1 has been prevalent all over the world for nearly a century. Many studies on its evolutionary history, substitution rate and antigenicity-associated sites have been done with small datasets. To have a complete view, we analysed 3171 full-length HA sequences from human H1N1 viruses sampled from 1918 to 2016, and discovered a new clade has formed with sequences isolated in Iran. Based on genetic distance calculations, we revealed an uneven evolutionary rate among sequences isolated in different years. We also found that the HA1 fragment of the new clade is like that of viruses that existed in the 1930s, while the HA2 fragment is closely associated with strains isolated after the 2009 pandemic. This new, "mixed" HA sequence indicates a cryptic antigenic shift event occurred, and it should draw more attention to the new clade identified from sequences from Iran. Copyright © 2017. Published by Elsevier B.V.
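
    As an illustration of a genetic distance calculation between aligned sequences, the following Python sketch computes the uncorrected p-distance (the proportion of differing sites). The abstract does not state which distance measure the authors used, so the choice of p-distance, the gap handling, and the function name p_distance are assumptions made for this example only.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Toy aligned fragments (hypothetical, not real HA sequences).
print(p_distance("ACGTACGT", "ACGTACGA"))  # one difference in eight sites
```

    Comparing such pairwise distances between sequences grouped by isolation year is one simple way to look for uneven rates across time, although model-corrected distances are usually preferred for divergent sequences.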

  18. From protostellar to pre-main-sequence evolution

    NASA Astrophysics Data System (ADS)

    D'Antona, F.

    I summarize the status of pre-main-sequence evolutionary tracks, starting from the first steps dating back to the concept of the Hayashi track. Understanding of the dynamical protostellar phase in the vision of Palla & Stahler, who introduced the concepts of the deuterium burning thermostat and of the stellar birthline, provided for a long time a link between the dynamical and hydrostatic evolution. Disk accretion, however, changed this view considerably, while reintroducing some ambiguities that remain to be resolved. The limitations and uncertainties in the mass and age determination from models for young stellar objects are summarized, but the burning of light elements is still a powerful observational signature.

  19. Performance comparison of the Prophecy (forecasting) Algorithm in FFT form for unseen feature and time-series prediction

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James

    2013-06-01

    We introduce a generalized numerical prediction and forecasting algorithm. We have previously published it for malware byte sequence feature prediction and generalized distribution modeling for disparate test article analysis. We show how non-trivial non-periodic extrapolation of a numerical sequence (forecast and backcast) from the starting data is possible. Our ancestor-progeny prediction can yield new options for evolutionary programming. Our equations enable analytical integrals and derivatives to any order. Interpolation is controllable from smooth continuous to fractal structure estimation. We show how our generalized trigonometric polynomial can be derived using a Fourier transform.

  20. The Discovery, Distribution, and Evolution of Viruses Associated with Drosophila melanogaster

    PubMed Central

    Webster, Claire L.; Waldron, Fergal M.; Robertson, Shaun; Crowson, Daisy; Ferrari, Giada; Quintana, Juan F.; Brouqui, Jean-Michel; Bayne, Elizabeth H.; Longdon, Ben; Buck, Amy H.; Lazzaro, Brian P.; Akorli, Jewelna; Haddrill, Penelope R.; Obbard, Darren J.

    2015-01-01

    Drosophila melanogaster is a valuable invertebrate model for viral infection and antiviral immunity, and is a focus for studies of insect-virus coevolution. Here we use a metagenomic approach to identify more than 20 previously undetected RNA viruses and a DNA virus associated with wild D. melanogaster. These viruses not only include distant relatives of known insect pathogens but also novel groups of insect-infecting viruses. By sequencing virus-derived small RNAs, we show that the viruses represent active infections of Drosophila. We find that the RNA viruses differ in the number and properties of their small RNAs, and we detect both siRNAs and a novel miRNA from the DNA virus. Analysis of small RNAs also allows us to identify putative viral sequences that lack detectable sequence similarity to known viruses. By surveying >2,000 individually collected wild adult Drosophila we show that more than 30% of D. melanogaster carry a detectable virus, and more than 6% carry multiple viruses. However, despite a high prevalence of the Wolbachia endosymbiont—which is known to be protective against virus infections in Drosophila—we were unable to detect any relationship between the presence of Wolbachia and the presence of any virus. Using publicly available RNA-seq datasets, we show that the community of viruses in Drosophila laboratories is very different from that seen in the wild, but that some of the newly discovered viruses are nevertheless widespread in laboratory lines and are ubiquitous in cell culture. By sequencing viruses from individual wild-collected flies we show that some viruses are shared between D. melanogaster and D. simulans. Our results provide an essential evolutionary and ecological context for host–virus interaction in Drosophila, and the newly reported viral sequences will help develop D. melanogaster further as a model for molecular and evolutionary virus research. PMID:26172158

  1. Evolutionary relationships of flying foxes (genus Pteropus) in the Philippines inferred from DNA sequences of cytochrome b gene.

    PubMed

    Bastian, S T; Tanaka, K; Anunciado, R V P; Natural, N G; Sumalde, A C; Namikawa, T

    2002-04-01

    Six flying fox species, genus Pteropus (four from the Philippines), were investigated using complete cytochrome b gene sequences (1140 bp) to infer their evolutionary relationships. The DNA sequences generated via polymerase chain reaction were analyzed using the neighbor-joining, parsimony, and maximum likelihood methods. We estimated that the first evolutionary event among these Pteropus species occurred approximately 13.90 +/- 1.49 MYA. Within this short period of evolutionary time we further hypothesized that the ancestors of the flying foxes found in the Philippines experienced a subsequent diversification forming two clusters in the topology. The first cluster is composed of P. pumilus (Philippine endemic), P. speciosus (restricted to western Mindanao) with P. scapulatus, while the second one comprised P. vampyrus and P. dasymallus species based on the analysis from first and second codon positions. Consistently, all phylogenetic analyses revealed a close association of P. dasymallus with P. vampyrus, contradicting the previous report categorizing P. dasymallus under the subniger species group with P. pumilus, P. speciosus, and P. hypomelanus. The Philippine endemic species (P. pumilus) is closely linked with P. speciosus. The representative samples of P. vampyrus showed a large genetic distance of 1.87%. The large genetic distance between P. dasymallus and P. hypomelanus, P. pumilus and P. speciosus denotes a distinct species group.

  2. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    PubMed Central

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales, which comprises a monophyletic group named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for the promoters and propose the term “MEGA-box” to designate an ancestral promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations. PMID:28117683

  3. Evolution and Vaccination of Influenza Virus.

    PubMed

    Lam, Ham Ching; Bi, Xuan; Sreevatsan, Srinand; Boley, Daniel

    2017-08-01

    In this study, we present an application paradigm in which an unsupervised machine learning approach is applied to high-dimensional influenza genetic sequences to investigate whether vaccination is a driving force in the evolution of influenza virus. We first used a visualization approach to display the evolutionary paths of vaccine-controlled and non-vaccine-controlled influenza viruses in a low-dimensional space. We then quantified the differences between their evolutionary trajectories by computing within- and between-scatter matrices, providing statistical confidence to support the visualization results. We used the influenza surface Hemagglutinin (HA) gene for this study as the HA gene is the major target of the immune system. The visualization is achieved without using any clustering methods or prior information about the influenza sequences. Our results clearly showed that the evolutionary trajectories of vaccine-controlled and non-vaccine-controlled influenza viruses are different and that vaccination as a driving force of evolution cannot be completely ruled out.
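
    As an illustration of the within- and between-scatter computation mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes each sequence has already been embedded as a numeric vector and labeled by group; for brevity it computes only the traces of the two scatter matrices rather than the full matrices. The function name scatter_traces and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def scatter_traces(groups: Dict[str, List[Vector]]) -> Tuple[float, float]:
    """Return (trace of within-group scatter, trace of between-group scatter).

    Within: sum over groups of squared distances of points to their group mean.
    Between: sum over groups of group size times squared distance of the
    group mean to the grand mean.
    """
    all_points = [p for pts in groups.values() for p in pts]
    grand = mean(all_points)
    within = 0.0
    between = 0.0
    for pts in groups.values():
        mu = mean(pts)
        within += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in pts)
        between += len(pts) * sum((m - g) ** 2 for m, g in zip(mu, grand))
    return within, between


# Hypothetical 2-D embeddings for two groups of sequences.
toy = {
    "vaccine_controlled": [[0.0, 0.0], [2.0, 0.0]],
    "non_vaccine_controlled": [[10.0, 0.0], [12.0, 0.0]],
}
within, between = scatter_traces(toy)
print(within, between)
```

    A large between-group trace relative to the within-group trace indicates well-separated groups; the two traces sum to the total scatter about the grand mean.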

  4. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures.

    PubMed

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/

  5. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures

    PubMed Central

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/ PMID:18971256

  6. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies.

    PubMed

    Galan, Maxime; Guivier, Emmanuel; Caraux, Gilles; Charbonnel, Nathalie; Cosson, Jean-François

    2010-05-11

    High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives for the study of evolutionary and functional genetics of highly polymorphic genes like major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach, through the study of genes implicated in productivity or disease susceptibility traits.
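
    The read-to-sample attribution step described in this abstract can be sketched as follows. This is a simplified illustration, not the published pipeline: it assumes fixed-length tags, exact tag matches, and that the reverse tag appears at the 3' end of the read in the orientation stored in the lookup table. The function name assign_sample and the tag sequences are hypothetical.

```python
from typing import Dict, Optional, Tuple


def assign_sample(
    read: str,
    tag_len: int,
    sample_by_tags: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Return the sample whose (forward, reverse) tag pair flanks the read.

    Returns None when the read is too short to carry both tags or when the
    tag combination is not in the lookup table.
    """
    if len(read) < 2 * tag_len:
        return None
    forward_tag = read[:tag_len]
    reverse_tag = read[-tag_len:]
    return sample_by_tags.get((forward_tag, reverse_tag))


# Hypothetical tag combinations; combining tags on both primers lets a few
# tags identify many samples (n forward x m reverse combinations).
lookup = {
    ("ACGT", "TTAG"): "sample_1",
    ("ACGT", "CCGA"): "sample_2",
}
print(assign_sample("ACGT" + "GGGGCCCC" + "CCGA", 4, lookup))
```

    Reads whose tag pair is absent from the table stay unassigned, which is one reason the number of assigned reads can be lower than the number of reads obtained.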

  7. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved.

  8. Molecular evolution of the crustacean hyperglycemic hormone family in ecdysozoans

    PubMed Central

    2010-01-01

    Background: Crustacean Hyperglycemic Hormone (CHH) family peptides are neurohormones known to regulate several important functions in decapod crustaceans such as ionic and energetic metabolism, molting and reproduction. The structural conservation of these peptides, together with the variety of functions they display, led us to investigate their evolutionary history. CHH family peptides exist in insects (Ion Transport Peptides) and may be present in all ecdysozoans as well. In order to extend the evolutionary study to the entire family, CHH family peptides were thus searched in taxa outside decapods, where they have been, to date, poorly investigated. Results: CHH family peptides were characterized by molecular cloning in a branchiopod crustacean, Daphnia magna, and in a collembolan, Folsomia candida. Genes encoding such peptides were also rebuilt in silico from genomic sequences of another branchiopod, a chelicerate and two nematodes. These sequences were included in updated datasets to build phylogenies of the CHH family in pancrustaceans. These phylogenies suggest that peptides found in Branchiopoda and Collembola are more closely related to insect ITPs than to crustacean CHHs. Datasets were also used to support a phylogenetic hypothesis about pancrustacean relationships, which, in addition to gene structures, allowed us to propose two evolutionary scenarios of this multigenic family in ecdysozoans. Conclusions: Evolutionary scenarios suggest that CHH family genes of ecdysozoans originate from an ancestral two-exon gene, and genes of arthropods from a three-exon one. In malacostracans, the evolution of the CHH family has involved several duplication, insertion or deletion events, leading to neuropeptides with a wide variety of functions, as observed in decapods. This family could thus constitute a promising model to investigate the links between gene duplications and functional divergence. PMID:20184761

  9. APOL1 Nephropathy: A Population Genetics and Evolutionary Medicine Detective Story.

    PubMed

    Kruzel-Davila, Etty; Wasser, Walter G; Skorecki, Karl

    2017-11-01

    Common DNA sequence variants rarely have a high-risk association with a common disease. When such associations do occur, evolutionary forces must be sought, such as in the association of apolipoprotein L1 (APOL1) gene risk variants with nondiabetic kidney diseases in populations of African ancestry. The variants originated in West Africa and provided pathogenic resistance in the heterozygous state that led to high allele frequencies owing to an adaptive evolutionary selective sweep. However, the homozygous state is disadvantageous and is associated with a markedly increased risk of a spectrum of kidney diseases encompassing hypertension-attributed kidney disease, focal segmental glomerulosclerosis, human immunodeficiency virus nephropathy, sickle cell nephropathy, and progressive lupus nephritis. This scientific success story emerged with the help of the tools developed over the past 2 decades in human genome sequencing and population genomic databases. In this introductory article to a timely issue dedicated to illuminating progress in this area, we describe this unique population genetics and evolutionary medicine detective story. We emphasize the paradox of the inheritance mode, the missing heritability, and unresolved associations, including cardiovascular risk and diabetic nephropathy. We also highlight how genetic epidemiology elucidates mechanisms and how the principles of evolution can be used to unravel conserved pathways affected by APOL1 that may lead to novel therapies. The APOL1 gene provides a compelling example of a common variant association with common forms of nondiabetic kidney disease occurring in a continental population isolate with subsequent global admixture. Scientific collaboration using multiple experimental model systems and approaches should further clarify pathomechanisms, leading to novel therapies. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity

    PubMed Central

    Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv

    2013-01-01

    De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962

  11. A theoretical and observational study of the Red Giant Branch phase transition in Magellanic Cloud clusters - A progress report

    NASA Technical Reports Server (NTRS)

    Buonanno, R.; Corsi, C. E.; Fusi Pecci, F.; Greggio, L.; Renzini, A.; Sweigart, A. V.

    1986-01-01

    Preliminary results are reported for an investigation comparing theoretical models of the sudden appearance of an extended RGB (and its effects on the spectral energy distributions of stellar populations) with data from ESO CCD observations of clusters in the LMC and SMC. Isochrones for the entire RGB are being constructed on the basis of 100 new evolutionary sequences (calculated using the evolution code of Sweigart and Gross, 1976 and 1978) to permit determination of synthetic colors and spectral energy distributions. The observations so far indicate a main sequence about 0.1 mag redder than that predicted by the present models or by the isochrones of VandenBerg and Bell (1985), and fail to show a B-V color difference at the RGB phase transition.

  12. Darwinian evolution in the light of genomics

    PubMed Central

    Koonin, Eugene V.

    2009-01-01

    Comparative genomics and systems biology offer unprecedented opportunities for testing central tenets of evolutionary biology formulated by Darwin in the Origin of Species in 1859 and expanded in the Modern Synthesis 100 years later. Evolutionary-genomic studies show that natural selection is only one of the forces that shape genome evolution and is not quantitatively dominant, whereas non-adaptive processes are much more prominent than previously suspected. Major contributions of horizontal gene transfer and diverse selfish genetic elements to genome evolution undermine the Tree of Life concept. An adequate depiction of evolution requires the more complex concept of a network or ‘forest’ of life. There is no consistent tendency of evolution towards increased genomic complexity, and when complexity increases, this appears to be a non-adaptive consequence of evolution under weak purifying selection rather than an adaptation. Several universals of genome evolution were discovered including the invariant distributions of evolutionary rates among orthologous genes from diverse genomes and of paralogous gene family sizes, and the negative correlation between gene expression level and sequence evolution rate. Simple, non-adaptive models of evolution explain some of these universals, suggesting that a new synthesis of evolutionary biology might become feasible in a not so remote future. PMID:19213802

  13. Phylogenetic Relationships and Species Delimitation in Pinus Section Trifoliae Inferred from Plastid DNA

    PubMed Central

    Hernández-León, Sergio; Gernandt, David S.; Pérez de la Rosa, Jorge A.; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities. PMID:23936218

  14. Sequence organization and evolutionary dynamics of Brachypodium-specific centromere retrotransposons.

    PubMed

    Qi, L L; Wu, J J; Friebe, B; Qian, C; Gu, Y Q; Fu, D L; Gill, B S

    2013-08-01

    Brachypodium distachyon is a wild annual grass belonging to the Pooideae, more closely related to wheat, barley, and forage grasses than to rice and maize. As an experimental model, the completed genome sequence of B. distachyon provides a unique opportunity to study centromere evolution during the speciation of grasses. Centromeric satellite sequences have been identified in B. distachyon, but little is known about centromeric retrotransposons in this species. In the present study, bacterial artificial chromosome (BAC)-fluorescence in situ hybridization was conducted in maize, rice, barley, wheat, and rye using B. distachyon (Bd) centromere-specific BAC clones. Eight Bd centromeric BAC clones gave no detectable fluorescence in situ hybridization (FISH) signals on the chromosomes of rice and maize, and three of them also did not yield any FISH signals in barley, wheat, and rye. In addition, four of five Triticeae centromeric BAC clones did not hybridize to the B. distachyon centromeres, implying certain unique features of Brachypodium centromeres. Analysis of Brachypodium centromeric BAC sequences identified a long terminal repeat (LTR)-centromere retrotransposon of B. distachyon (CRBd1). This element was found in high copy number, accounting for 1.6% of the B. distachyon genome, and is enriched in Brachypodium centromeric regions. CRBd1 accumulated in active centromeres, but was lost from inactive ones. The LTR of CRBd1 appears to be specific to B. distachyon centromeres. These results reveal different evolutionary events of this retrotransposon family across grass species.

  15. Molecular epidemiology of Powassan virus in North America.

    PubMed

    Pesko, Kendra N; Torres-Perez, Fernando; Hjelle, Brian L; Ebel, Gregory D

    2010-11-01

    Powassan virus (POW) is a tick-borne flavivirus distributed in Canada, the northern USA and the Primorsky region of Russia. POW is the only tick-borne flavivirus endemic to the western hemisphere, where it is transmitted mainly between Ixodes cookei and groundhogs (Marmota monax). Deer tick virus (DTV), a genotype of POW that has been frequently isolated from deer ticks (Ixodes scapularis), appears to be maintained in an enzootic cycle between these ticks and white-footed mice (Peromyscus leucopus). DTV has been isolated from ticks in several regions of North America, including the upper Midwest and the eastern seaboard. The incidence of human disease due to POW is apparently increasing. Previous analyses of tick-borne flaviviruses endemic to North America have been limited to relatively short genome fragments. We therefore assessed the evolutionary dynamics of POW using newly generated complete and partial genome sequences. Maximum-likelihood and Bayesian phylogenetic inferences showed two well-supported, reciprocally monophyletic lineages corresponding to POW and DTV. Bayesian skyline plots based on year-of-sampling data indicated no significant population size change for either virus lineage. Statistical model-based selection analyses showed evidence of purifying selection in both lineages. Positive selection was detected in NS-5 sequences for both lineages and envelope sequences for POW. Our findings confirm that POW and DTV sequences are relatively stable over time, which suggests strong evolutionary constraint, and support field observations that suggest that tick-borne flavivirus populations are extremely stable in enzootic foci.

  16. Phylogenetic relationships and species delimitation in Pinus section Trifoliae inferred from plastid DNA.

    PubMed

    Hernández-León, Sergio; Gernandt, David S; Pérez de la Rosa, Jorge A; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities.

  17. Quadrupedal locomotor simulation: producing more realistic gaits using dual-objective optimization

    PubMed Central

    Hirasaki, Eishi

    2018-01-01

    In evolutionary biomechanics it is often considered that gaits should evolve to minimize the energetic cost of travelling a given distance. In gait simulation this goal often leads to convincing gait generation. However, as the musculoskeletal models used get increasingly sophisticated, it becomes apparent that such a single goal can lead to extremely unrealistic gait patterns. In this paper, we explore the effects of requiring adequate lateral stability and show how this increases both energetic cost and the realism of the generated walking gait in a high biofidelity chimpanzee musculoskeletal model. We also explore the effects of changing the footfall sequences in the simulation so it mimics both the diagonal sequence walking gaits that primates typically use and also the lateral sequence walking gaits that are much more widespread among mammals. It is apparent that adding a lateral stability criterion has an important effect on the footfall phase relationship, suggesting that lateral stability may be one of the key drivers behind the observed footfall sequences in quadrupedal gaits. The observation that single optimization goals are no longer adequate for generating gait in current models has important implications for the use of biomimetic virtual robots to predict the locomotor patterns in fossil animals. PMID:29657790

  18. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than that in their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable element contents. The pattern of microsatellite distribution may differ between genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally in accord with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but has little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by general biological rules. PMID:23555856
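
    The quantities compared in this abstract (microsatellite frequency, motif length, motif type, repeat number) can be illustrated with a small scanner for perfect tandem repeats. This is only a minimal sketch, not the authors' pipeline: the function names, the 1 to 6 bp motif range and the minimum of three repeats are assumptions chosen for illustration.

```python
import re

def find_microsatellites(seq, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The thresholds are illustrative assumptions, not values from the study.
    """
    seq = seq.upper()
    # Group 1 is the shortest motif of 1..max_motif bases (lazy quantifier);
    # the backreference requires at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

def frequency_per_mb(seq, **kwargs):
    """Microsatellite loci per megabase of the scanned sequence."""
    return len(find_microsatellites(seq, **kwargs)) * 1e6 / len(seq)

if __name__ == "__main__":
    example = "GGACACACTTGAAAAAGC"  # made-up sequence for demonstration
    for start, motif, repeats in find_microsatellites(example):
        print(start, motif, repeats)
    print(frequency_per_mb(example))
```

    Normalising the count by sequence length is what makes frequencies comparable across genomes of very different size, which is the comparison the abstract relies on.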

  19. Upper Albian to Lower Turonian deposits and associated breccias along the Dahar cuestas (southeastern Tunisia): Origin and depositional environments

    NASA Astrophysics Data System (ADS)

    Krimi, Mabrouk; Ouaja, Mohamed; Zargouni, Fouad

    2017-11-01

    The carbonate Zebbag Formation of Upper Albian to Lower Turonian age, which outcrops along the Dahar cuestas (southeastern Tunisia), includes several breccia intervals. The stratigraphic hierarchy of these breccia levels allowed a detailed sequential analysis within a spectrum of depositional environments extending from subtidal to inner to middle ramp settings. Six major transgressive/regressive sequences make up the stacking of the elementary sequences, beginning with transgressive and/or storm wave breccias capped by desiccation and/or collapse breccias. The stratigraphic evolutionary history of the breccia facies is interpreted as the result of the interplay between eustatic and tectonic factors. This model is in accord with the tectonic activity common during the Upper Albian-Lower Turonian, which was responsible for the onlapping of the sequences.

  20. Genealogy and gene trees.

    PubMed

    Rasmuson, Marianne

    2008-02-01

    Heredity can be followed in persons or in genes. Persons can be identified only a few generations back, but simplified models indicate that universal ancestors to all now living persons have existed in the past. Genetic variability can be characterized as variants of DNA sequences. Data are available only from living persons, but from the pattern of variation gene trees can be inferred by means of coalescence models. The merging of lines backwards in time leads to an MRCA (most recent common ancestor). The time and place of living for this inferred person can give insights into human evolutionary history. Demographic processes are incorporated in the model, but since culture and customs are known to influence demography the models used ought to be tested against available genealogy. The Icelandic database offers a possibility to do so and points to some discrepancies. Mitochondrial DNA and Y chromosome patterns give a rather consistent view of human evolutionary history during the last 100 000 years, but the earlier epochs of human evolution demand gene trees with longer branches. The results of such studies reveal as yet unsolved problems about the sources of our genome.
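
    The merging of lines backwards in time can be made concrete with the standard (Kingman) coalescent, in which the waiting time while k lineages remain is exponential with rate k(k-1)/2. The sketch below is an illustration under that textbook model with constant population size, not the demographic models discussed in the abstract; the function names are assumptions. Times are in coalescent units, where one unit equals the number of gene copies in the population, counted in generations.

```python
import random

def expected_tmrca(n):
    """Expected time to the MRCA of n sampled lineages (coalescent units)."""
    # With k lineages the mean waiting time to the next merger is 2/(k*(k-1)).
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))

def simulate_tmrca(n, rng):
    """Draw one random time to the MRCA by merging lineages backwards in time."""
    t = 0.0
    for k in range(n, 1, -1):
        # Exponential waiting time with rate k*(k-1)/2 while k lineages remain.
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_tmrca(10, rng) for _ in range(20000)]
    print(expected_tmrca(10), sum(draws) / len(draws))
```

    The expectation equals 2(1 - 1/n), so it approaches 2 units however many lineages are sampled; most of the tree depth comes from the last two lineages waiting to merge.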

  1. Localization and characterization of X chromosome inversion breakpoints separating Drosophila mojavensis and Drosophila arizonae.

    PubMed

    Cirulli, Elizabeth T; Noor, Mohamed A F

    2007-01-01

    Ectopic exchange between transposable elements or other repetitive sequences along a chromosome can produce chromosomal inversions. As a result, genome sequence studies typically find sequence similarity between corresponding inversion breakpoint regions. Here, we identify and investigate the breakpoint regions of the X chromosome inversion distinguishing Drosophila mojavensis and Drosophila arizonae. We localize one inversion breakpoint to 13.7 kb and localize the other to a 1-Mb interval. Using this localization and assuming microsynteny between Drosophila melanogaster and D. arizonae, we pinpoint likely positions of the inversion breakpoints to windows of less than 3000 bp. These breakpoints define the size of the inversion to approximately 11 Mb. However, in contrast to many other studies, we fail to find significant sequence similarity between the 2 breakpoint regions. The localization of these inversion breakpoints will facilitate future genetic and molecular evolutionary studies in this species group, an emerging model system for ecological genetics.

  2. The different origins of magnetic fields and activity in the Hertzsprung gap stars, OU Andromedae and 31 Comae

    NASA Astrophysics Data System (ADS)

    Borisova, A.; Aurière, M.; Petit, P.; Konstantinova-Antova, R.; Charbonnel, C.; Drake, N. A.

    2016-06-01

    Context: When crossing the Hertzsprung gap, intermediate-mass stars develop a convective envelope. Fast rotators on the main sequence, or Ap star descendants, are expected to become magnetically active subgiants during this evolutionary phase. Aims: We compare the surface magnetic fields and activity indicators of two active, fast rotating red giants with similar masses and spectral class but different rotation rates - OU And (Prot = 24.2 d) and 31 Com (Prot = 6.8 d) - to address the question of the origin of their magnetism and high activity. Methods: Observations were carried out with the Narval spectropolarimeter in 2008 and 2013. We used the least-squares deconvolution (LSD) technique to extract Stokes V and I profiles with high signal-to-noise ratio to detect Zeeman signatures of the magnetic field of the stars. We then provide Zeeman-Doppler imaging (ZDI), activity indicators monitoring, and a precise estimation of stellar parameters. We use state-of-the-art stellar evolutionary models, including rotation, to infer the evolutionary status of our giants, as well as their initial rotation velocity on the main sequence, and we interpret our observational results in the light of the theoretical Rossby numbers. Results: The detected magnetic field of OU Andromedae (OU And) is a strong one. Its longitudinal component Bl reaches 40 G and presents a roughly sinusoidal variation with reversal of the polarity. The magnetic topology of OU And is dominated by large-scale elements and is mainly poloidal with an important dipole component, as well as a significant toroidal component. The detected magnetic field of 31 Comae (31 Com) is weaker, with a magnetic map showing a more complex field geometry, and poloidal and toroidal components of equal contributions. The evolutionary models show that the progenitors of OU And and 31 Com must have been rotating at velocities that correspond to 30 and 53%, respectively, of their critical rotation velocity on the zero age main sequence. Both OU And and 31 Com have very similar masses (2.7 and 2.85 M⊙, respectively), and they both lie in the Hertzsprung gap. Conclusions: OU And appears to be the probable descendant of a magnetic Ap star, and 31 Com the descendant of a relatively fast rotator on the main sequence. Because of the relatively fast rotation in the Hertzsprung gap and the onset of the development of a convective envelope, OU And also has a dynamo in operation. Based on observations obtained at the telescope Bernard Lyot (TBL) at Observatoire du Pic du Midi, CNRS/INSU and Université de Toulouse, France.

  3. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140
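
    To see why sequencing error matters for θ, it helps to look at a classic estimator that ignores error. The sketch below uses Watterson's estimator (θ_W = S / a_n, with a_n the sum of 1/i for i from 1 to n-1), which is a different and simpler estimator than the composite-likelihood MCLE proposed in the abstract. The helper names and the example numbers are assumptions, and the error adjustment is a deliberately crude approximation that treats every erroneous base call as one spurious segregating site.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the normalising constant for n sequences."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta(num_segregating_sites, n):
    """Watterson's estimate of theta (per locus) from S segregating sites."""
    return num_segregating_sites / harmonic(n)

def expected_error_calls(n, length, error_rate):
    """Expected number of erroneous base calls in n sequences of given length."""
    return n * length * error_rate

if __name__ == "__main__":
    n, length = 1000, 10000   # hypothetical deep-resequencing sample
    true_sites = 60            # hypothetical count of real segregating sites
    extra = expected_error_calls(n, length, 1e-5)
    print(watterson_theta(true_sites, n))
    # Crude upper bound on the inflation if each error adds a spurious site.
    print(watterson_theta(true_sites + extra, n))
```

    Because the expected number of erroneous calls grows linearly with n while a_n grows only logarithmically, the inflation becomes worse as samples get larger, which is consistent with the abstract's remark that the problem is more prominent in deep resequencing studies.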

  4. Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins

    NASA Astrophysics Data System (ADS)

    Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil

    2014-03-01

    Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.

  5. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

    PubMed Central

    Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M.; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A.; Gilks, C. Blake; Huntsman, David G.; McAlpine, Jessica N.; Aparicio, Samuel

    2014-01-01

    The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. PMID:25060187

  6. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674
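
    The reciprocity criterion behind this kind of ortholog call can be sketched as follows. This is only an illustration of the "each is the other's closest match" idea on a precomputed distance table; the published RSD algorithm also involves sequence search, alignment and maximum-likelihood distance estimation, none of which is reproduced here. The function name, gene labels and distances are hypothetical.

```python
def reciprocal_smallest_pairs(dist):
    """Return (gene_a, gene_b, distance) for reciprocal closest pairs.

    dist maps (gene_in_genome_A, gene_in_genome_B) to an evolutionary
    distance; the table is assumed to be precomputed elsewhere.
    """
    best_for_a = {}
    best_for_b = {}
    for (a, b), d in dist.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    pairs = []
    for a, (b, d) in best_for_a.items():
        # Keep the pair only if b's closest partner is a as well.
        if best_for_b[b][0] == a:
            pairs.append((a, b, d))
    # Most divergent orthologs first, mirroring the ranking by divergence score.
    return sorted(pairs, key=lambda p: p[2], reverse=True)

if __name__ == "__main__":
    table = {
        ("A1", "B1"): 0.1, ("A1", "B2"): 0.5,
        ("A2", "B1"): 0.3, ("A2", "B2"): 0.2,
        ("A3", "B2"): 0.4,
    }
    print(reciprocal_smallest_pairs(table))
```

    Sorting by distance puts the most divergent ortholog pairs at the top, which is the list one would then cross-tabulate against genes of known relevance.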

  7. Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fields, C.A.

    1996-06-01

    The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.

  8. Building toy models of proteins using coevolutionary information

    NASA Astrophysics Data System (ADS)

    Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose

    2015-03-01

    Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
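
    A Potts model of the kind inferred by DCA assigns each sequence s an energy H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), with site fields h and pairwise couplings J. The sketch below only evaluates such an energy for given parameters; it does not perform the inference, and the tiny parameter tables and function names are made-up assumptions for illustration.

```python
def potts_energy(seq, fields, couplings):
    """Evaluate H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields:    list with one dict per position, residue -> h_i value
    couplings: dict (i, j) -> dict (residue_i, residue_j) -> J_ij value
    Missing entries are treated as zero.
    """
    energy = 0.0
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy

def mutation_delta(wild_type, mutant, fields, couplings):
    """Energy change of a mutant relative to the wild type (positive = less favourable)."""
    return potts_energy(mutant, fields, couplings) - potts_energy(wild_type, fields, couplings)

if __name__ == "__main__":
    # Hypothetical two-site model: position 0 prefers A, position 1 prefers K,
    # and the pair (A, K) is additionally favoured by a coupling.
    h = [{"A": 1.0}, {"K": 0.5}]
    J = {(0, 1): {("A", "K"): 2.0}}
    print(potts_energy("AK", h, J))
    print(mutation_delta("AK", "AG", h, J))
```

    Energy differences of this form are the quantity one would compare with measured changes in protein stability upon mutation.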

  9. Rise and fall of political complexity in island South-East Asia and the Pacific.

    PubMed

    Currie, Thomas E; Greenhill, Simon J; Gray, Russell D; Hasegawa, Toshikazu; Mace, Ruth

    2010-10-14

    There is disagreement about whether human political evolution has proceeded through a sequence of incremental increases in complexity, or whether larger, non-sequential increases have occurred. The extent to which societies have decreased in complexity is also unclear. These debates have continued largely in the absence of rigorous, quantitative tests. We evaluated six competing models of political evolution in Austronesian-speaking societies using phylogenetic methods. Here we show that in the best-fitting model political complexity rises and falls in a sequence of small steps. This is closely followed by another model in which increases are sequential but decreases can be either sequential or in bigger drops. The results indicate that large, non-sequential jumps in political complexity have not occurred during the evolutionary history of these societies. This suggests that, despite the numerous contingent pathways of human history, there are regularities in cultural evolution that can be detected using computational phylogenetic methods.

  10. RiboMaker: computational design of conformation-based riboregulation.

    PubMed

    Rodrigo, Guillermo; Jaramillo, Alfonso

    2014-09-01

    The ability to engineer control systems of gene expression is instrumental for synthetic biology. Thus, bioinformatic methods that assist such engineering are appealing because they can guide the sequence design and prevent costly experimental screening. In particular, RNA is an ideal substrate to de novo design regulators of protein expression by following sequence-to-function models. We have implemented a novel algorithm, RiboMaker, aimed at the computational, automated design of bacterial riboregulation. RiboMaker reads the sequence and structure specifications, which codify for a gene regulatory behaviour, and optimizes the sequences of a small regulatory RNA and a 5'-untranslated region for an efficient intermolecular interaction. To this end, it implements an evolutionary design strategy, where random mutations are selected according to a physicochemical model based on free energies. The resulting sequences can then be tested experimentally, providing a new tool for synthetic biology, and also for investigating the riboregulation principles in natural systems. Web server is available at http://ribomaker.jaramillolab.org/. Source code, instructions and examples are freely available for download at http://sourceforge.net/projects/ribomaker/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
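
    The mutate-and-select idea described above can be sketched as a simple loop. This is not RiboMaker's algorithm: the physicochemical free-energy model is replaced here by a toy score that just counts Watson-Crick pairs between a candidate small RNA and the reversed target, and all names and sequences are hypothetical.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def toy_score(srna, target):
    """Count positions where the sRNA base pairs with the antiparallel target base.

    A stand-in for a free-energy model; higher means more pairing.
    """
    reversed_target = target[::-1]
    return sum(1 for x, y in zip(srna, reversed_target) if COMPLEMENT[x] == y)

def evolve(srna, target, steps, rng):
    """Propose random point mutations and keep neutral or improving ones."""
    best, best_score = srna, toy_score(srna, target)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        base = rng.choice("ACGU")
        candidate = best[:pos] + base + best[pos + 1:]
        score = toy_score(candidate, target)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    rng = random.Random(1)
    designed, score = evolve("AAAAAAAA", "GCAUGCAU", 500, rng)
    print(designed, score)
```

    Accepting neutral mutations as well as improving ones lets the search drift across plateaus of equal score, a common choice in evolutionary design loops; a real objective would combine several energetic terms rather than a single pair count.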

  11. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  12. The genomic landscape of rapid, repeated evolutionary rescue from toxic pollution in wild fish

    USDA-ARS's Scientific Manuscript database

    Here we describe evolutionary rescue from intense pollution via multiple modes of selection in killifish populations from 4 urban estuaries of the US eastern seaboard. Comparative transcriptomics and analysis of 384 whole genome sequences show that the functioning of a receptor-based signaling pathw...

  13. Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli

    USDA-ARS's Scientific Manuscript database

    Background: In order to investigate the population genetics structure of thermophilic Campylobacter spp., we extracted a set of 1029 core gene families (CGF) from 25 sequenced genomes of C. jejuni, C. coli and C. lari. Based on these CGFs we employed different approaches to reveal the evolutionary ...

  14. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes.

    PubMed

    Weaver, Steven; Shank, Stephen D; Spielman, Stephanie J; Li, Michael; Muse, Spencer V; Kosakovsky Pond, Sergei L

    2018-01-02

    Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally-executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js. © The Author 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540

  16. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  17. Characterization of 47 MHC class I sequences in Filipino cynomolgus macaques

    PubMed Central

    Campbell, Kevin J.; Detmer, Ann M.; Karl, Julie A.; Wiseman, Roger W.; Blasky, Alex J.; Hughes, Austin L.; Bimber, Benjamin N.; O’Connor, Shelby L.; O’Connor, David H.

    2009-01-01

    Cynomolgus macaques (Macaca fascicularis) provide increasingly common models for infectious disease research. Several geographically distinct populations of these macaques from Southeast Asia and the Indian Ocean island of Mauritius are available for pathogenesis studies. Though host genetics may profoundly impact results of such studies, similarities and differences between populations are often overlooked. In this study we identified 47 full-length MHC class I nucleotide sequences in 16 cynomolgus macaques of Filipino origin. The majority of MHC class I sequences characterized (39 of 47) were unique to this regional population. However, we discovered eight sequences with perfect identity and six sequences with close similarity to previously defined MHC class I sequences from other macaque populations. We identified two ancestral MHC haplotypes that appear to be shared between Filipino and Mauritian cynomolgus macaques, notably a Mafa-B haplotype that has previously been shown to protect Mauritian cynomolgus macaques against challenge with a simian/human immunodeficiency virus, SHIV89.6P. We also identified a Filipino cynomolgus macaque MHC class I sequence for which the predicted protein sequence differs from Mamu-B*17 by a single amino acid. This is important because Mamu-B*17 is strongly associated with protection against simian immunodeficiency virus (SIV) challenge in Indian rhesus macaques. These findings have implications for the evolutionary history of Filipino cynomolgus macaques as well as for the use of this model in SIV/SHIV research protocols. PMID:19107381

  18. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  19. Comparative studies of gene expression and the evolution of gene regulation

    PubMed Central

    Romero, Irene Gallego; Ruvinsky, Ilya; Gilad, Yoav

    2014-01-01

    The hypothesis that differences in gene regulation play an important role in speciation and adaptation is more than 40 years old. With the advent of new sequencing technologies, we are able to characterize and study gene expression levels and associated regulatory mechanisms in a large number of individuals and species at unprecedented resolution and scale. We have thus gained new insights into the evolutionary pressures that shape gene expression levels, as well as developed an appreciation for the relative importance of evolutionary changes in different regulatory genetic and epigenetic mechanisms. The current challenge is to link gene regulatory changes to adaptive evolution of complex phenotypes. Here we mainly focus on comparative studies in primates, and how they are complemented by studies in model organisms. PMID:22705669

  20. The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects

    PubMed Central

    2011-01-01

    Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. 
The control region in Neuropterida, as in other insects, is a fast-evolving genomic region, characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history. PMID:21569260

  1. Evaluating, Comparing, and Interpreting Protein Domain Hierarchies

    PubMed Central

    2014-01-01

    Abstract Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108

  2. Evolution of the β-adrenoreceptors in vertebrates.

    PubMed

    Zavala, Kattina; Vandewege, Michael W; Hoffmann, Federico G; Opazo, Juan C

    2017-01-01

    The study of the evolutionary history of genes related to human disease lies at the interface of evolution and medicine. These studies provide the evolutionary context on which medical researchers should work, and are also useful in providing information to suggest further genetic experiments, especially in model species where genetic manipulations can be made. Here we studied the evolution of the β-adrenoreceptor gene family in vertebrates with the aim of adding an evolutionary framework to the already abundant physiological information. Our results show that in addition to the three already described vertebrate β-adrenoreceptor genes there is an additional group containing cyclostome sequences. We suggest that β-adrenoreceptors diversified as a product of the two whole genome duplications that occurred in the ancestor of vertebrates. Gene expression patterns are in general consistent across species, suggesting that expression dynamics were established early in the evolutionary history of vertebrates, and have been maintained since then. Finally, amino acid polymorphisms that are associated with pathological conditions in humans appear to be common in non-human mammals, suggesting that the phenotypic effects of these mutations depend on epistatic interaction with other positions. The evolutionary analysis of the β-adrenoreceptors delivers new insights about the diversity of these receptors in vertebrates, the evolution of the expression patterns and a comparative perspective regarding the polymorphisms that in humans are linked to pathological conditions. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Functional and mechanistic diversity of distal transcription enhancers

    PubMed Central

    Bulger, Michael; Groudine, Mark

    2013-01-01

    Biological differences among metazoans, and between cell types in a given organism, arise in large part due to differences in gene expression patterns. The sequencing of multiple metazoan genomes, coupled with recent advances in genome-wide analysis of histone modifications and transcription factor binding, has revealed that among regulatory DNA sequences, gene-distal enhancers appear to exhibit the greatest diversity and cell-type specificity. Moreover, such elements are emerging as important targets for mutations that can give rise to disease and to genetic variability that underlies evolutionary change. Studies of long-range interactions between distal genomic sequences in the nucleus indicate that enhancers are often important determinants of nuclear organization, contributing to a general model for enhancer function that involves direct enhancer-promoter contact. In a number of systems, however, mechanisms for enhancer function are emerging that do not fit solely within such a model, suggesting that enhancers as a class of DNA regulatory element may be functionally and mechanistically diverse. PMID:21295696

  4. Protein 8-class secondary structure prediction using conditional neural fields.

    PubMed

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Characterization of the transcriptome of an ecologically important avian species, the Vinous-throated Parrotbill Paradoxornis webbianus bulomachus (Paradoxornithidae; Aves)

    PubMed Central

    2012-01-01

    Background Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Results Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Conclusions Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species. 
This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590

  6. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary

    PubMed Central

    Vanneste, Kevin; Baele, Guy; Maere, Steven; Van de Peer, Yves

    2014-01-01

    Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous–Paleogene (K–Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K–Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids. PMID:24835588

  7. rbcL gene sequences provide evidence for the evolutionary lineages of leptosporangiate ferns.

    PubMed

    Hasebe, M; Omori, T; Nakazawa, M; Sano, T; Kato, M; Iwatsuki, K

    1994-06-07

    Pteridophytes have a longer evolutionary history than any other vascular land plant and, therefore, have endured greater loss of phylogenetically informative information. This factor has resulted in substantial disagreements in evaluating characters and, thus, controversy in establishing a stable classification. To compare competing classifications, we obtained DNA sequences of a chloroplast gene. The sequence of 1206 nt of the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL) was determined from 58 species, representing almost all families of leptosporangiate ferns. Phylogenetic trees were inferred by the neighbor-joining and the parsimony methods. The two methods produced almost identical phylogenetic trees that provided insights concerning major general evolutionary trends in the leptosporangiate ferns. Interesting findings were as follows: (i) two morphologically distinct heterosporous water ferns, Marsilea and Salvinia, are sister genera; (ii) the tree ferns (Cyatheaceae, Dicksoniaceae, and Metaxyaceae) are monophyletic; and (iii) polypodioids are distantly related to the gleichenioids in spite of the similarity of their exindusiate soral morphology and are close to the higher indusiate ferns. In addition, the affinities of several "problematic genera" were assessed.

  8. Benchmark cool companions: ages and abundances for the PZ Telescopii system

    NASA Astrophysics Data System (ADS)

    Jenkins, J. S.; Pavlenko, Y. V.; Ivanyuk, O.; Gallardo, J.; Jones, M. I.; Day-Jones, A. C.; Jones, H. R. A.; Ruiz, M. T.; Pinfield, D. J.; Yakovina, L.

    2012-03-01

    We present new ages and abundance measurements for the pre-main-sequence star PZ Telescopii (more commonly known as PZ Tel). PZ Tel was recently found to host a young and low-mass companion. Such companions, whether they are brown dwarfs or planetary systems, can attain benchmark status by detailed study of the properties of the primary, and then evolutionary and bulk characteristics can be inferred for the companion. Using Fibre-fed Extended Range Optical Spectrograph spectra, we have measured atomic abundances (e.g. Fe and Li) and chromospheric activity for PZ Tel and used these to obtain the metallicity and age estimates for the companion. We have also determined the age independently using the latest evolutionary models. We find PZ Tel A to be a rapidly rotating (v sin i = 73 ± 5 km s^-1), approximately solar metallicity star [log N(Fe) = -4.37 ± 0.06 dex or [Fe/H] = 0.05 ± 0.20 dex]. We measure a non-local thermodynamic equilibrium lithium abundance of log N(Li) = 3.1 ± 0.1 dex, which from depletion models gives rise to an age of 7? Myr for the system. Our measured chromospheric activity (log R'HK of -4.12) returns an age of 26 ± 2 Myr, as does fitting pre-main-sequence evolutionary tracks (τevol = 22 ± 3 Myr); both of these are in disagreement with the lithium age. We speculate on reasons for this difference and introduce new models for lithium depletion that incorporate both rotation and magnetic field effects. We also synthesize solar, metal-poor and metal-rich substellar evolutionary models to better determine the bulk properties of PZ Tel B, showing that PZ Tel B is probably more massive than previous estimates, meaning the companion is not a giant exoplanet, even though a planetary-like formation origin can go some way to describing the distribution of benchmark binaries currently known.
We show how PZ Tel B compares to other currently known age and metallicity benchmark systems and try to empirically test the effects of dust opacity as a function of metallicity on the near-infrared colours of brown dwarfs. Current models suggest that in the near-infrared observations are more sensitive to low-mass companions orbiting more metal rich stars. We also look for trends between infrared photometry and metallicity amongst a growing population of substellar benchmark objects, and identify the need for more data in mass-age-metallicity parameter space.

  9. The Embryonic Transcriptome of the Red-Eared Slider Turtle (Trachemys scripta)

    PubMed Central

    Kaplinsky, Nicholas J.; Gilbert, Scott F.; Cebra-Thomas, Judith; Lilleväli, Kersti; Saare, Merly; Chang, Eric Y.; Edelman, Hannah E.; Frick, Melissa A.; Guan, Yin; Hammond, Rebecca M.; Hampilos, Nicholas H.; Opoku, David S. B.; Sariahmed, Karim; Sherman, Eric A.; Watson, Ray

    2013-01-01

    The bony shell of the turtle is an evolutionary novelty not found in any other group of animals; however, research into its formation has suggested that it has evolved through modification of conserved developmental mechanisms. Although these mechanisms have been extensively characterized in model organisms, the tools for characterizing them in non-model organisms such as turtles have been limited by a lack of genomic resources. We have used a next generation sequencing approach to generate and assemble a transcriptome from stage 14 and 17 Trachemys scripta embryos, stages during which important events in shell development are known to take place. The transcriptome consists of 231,876 sequences with an N50 of 1,166 bp. GO terms and EC codes were assigned to the 61,643 unique predicted proteins identified in the transcriptome sequences. All major GO categories and metabolic pathways are represented in the transcriptome. Transcriptome sequences were used to amplify several cDNA fragments designed for use as RNA in situ probes. One of these, BMP5, was hybridized to a T. scripta embryo and exhibits both conserved and novel expression patterns. The transcriptome sequences should be of broad use for understanding the evolution and development of the turtle shell and for annotating any future T. scripta genome sequences. PMID:23840449

  10. Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.

    PubMed

    Carretero-Paulet, Lorenzo; Albert, Victor A

    2016-01-01

    The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe (http://www.genomevolution.org/coge/) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.

  11. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

    PubMed

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R

    2009-01-01

    With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.

  12. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

    PubMed Central

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

    2010-01-01

    With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534

  13. Origin and Reticulate Evolutionary Process of Wheatgrass Elymus trachycaulus (Triticeae: Poaceae)

    PubMed Central

    Zuo, Hongwei; Wu, Panpan; Wu, Dexiang; Sun, Genlou

    2015-01-01

    To study the origin and evolutionary dynamics of tetraploid Elymus trachycaulus, which has been cytologically defined as containing StH genomes, thirteen accessions of E. trachycaulus were analyzed using two low-copy nuclear genes, Pepc (phosphoenolpyruvate carboxylase) and Rpb2 (the second largest subunit of RNA polymerase II), and one chloroplast region, trnL–trnF (spacer between the tRNA-Leu (UAA) gene and the tRNA-Phe (GAA) gene). Our chloroplast data indicated that Pseudoroegneria (St genome) was the maternal donor of E. trachycaulus. Rpb2 data indicated that the St genome in E. trachycaulus originated from either P. strigosa, P. stipifolia, P. spicata or P. geniculata. The Hordeum (H genome)-like sequences of E. trachycaulus are polyphyletic in the Pepc tree, suggesting that the H genome in E. trachycaulus was contributed by multiple sources, whether due to multiple origins or introgression resulting from subsequent hybridization. Failure to recover the St copy of the Pepc sequence in most accessions of E. trachycaulus might be caused by genome convergent evolution in allopolyploids. Multiple copies of the H-like Pepc sequence from each accession, with relatively large deletions and insertions, might be caused by either instability of the Pepc sequence in the H genome or incomplete concerted evolution. Our results highlighted the complex evolutionary history of E. trachycaulus. PMID:25946188

  14. Evolution of thermotolerance in hot spring cyanobacteria of the genus Synechococcus

    NASA Technical Reports Server (NTRS)

    Miller, S. R.; Castenholz, R. W.

    2000-01-01

    The extension of ecological tolerance limits may be an important mechanism by which microorganisms adapt to novel environments, but it may come at the evolutionary cost of reduced performance under ancestral conditions. We combined a comparative physiological approach with phylogenetic analyses to study the evolution of thermotolerance in hot spring cyanobacteria of the genus Synechococcus. Among the 20 laboratory clones of Synechococcus isolated from collections made along an Oregon hot spring thermal gradient, four different 16S rRNA gene sequences were identified. Phylogenies constructed by using the sequence data indicated that the clones were polyphyletic but that three of the four sequence groups formed a clade. Differences in thermotolerance were observed for clones with different 16S rRNA gene sequences, and comparison of these physiological differences within a phylogenetic framework provided evidence that more thermotolerant lineages of Synechococcus evolved from less thermotolerant ancestors. The extension of the thermal limit in these bacteria was correlated with a reduction in the breadth of the temperature range for growth, which provides evidence that enhanced thermotolerance has come at the evolutionary cost of increased thermal specialization. This study illustrates the utility of using phylogenetic comparative methods to investigate how evolutionary processes have shaped historical patterns of ecological diversification in microorganisms.

  15. Dynamic Nucleotide Mutation Gradients and Control Region Usage in Squamate Reptile Mitochondrial Genomes

    PubMed Central

    Castoe, T.A.; Gu, W.; de Koning, A.P.J.; Daza, J.M.; Jiang, Z.J.; Parkinson, C.L.; Pollock, D.D.

    2010-01-01

    Gradients of nucleotide bias and substitution rates occur in vertebrate mitochondrial genomes due to the asymmetric nature of the replication process. The evolution of these gradients has previously been studied in detail in primates, but not in other vertebrate groups. From the primate study, the strengths of these gradients are known to evolve in ways that can substantially alter the substitution process, but it is unclear how rapidly they evolve over evolutionary time or how different they may be in different lineages or groups of vertebrates. Given the importance of mitochondrial genomes in phylogenetics and molecular evolutionary research, a better understanding of how asymmetric mitochondrial substitution gradients evolve would contribute key insights into how this gradient evolution may mislead evolutionary inferences, and how it may also be incorporated into new evolutionary models. Most snake mitochondrial genomes have an additional interesting feature, 2 nearly identical control regions (CRs), which vary among different species in the extent to which they are used as origins of replication. Given the expanded sampling of complete snake genomes currently available, together with 2 additional snakes sequenced in this study, we reexamined gradient strength and CR usage in alethinophidian snakes as well as several lizards that possess dual CRs. Our results suggest that nucleotide substitution gradients (and corresponding nucleotide bias) and CR usage are highly labile over the ∼200 m.y. of squamate evolution, and demonstrate greater overall variability than previously shown in primates. The evidence for the existence of such gradients, and their ability to evolve rapidly and converge among unrelated species, suggests that gradient dynamics could easily mislead phylogenetic and molecular evolutionary inferences, and argues strongly that these dynamics should be incorporated into phylogenetic models. PMID:20215734

  16. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.

  17. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  18. Models for the a subunits of the Thermus thermophilus V/A-ATPase and Saccharomyces cerevisiae V-ATPase enzymes by cryo-EM and evolutionary covariance

    PubMed Central

    Schep, Daniel G.; Rubinstein, John L.

    2016-01-01

    Rotary ATPases couple ATP synthesis or hydrolysis to proton translocation across a membrane. However, understanding proton translocation has been hampered by a lack of structural information for the membrane-embedded a subunit. The V/A-ATPase from the eubacterium Thermus thermophilus is similar in structure to the eukaryotic V-ATPase but has a simpler subunit composition and functions in vivo to synthesize ATP rather than pump protons. We determined the T. thermophilus V/A-ATPase structure by cryo-EM at 6.4 Å resolution. Evolutionary covariance analysis allowed tracing of the a subunit sequence within the map, providing a complete model of the rotary ATPase. Comparing the membrane-embedded regions of the T. thermophilus V/A-ATPase and eukaryotic V-ATPase from Saccharomyces cerevisiae allowed identification of the α-helices that belong to the a subunit and revealed the existence of previously unknown subunits in the eukaryotic enzyme. Subsequent evolutionary covariance analysis enabled construction of a model of the a subunit in the S. cerevisiae V-ATPase that explains numerous biochemical studies of that enzyme. Comparing the two a subunit structures determined here with a structure of the distantly related a subunit from the bovine F-type ATP synthase revealed a conserved pattern of residues, suggesting a common mechanism for proton transport in all rotary ATPases. PMID:26951669

  19. On the Statistical Properties of the Lower Main Sequence

    NASA Astrophysics Data System (ADS)

    Angelou, George C.; Bellinger, Earl P.; Hekker, Saskia; Basu, Sarbani

    2017-04-01

    Astronomy is in an era where all-sky surveys are mapping the Galaxy. The plethora of photometric, spectroscopic, asteroseismic, and astrometric data allows us to characterize the comprising stars in detail. Here we quantify to what extent precise stellar observations reveal information about the properties of a star, including properties that are unobserved, or even unobservable. We analyze the diagnostic potential of classical and asteroseismic observations for inferring stellar parameters such as age, mass, and radius from evolutionary tracks of solar-like oscillators on the lower main sequence. We perform rank correlation tests in order to determine the capacity of each observable quantity to probe structural components of stars and infer their evolutionary histories. We also analyze the principal components of classic and asteroseismic observables to highlight the degree of redundancy present in the measured quantities and demonstrate the extent to which information of the model parameters can be extracted. We perform multiple regression using combinations of observable quantities in a grid of evolutionary simulations and appraise the predictive utility of each combination in determining the properties of stars. We identify the combinations that are useful and provide limits to where each type of observable quantity can reveal information about a star. We investigate the accuracy with which targets in the upcoming TESS and PLATO missions can be characterized. We demonstrate that the combination of observations from GAIA and PLATO will allow us to tightly constrain stellar masses, ages, and radii with machine learning for the purposes of Galactic and planetary studies.

  20. The evolutionary fate of the chloroplast and nuclear rps16 genes as revealed through the sequencing and comparative analyses of four novel legume chloroplast genomes from Lupinus.

    PubMed

    Keller, J; Rousseau-Gueutin, M; Martin, G E; Morice, J; Boutte, J; Coissac, E; Ourari, M; Aïnouche, M; Salmon, A; Cabello-Hurtado, F; Aïnouche, A

    2017-08-01

    The Fabaceae family is considered as a model system for understanding chloroplast genome evolution due to the presence of extensive structural rearrangements, gene losses and localized hypermutable regions. Here, we provide sequences of four chloroplast genomes from the Lupinus genus, belonging to the underinvestigated Genistoid clade. Notably, we found in Lupinus species the functional loss of the essential rps16 gene, which was most likely replaced by the nuclear rps16 gene that encodes chloroplast and mitochondrion targeted RPS16 proteins. To study the evolutionary fate of the rps16 gene, we explored all available plant chloroplast, mitochondrial and nuclear genomes. Whereas no plant mitochondrial genomes carry an rps16 gene, many plants still have a functional nuclear and chloroplast rps16 gene. Ka/Ks ratios revealed that both chloroplast and nuclear rps16 copies were under purifying selection. However, due to the dual targeting of the nuclear rps16 gene product and the absence of a mitochondrial copy, the chloroplast gene may be lost. We also performed comparative analyses of lupine plastomes (SNPs, indels and repeat elements), identified the most variable regions and examined their phylogenetic utility. The markers identified here will help to reveal the evolutionary history of lupines, Genistoids and closely related clades. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  1. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System

    PubMed Central

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics has facilitated the mining of biological information from a genome sequence, through the detection of similarities and differences with genomes of closely or more distantly related species. By using such comparative approaches, knowledge can be transferred from the model to non-model organisms and insights can be gained into the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study was aimed at understanding the structure, organisation and expression profiles of grass pollen allergens using the genomic data from Brachypodium distachyon, as it is phylogenetically related to the allergenic grasses. Combining genomic data with the anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups mapping on the five chromosomes in B. distachyon. High levels of anther-specific expression were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of them. Promoter analysis of the identified Brachypodium genes revealed the presence of specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. With the identification of putative allergen-encoding genes in Brachypodium, this study has also described some important plant gene families (e.g. expansin superfamily, EF-Hand family, profilins etc.) for the first time in the model plant Brachypodium. Altogether, the present study provides new insights into structural characterization and evolution of pollen allergens and will further serve as a base for their functional characterization in related grass species. PMID:28103252

  2. Rooting gene trees without outgroups: EP rooting.

    PubMed

    Sinsheimer, Janet S; Little, Roderick J A; Lake, James A

    2012-01-01

    Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167-181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301-316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60-76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489-493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763-766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255-260).

  3. Rooting Gene Trees without Outgroups: EP Rooting

    PubMed Central

    Sinsheimer, Janet S.; Little, Roderick J. A.; Lake, James A.

    2012-01-01

    Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260). PMID:22593551

  4. The power and promise of RNA-seq in ecology and evolution.

    PubMed

    Todd, Erica V; Black, Michael A; Gemmell, Neil J

    2016-03-01

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.

  5. Reproduction, symbiosis, and the eukaryotic cell

    PubMed Central

    Godfrey-Smith, Peter

    2015-01-01

    This paper develops a conceptual framework for addressing questions about reproduction, individuality, and the units of selection in symbiotic associations, with special attention to the origin of the eukaryotic cell. Three kinds of reproduction are distinguished, and a possible evolutionary sequence giving rise to a mitochondrion-containing eukaryotic cell from an endosymbiotic partnership is analyzed as a series of transitions between each of the three forms of reproduction. The sequence of changes seen in this “egalitarian” evolutionary transition is compared with those that apply in “fraternal” transitions, such as the evolution of multicellularity in animals. PMID:26286983

  6. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units in genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of 1-3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and with increasing repeat number in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species favored different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provides new insights for the analysis of plant genome evolution.
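
    The kind of tally described in this abstract (mono-, di- and trinucleotide SSRs and their A/T content) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the minimum repeat counts below are hypothetical choices made for illustration only, and only perfect repeats over A/C/G/T are counted.

```python
# Minimal SSR scan; thresholds and names are illustrative assumptions,
# not values taken from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}  # hypothetical minimum repeat count per unit length


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a list of (start, unit, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, need in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            # Skip ambiguous bases and units such as 'AA' that belong to a shorter class.
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == unit:  # extend the run one unit at a time
                j += k
            count = (j - i) // k
            if count >= need:
                hits.append((i, unit, count))
                i = j  # continue after the repeat
            else:
                i += 1
    return hits


def at_fraction(hits, unit_len):
    """Fraction of SSRs of a given unit length built only from A and T (None if no SSRs)."""
    units = [u for _, u, _ in hits if len(u) == unit_len]
    if not units:
        return None
    return sum(set(u) <= set("AT") for u in units) / len(units)


if __name__ == "__main__":
    demo = "GG" + "A" * 12 + "C" + "AT" * 7 + "T" + "CAG" * 5 + "T"
    for start, unit, count in find_ssrs(demo):
        print(start, unit, count)
```

    Running the scan per genome and comparing `at_fraction` across species would give the sort of A/T percentage comparison the abstract reports, subject to whatever repeat thresholds are actually chosen.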

  7. Bioinformatics analysis and genetic diversity of the poliovirus.

    PubMed

    Liu, Yanhan; Ma, Tengfei; Liu, Jianzhu; Zhao, Xiaona; Cheng, Ziqiang; Guo, Huijun; Wang, Shujing; Xu, Ruixue

    2014-12-01

    Poliomyelitis, a disease which can manifest as muscle paralysis, is caused by the poliovirus, which is a human enterovirus and member of the family Picornaviridae that usually transmits by the faecal-oral route. The viruses of the OPV (oral poliovirus attenuated-live vaccine) strains can mutate in the human intestine during replication and some of these mutations can lead to the recovery of serious neurovirulence. Informatics research of the poliovirus genome can be used to explain further the characteristics of this virus. In this study, sequences from 100 poliovirus isolates were acquired from GenBank. To determine the evolutionary relationship between the strains, we compared and analysed the sequences of the complete poliovirus genome and the VP1 region. The reconstructed phylogenetic trees for the complete sequences and the VP1 sequences were both divided into two branches, indicating that the genetic relationships of the whole poliovirus genome and the VP1 sequences are very similar. This branching indicates that the virulence and pathogenicity of poliomyelitis may be associated with the VP1 region. Sequence alignment of the VP1 region revealed numerous mutation sites in which mutation rates of >30 % were detected. In a group of strains recorded in the USA, mutation sites and mutation types were the same and this may be associated with their distribution in the evolutionary tree and their genetic relationship. In conclusion, the genetic evolutionary relationships of poliovirus isolate sequences are determined to a great extent by the VP1 protein, and poliovirus strains located on the same branch of the phylogenetic tree contain the same mutation spots and mutation types. Hence, the genetic characteristics of the VP1 region in the poliovirus genome should be analysed to identify the transmission route of poliovirus and provide the basis of viral immunity development. © 2014 The Authors.

  8. spa Typing and Multilocus Sequence Typing Show Comparable Performance in a Macroepidemiologic Study of Staphylococcus aureus in the United States

    PubMed Central

    O'Hara, F. Patrick; Suaya, Jose A.; Ray, G. Thomas; Baxter, Roger; Brown, Megan L.; Mera, Robertino M.; Close, Nicole M.; Thomas, Elizabeth

    2016-01-01

    A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants. PMID:26669861
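
    The abstract does not say which indices of diversity were evaluated. As an illustration only, the sketch below computes Simpson's index of diversity, a measure commonly used to compare the discriminatory power of typing schemes; the function name and the example labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    type_labels holds one typing result per isolate (e.g. a spa type or sequence type).
    D is the probability that two isolates drawn without replacement have different types.
    """
    counts = Counter(type_labels)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


if __name__ == "__main__":
    # Hypothetical labels for four isolates: two share a type, two are unique.
    print(simpson_diversity(["A", "A", "B", "C"]))
```

    Computing the index once per typing method on the same isolate set gives directly comparable values, since both methods are scored on the same scale from 0 (all isolates identical) to 1 (all isolates distinct).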

  9. spa Typing and Multilocus Sequence Typing Show Comparable Performance in a Macroepidemiologic Study of Staphylococcus aureus in the United States.

    PubMed

    O'Hara, F Patrick; Suaya, Jose A; Ray, G Thomas; Baxter, Roger; Brown, Megan L; Mera, Robertino M; Close, Nicole M; Thomas, Elizabeth; Amrine-Madsen, Heather

    2016-01-01

    A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants.

  10. Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths

    PubMed Central

    2011-01-01

    Background: Late Pleistocene North America hosted at least two divergent and ecologically distinct species of mammoth: the periglacial woolly mammoth (Mammuthus primigenius) and the subglacial Columbian mammoth (Mammuthus columbi). To date, mammoth genetic research has been entirely restricted to woolly mammoths, rendering their genetic evolution difficult to contextualize within broader Pleistocene paleoecology and biogeography. Here, we take an interspecific approach to clarifying mammoth phylogeny by targeting Columbian mammoth remains for mitogenomic sequencing. Results: We sequenced the first complete mitochondrial genome of a classic Columbian mammoth, as well as the first complete mitochondrial genome of a North American woolly mammoth. Somewhat contrary to conventional paleontological models, which posit that the two species were highly divergent, the M. columbi mitogenome we obtained falls securely within a subclade of endemic North American M. primigenius. Conclusions: Though limited, our data suggest that the two species interbred at some point in their evolutionary histories. One potential explanation is that woolly mammoth haplotypes entered Columbian mammoth populations via introgression at subglacial ecotones, a scenario with compelling parallels in extant elephants and consistent with certain regional paleontological observations. This highlights the need for multi-genomic data to sufficiently characterize mammoth evolutionary history. Our results demonstrate that the use of next-generation sequencing technologies holds promise in obtaining such data, even from non-cave, non-permafrost Pleistocene depositional contexts. PMID:21627792

  11. Comparative transcriptomics of Entelegyne spiders (Araneae, Entelegynae), with emphasis on molecular evolution of orphan genes.

    PubMed

    Carlson, David E; Hedin, Marshal

    2017-01-01

    Next-generation sequencing technology is rapidly transforming the landscape of evolutionary biology, and has become a cost-effective and efficient means of collecting exome information for non-model organisms. Due to their taxonomic diversity, production of interesting venom and silk proteins, and the relative scarcity of existing genomic resources, spiders in particular are excellent targets for next-generation sequencing (NGS) methods. In this study, the transcriptomes of six entelegyne spider species from three genera (Cicurina travisae, C. vibora, Habronattus signatus, H. ustulatus, Nesticus bishopi, and N. cooperi) were sequenced and de novo assembled. Each assembly was assessed for quality and completeness and functionally annotated using gene ontology information. Approximately 100 transcripts with evidence of homology to venom proteins were discovered. After identifying more than 3,000 putatively orthologous genes across all six taxa, we used comparative analyses to identify 24 instances of positively selected genes. In addition, between ~ 550 and 1,100 unique orphan genes were found in each genus. These unique, uncharacterized genes exhibited elevated rates of amino acid substitution, potentially consistent with lineage-specific adaptive evolution. The data generated for this study represent a valuable resource for future phylogenetic and molecular evolutionary research, and our results provide new insight into the forces driving genome evolution in taxa that span the root of entelegyne spider phylogeny.

  12. Genomic Diversity and Evolution of the Lyssaviruses

    PubMed Central

    Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé

    2008-01-01

    Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239

  13. Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

    PubMed Central

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092

  14. Internalin profiling and multilocus sequence typing suggest four Listeria innocua subgroups with different evolutionary distances from Listeria monocytogenes

    PubMed Central

    2010-01-01

    Background: Ecological, biochemical and genetic resemblance as well as clear differences of virulence between L. monocytogenes and L. innocua make this bacterial clade attractive as a model to examine evolution of pathogenicity. This study attempted to examine the population structure of L. innocua and the microevolution in the L. innocua-L. monocytogenes clade via profiling of 37 internalin genes and multilocus sequence typing based on the sequences of 9 unlinked genes: gyrB, sigB, dapE, hisJ, ribC, purM, gap, tuf and betL. Results: L. innocua was genetically monophyletic compared to L. monocytogenes, and comprised four subgroups. Subgroups A and B correlated with internalin types 1 and 3 (except the strain 0063 belonging to subgroup C) and internalin types 2 and 4, respectively. The majority of L. innocua strains belonged to these two subgroups. Subgroup A harbored a whole set of L. monocytogenes-L. innocua common and L. innocua-specific internalin genes, and displayed higher recombination rates than those of subgroup B, including the relative frequency of occurrence of recombination versus mutation (ρ/θ) and the relative effect of recombination versus point mutation (r/m). Subgroup A also exhibited a significantly smaller exterior/interior branch length ratio than expected under the coalescent model, suggesting a recent expansion of its population size. The phylogram based on the analysis with correction for recombination revealed that the times to the most recent common ancestor (TMRCA) of L. innocua subgroups A and B were similar. Additionally, subgroup D, which correlated with internalin type 5, branched off from the other three subgroups. All L. innocua strains lacked seventeen virulence genes found in L. monocytogenes (except for the subgroup D strain L43 harboring inlJ and two subgroup B strains bearing bsh) and were nonpathogenic to mice. Conclusions: L. innocua represents a young species descending from L. monocytogenes and comprises four subgroups: two major subgroups A and B, and one atypical subgroup D serving as a link between L. monocytogenes and L. innocua in the evolutionary chain. Although subgroups A and B appeared at approximately the same time, subgroup A seems to have experienced a recent expansion of the population size with higher recombination frequency and effect than those of subgroup B, and might represent the possible evolutionary direction towards adaptation to environments. The evolutionary history in the L. monocytogenes-L. innocua clade represents a rare example of evolution towards reduced virulence of pathogens. PMID:20356375

  15. ESTIMATING THE RADIUS OF THE CONVECTIVE CORE OF MAIN-SEQUENCE STARS FROM OBSERVED OSCILLATION FREQUENCIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Wuming

    The determination of the size of the convective core of main-sequence stars is usually dependent on the construction of models of stars. Here we introduce a method to estimate the radius of the convective core of main-sequence stars with masses between about 1.1 and 1.5 M⊙ from observed frequencies of low-degree p-modes. A formula is proposed to achieve the estimation. The values of the radius of the convective core of four known stars are successfully estimated by the formula. The radius of the convective core of KIC 9812850 estimated by the formula is 0.140 ± 0.028 R⊙. In order to confirm this prediction, a grid of evolutionary models was computed. The value of the convective-core radius of the best-fit model of KIC 9812850 is 0.149 R⊙, which is in good agreement with that estimated by the formula from observed frequencies. The formula aids in understanding the interior structure of stars directly from observed frequencies. The understanding is not dependent on the construction of models.

  16. Molecular Phylogenetics: Concepts for a Newcomer.

    PubMed

    Ajawatanawong, Pravech

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
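
    As a small illustration of how an alignment (step 2) feeds a distance-based reconstruction method (step 3), the sketch below computes pairwise p-distances and applies the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3). It is not taken from the review; the function names are hypothetical, and the input is assumed to be already-aligned nucleotide sequences of equal length, with gaps and ambiguous bases skipped.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps/ambiguities skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance; infinite when p reaches the saturation value 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """alignment: dict mapping taxon name to aligned sequence. Returns {(name1, name2): distance}."""
    return {
        (n1, n2): jc69_distance(p_distance(alignment[n1], alignment[n2]))
        for n1, n2 in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTACGTAA", "taxonC": "ACGAACGTTA"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 4))
```

    A matrix like this is the usual input to distance-based tree builders such as neighbor joining; character-based methods (parsimony, likelihood, Bayesian inference) work from the alignment itself instead.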

  17. Cocoa/Cotton Comparative Genomics

    USDA-ARS's Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  18. The Plasmodium gaboni genome illuminates allelic dimorphism of immunologically important surface antigens in P. falciparum.

    PubMed

    Roy, Scott William

    2015-12-01

    In the deadly human malaria parasite Plasmodium falciparum, several major merozoite surface proteins (MSPs) show a striking pattern of allelic diversity called allelic dimorphism (AD). In AD, the vast majority of observed alleles fall into two highly divergent allelic classes, with recombinant alleles being rare or not observed, presumably due to repression by natural selection (recombination suppression, or RS). The three AD loci, merozoite surface proteins (MSPs) 1, 2, and 6, along with MSP3, which also exhibits RS among four allelic classes, can be collectively called AD/RS. The causes of AD/RS and the evolutionary history of allelic diversity at these loci remain mysterious. The few available sequences from a single closely related chimpanzee parasite, P. reichenowi, have suggested that for 3/4 loci, AD/RS is an ancient state that has been retained in P. falciparum since well before the P. falciparum-P. reichenowi ancestor. On the other hand, based on comparative sequence analysis, we recently suggested that (i) AD/RS P. falciparum loci have undergone interallelic recombination over longer evolutionary times (on the timescale of recent speciation events), and thus (ii) AD/RS may be a recent phenomenon. The recent publication of genomic sequencing efforts for P. gaboni, an outgroup to P. falciparum and P. reichenowi, allows for improved reconstruction of the evolutionary history of these loci. In this work, I report genic sequence for P. gaboni for all four AD/RS P. falciparum loci (MSP1, 2, 3, and 6). Comparison of these sequences with available P. falciparum and P. reichenowi data strengthens the evidence for interallelic recombination over the evolutionary history of these species and also strengthens the case that AD/RS at these loci is ancient. Combined with previous results, these data provide evidence that AD/RS at different loci has evolved at several different times in the evolutionary history of P. falciparum: (i) before the P. gaboni-P. falciparum divergence, for much of MSP1 and MSP3; (ii) between the P. gaboni-P. falciparum and P. reichenowi-P. falciparum divergences, for the 5' end of the AD region of MSP6 and block 3 of MSP1; (iii) near the P. reichenowi-P. falciparum divergence, for the 3' end of the AD region of MSP6; and (iv) after the P. reichenowi-P. falciparum divergence, for MSP2. Based on these results, I suggest a new hypothesis for long-term evolutionary maintenance of AD/RS by recombination within allelic groups. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Evolutionary origin and demographic history of an ancient conifer (Juniperus microsperma) in the Qinghai-Tibetan Plateau

    PubMed Central

    Shang, Hui-Ying; Li, Zhong-Hu; Dong, Miao; Adams, Robert P.; Miehe, Georg; Opgenoorth, Lars; Mao, Kang-Shan

    2015-01-01

    All Qinghai-Tibetan Plateau (QTP) endemic species are assumed to have originated recently, although very rare species most likely diverged early. These ancient species provide an excellent model to examine the origin and evolution of QTP endemic plants in response to the QTP uplifts and the climate changes that followed in this high altitude region. In this study, we examined these hypotheses by employing sequence variation from multiple nuclear and chloroplast DNA of 239 individuals of Juniperus microsperma and its five congeners. Both phylogenetic and population genetic analyses revealed that J. microsperma diverged from its sister clade comprising two species with long isolation around the Early Miocene, which corresponds to early QTP uplift. Demographic modeling and coalescent tests suggest that J. microsperma experienced an obvious bottleneck event during the Quaternary when the global climate greatly oscillated. The results presented here support the hypotheses that the QTP uplifts and Quaternary climate changes played important roles in shaping the evolutionary history of this rare juniper. PMID:25977142

  20. Genome-wide single nucleotide polymorphisms (SNPs) for a model invasive ascidian Botryllus schlosseri.

    PubMed

    Gao, Yangchun; Li, Shiguo; Zhan, Aibin

    2018-04-01

    Invasive species cause huge damage to ecology, the environment and the economy globally. A comprehensive understanding of invasion mechanisms, particularly the genetic bases of micro-evolutionary processes responsible for invasion success, is essential for reducing potential damage caused by invasive species. The golden star tunicate, Botryllus schlosseri, has become a model species in invasion biology, mainly owing to its highly invasive nature and small, well-sequenced genome. However, genome-wide genetic markers have not been well developed in this highly invasive species, thus limiting the comprehensive understanding of genetic mechanisms of invasion success. Using restriction site-associated DNA (RAD) tag sequencing, here we developed a high-quality resource of 14,119 out of 158,821 SNPs for B. schlosseri. These SNPs were relatively evenly distributed across each chromosome. SNP annotations showed that the majority of SNPs (63.20%) were located in intergenic regions, and 21.51% and 14.58% were located in introns and exons, respectively. In addition, the potential use of the developed SNPs for population genomics studies was primarily assessed, such as the estimate of observed heterozygosity (H_O), expected heterozygosity (H_E), nucleotide diversity (π), Wright's inbreeding coefficient (F_IS) and effective population size (N_e). Our developed SNP resource would provide future studies with genome-wide genetic markers for genetic and genomic investigations, such as the genetic bases of micro-evolutionary processes responsible for invasion success.
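
    The population statistics named in this abstract can be illustrated for a single biallelic SNP. The sketch below is not the authors' pipeline: it assumes genotype counts are already available, uses the simple textbook estimators (H_E = 2pq with no small-sample correction, F_IS = 1 - H_O/H_E), and the function name `heterozygosity_stats` is hypothetical.

```python
import math


def heterozygosity_stats(n_hom_ref, n_het, n_hom_alt):
    """Return (H_O, H_E, F_IS) for one biallelic locus.

    Inputs are genotype counts (hypothetical example data, not from the study).
    H_E uses the uncorrected 2pq estimator; F_IS = 1 - H_O / H_E.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    # Frequency of the reference allele among the 2n sampled chromosomes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    h_obs = n_het / n
    h_exp = 2.0 * p * q
    # F_IS is undefined at a monomorphic locus (H_E == 0).
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else math.nan
    return h_obs, h_exp, f_is


if __name__ == "__main__":
    print(heterozygosity_stats(40, 20, 40))
```

    A positive F_IS indicates fewer heterozygotes than expected under Hardy-Weinberg proportions; genome-wide summaries would average such per-locus values over all retained SNPs.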

  1. An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure.

    PubMed

    Waldispühl, Jérôme; Ponty, Yann

    2011-11-01

    The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and sample mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend on the free energy of the structures, RNAmutants has difficulty sampling mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence-structure maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionarily accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the appearance of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionarily accessible mutational landscapes.

  2. Modeling coding-sequence evolution within the context of residue solvent accessibility.

    PubMed

    Scherrer, Michael P; Meyer, Austin G; Wilke, Claus O

    2012-09-12

    Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.
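
    The linear dependence of ω on RSA described in this abstract can be illustrated with a small sketch. It is only an illustration of the functional form ω(RSA) = intercept + slope × RSA: the study estimates these parameters inside a codon-model likelihood, whereas the hypothetical helper `fit_line` below simply applies ordinary least squares to made-up per-bin ω values.

```python
def omega_linear(rsa, intercept, slope):
    """Evolutionary-rate ratio as a linear function of relative solvent accessibility."""
    return intercept + slope * rsa


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (illustrative only)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical RSA bin midpoints and per-bin omega estimates (not data from the study).
    rsa_bins = [0.05, 0.25, 0.45, 0.65, 0.85]
    omega_bins = [0.04, 0.07, 0.11, 0.13, 0.17]
    intercept, slope = fit_line(rsa_bins, omega_bins)
    print(intercept, slope, omega_linear(0.5, intercept, slope))
```

    Summarizing a gene by the fitted (intercept, slope) pair rather than by a single mean ω is the characterization the abstract's final sentence argues for.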

  3. Water lilies as emerging models for Darwin’s abominable mystery

    PubMed Central

    Chen, Fei; Liu, Xing; Yu, Cuiwei; Chen, Yuchu; Tang, Haibao; Zhang, Liangsheng

    2017-01-01

    Water lilies are not only highly favored aquatic ornamental plants with cultural and economic importance but they also occupy a critical evolutionary space that is crucial for understanding the origin and early evolutionary trajectory of flowering plants. The birth and rapid radiation of flowering plants has interested many scientists and was considered ‘an abominable mystery’ by Charles Darwin. In searching for the angiosperm evolutionary origin and its underlying mechanisms, the genome of Amborella has shed some light on the molecular features of one of the basal angiosperm lineages; however, little is known regarding the genetics and genomics of another basal angiosperm lineage, namely, the water lily. In this study, we reviewed current molecular research and note that water lily research has entered the genomic era. We propose that the genome of the water lily is critical for studying the contentious relationship of basal angiosperms and Darwin’s ‘abominable mystery’. Four pantropical water lilies, especially the recently sequenced Nymphaea colorata, have characteristics such as small size, rapid growth rate and numerous seeds and can act as the best model for understanding the origin of angiosperms. The water lily genome is also valuable for revealing the genetics of ornamental traits and will largely accelerate the molecular breeding of water lilies. PMID:28979789

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    MacDonald, James; Mullan, D. J.

    KIC 7177553 is a quadruple system containing two binaries of orbital periods 16.5 and 18 days. All components have comparable masses and are slowly rotating with spectral types of ∼G2V. The longer period binary is eclipsing with component masses and radii M₁ = 1.043 ± 0.014 M⊙, R₁ = 0.940 ± 0.005 R⊙ and M₂ = 0.986 ± 0.015 M⊙, R₂ = 0.941 ± 0.005 R⊙. The essentially equal radii measurements are inconsistent with the two stars being on the main sequence at the same age using standard nonmagnetic stellar evolution models. Instead a consistent scenario is found if the stars are in their pre-main-sequence phase of evolution and have an age of 32–36 Myr. We have also computed evolutionary models of magnetic stars, but we find that our nonmagnetic models fit the empirical radii and effective temperatures better than the magnetic models.

  5. Genes with stable DNA methylation levels show higher evolutionary conservation than genes with fluctuant DNA methylation levels.

    PubMed

    Zhang, Ruijie; Lv, Wenhua; Luan, Meiwei; Zheng, Jiajia; Shi, Miao; Zhu, Hongjie; Li, Jin; Lv, Hongchao; Zhang, Mingming; Shang, Zhenwei; Duan, Lian; Jiang, Yongshuai

    2015-11-24

    Different human genes often exhibit different degrees of stability in their DNA methylation levels between tissues, samples or cell types. This may be related to the evolution of the human genome. Thus, we compared the evolutionary conservation between two types of genes: genes with stable DNA methylation levels (SM genes) and genes with fluctuant DNA methylation levels (FM genes). For long-term evolutionary characteristics between species, we compared the percentage of orthologous genes, the evolutionary rate dN/dS and protein sequence identity. We found that the SM genes had greater percentages of orthologous genes, lower dN/dS, and higher protein sequence identities in all 21 species. These results indicated that the SM genes were more evolutionarily conserved than the FM genes. For short-term evolutionary characteristics among human populations, we compared the single nucleotide polymorphism (SNP) density and the degree of linkage disequilibrium (LD) in HapMap populations and 1000 Genomes Project populations. We observed that the SM genes had lower SNP densities and higher degrees of LD in all 11 HapMap populations and 13 1000 Genomes Project populations. These results mean that the SM genes had more stable chromosome genetic structures, and were more conserved than the FM genes.

  6. The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription

    NASA Technical Reports Server (NTRS)

    Kaine, B. P.; Mehr, I. J.; Woese, C. R.

    1994-01-01

    Through random search, a gene from Thermococcus celer has been identified and sequenced that appears to encode a transcription-associated protein (110 amino acid residues). The sequence has clear homology to approximately the last half of an open reading frame reported previously for Sulfolobus acidocaldarius [Langer, D. & Zillig, W. (1993) Nucleic Acids Res. 21, 2251]. The protein translations of these two archaeal genes in turn are homologs of a small subunit found in eukaryotic RNA polymerase I (A12.2) and the counterpart of this from RNA polymerase II (B12.6). Homology is also seen with the eukaryotic transcription factor TFIIS, but it involves only the terminal 45 amino acids of the archaeal proteins. Evolutionary implications of these homologies are discussed.

  7. Algorithm to find distant repeats in a single protein sequence

    PubMed Central

    Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj

    2008-01-01

    Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of distant repeats would make it possible to establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from the Point Accepted Mutation (PAM) matrix have been deployed for the identification of amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663

  8. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Certain mysteries pointing toward their recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation process are considered the centre of attraction in cancer research. With established databases, a range of computational platforms now makes it possible to examine almost all the physiological and biochemical evidence in disease-associated phenotypes. Using existing computational methods, we have utilized the amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study of sequence similarity, protein-protein networking, co-expression analysis, and the evolutionary trajectory of centromere proteins will speed up the understanding of centromere biology and will create a road map for upcoming researchers who are initiating their work on clinical sequencing using centromere proteins.

  9. Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny

    PubMed Central

    2009-01-01

    Background: Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. Results: In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. Conclusion: This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny. PMID:19922605

  10. Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny.

    PubMed

    Singh, Tiratha Raj; Tsagkogeorga, Georgia; Delsuc, Frédéric; Blanquart, Samuel; Shenkar, Noa; Loya, Yossi; Douzery, Emmanuel Jp; Huchon, Dorothée

    2009-11-17

    Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny.

  11. Evaluating the Impact of Genomic Data and Priors on Bayesian Estimates of the Angiosperm Evolutionary Timescale.

    PubMed

    Foster, Charles S P; Sauquet, Hervé; van der Merwe, Marlien; McPherson, Hannah; Rossetto, Maurizio; Ho, Simon Y W

    2017-05-01

    The evolutionary timescale of angiosperms has long been a key question in biology. Molecular estimates of this timescale have shown considerable variation, being influenced by differences in taxon sampling, gene sampling, fossil calibrations, evolutionary models, and choices of priors. Here, we analyze a data set comprising 76 protein-coding genes from the chloroplast genomes of 195 taxa spanning 86 families, including novel genome sequences for 11 taxa, to evaluate the impact of models, priors, and gene sampling on Bayesian estimates of the angiosperm evolutionary timescale. Using a Bayesian relaxed molecular-clock method, with a core set of 35 minimum and two maximum fossil constraints, we estimated that crown angiosperms arose 221 (251-192) Ma during the Triassic. Based on a range of additional sensitivity and subsampling analyses, we found that our date estimates were generally robust to large changes in the parameters of the birth-death tree prior and of the model of rate variation across branches. We found an exception to this when we implemented fossil calibrations in the form of highly informative gamma priors rather than as uniform priors on node ages. Under all other calibration schemes, including trials of seven maximum age constraints, we consistently found that the earliest divergences of angiosperm clades substantially predate the oldest fossils that can be assigned unequivocally to their crown group. Overall, our results and experiments with genome-scale data suggest that reliable estimates of the angiosperm crown age will require increased taxon sampling, significant methodological changes, and new information from the fossil record. [Angiospermae, chloroplast, genome, molecular dating, Triassic.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. Complete nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within Artiodactyla.

    PubMed

    Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C

    1999-08-05

    The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16,613 bp, is presented in this report. The genome does not have a fixed length because of variable numbers of the tandem repeat 5'-CGTGCGTACA in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than that between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
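
    Counting transversions and gaps between two aligned sequences, as this abstract describes for the rRNA genes, can be sketched as follows. This is an illustrative helper, not the authors' procedure: it assumes the two sequences are already aligned to equal length, uses "-" as the gap character, and skips positions with ambiguous bases. The function name `count_differences` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_differences(seq_a, seq_b, gap="-"):
    """Count transitions, transversions and gapped columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transitions": 0, "transversions": 0, "gaps": 0}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue  # identical column (including gap-gap) carries no difference
        if x == gap or y == gap:
            counts["gaps"] += 1
        elif x in BASES and y in BASES:
            # Purine<->purine or pyrimidine<->pyrimidine is a transition;
            # any purine<->pyrimidine change is a transversion.
            same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
            counts["transitions" if same_class else "transversions"] += 1
        # Columns with ambiguity codes (e.g. N) are ignored in this sketch.
    return counts


if __name__ == "__main__":
    print(count_differences("ACGT-A", "GCTTAA"))
```

    RNA input would need U mapped to T (or U added to the pyrimidine set) before counting.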

  13. Exploring Connectivity in Sequence Space of Functional RNA

    NASA Technical Reports Server (NTRS)

    Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark

    2017-01-01

    Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing. Analysis of these sequences revealed several large clusters defined such that every sequence in a cluster can be reached from any other sequence in the same cluster through a series of single point mutations. Sequences in a single cluster appear to adopt more than one secondary structure. The mechanism of refolding within a single cluster was examined. To shed light on possible evolutionary paths in the space of ribozymes, the connectivity between clusters was investigated. The effect of length of RNA molecules on the structure of the fitness landscape and possible evolutionary paths was examined by way of comparing functional sequences of 20 and 80 nucleobases in length. It was found that sequences of different lengths shared secondary structure motifs that were presumed responsible for catalytic activity, with increasing complexity and global structural rearrangements emerging in longer molecules.
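
    The cluster definition in this abstract (every sequence reachable from any other in the same cluster through a series of single point mutations) corresponds to connected components of a graph whose edges join sequences at Hamming distance 1. The sketch below illustrates that idea under stated assumptions: all sequences have the same length, a point mutation means a single-base substitution (no insertions or deletions), and the names `single_mutants` and `mutational_clusters` are hypothetical rather than taken from the study.

```python
from collections import deque


def single_mutants(seq, alphabet="ACGU"):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def mutational_clusters(sequences, alphabet="ACGU"):
    """Group sequences into clusters connected by chains of single substitutions."""
    remaining = set(sequences)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first search restricted to sequences present in the data set.
        while queue:
            current = queue.popleft()
            for neighbor in single_mutants(current, alphabet):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    observed = ["AAAA", "AAAU", "AAUU", "GGGG", "GGGC", "CCCC"]
    for cluster in mutational_clusters(observed):
        print(sorted(cluster))
```

    Enumerating the 3L single mutants of each sequence and looking them up in a set keeps the cost roughly linear in the number of sequences, which matters when the input holds hundreds of thousands of reads; an all-pairs Hamming comparison would scale quadratically.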

  14. Hidden Markov models and other machine learning approaches in computational molecular biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baldi, P.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Computational tools are increasingly needed to process the massive amounts of data, to organize and classify sequences, to detect weak similarities, to separate coding from non-coding regions, and reconstruct the underlying evolutionary history. The fundamental problem in machine learning is the same as in scientific reasoning in general, as well as statistical modeling: to come up with a good model for the data. In this tutorial four classes of models are reviewed. They are: Hidden Markov models; artificial Neural Networks; Belief Networks; and Stochastic Grammars. When dealing with DNA and protein primary sequences, Hidden Markov models are among the most flexible and powerful tools for alignments and database searches. In this tutorial, attention is focused on the theory of Hidden Markov Models, and how to apply them to problems in molecular biology.
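
    As a small illustration of the hidden Markov model machinery this tutorial covers, the sketch below implements the standard forward algorithm, which computes the likelihood of an observed sequence by summing over all hidden state paths. The two-state "coding"/"noncoding" model and every probability in it are made-up values chosen for illustration only; they are not taken from the tutorial.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) using the forward algorithm (no scaling, short inputs only)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    # alpha[s] = probability of the symbols seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model; all numbers below are illustrative assumptions.
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.2, "noncoding": 0.8},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

if __name__ == "__main__":
    print(forward_likelihood("GCGCAT", STATES, START, TRANS, EMIT))
```

    For sequences of realistic length the products underflow, so practical implementations work with log probabilities or rescale alpha at each step.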

  15. Probabilistic models of eukaryotic evolution: time for integration

    PubMed Central

    Lartillot, Nicolas

    2015-01-01

    In spite of substantial work and recent progress, a global and fully resolved picture of the macroevolutionary history of eukaryotes is still under construction. This concerns not only the phylogenetic relations among major groups, but also the general characteristics of the underlying macroevolutionary processes, including the patterns of gene family evolution associated with endosymbioses, as well as their impact on the sequence evolutionary process. All these questions raise formidable methodological challenges, calling for a more powerful statistical paradigm. In this direction, model-based probabilistic approaches have played an increasingly important role. In particular, improved models of sequence evolution accounting for heterogeneities across sites and across lineages have led to significant, although insufficient, improvement in phylogenetic accuracy. More recently, one main trend has been to move away from simple parametric models and stepwise approaches, towards integrative models explicitly considering the intricate interplay between multiple levels of macroevolutionary processes. Such integrative models are in their infancy, and their application to the phylogeny of eukaryotes still requires substantial improvement of the underlying models, as well as additional computational developments. PMID:26323768

  16. Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.

    PubMed

    Liu, Yanan; Wang, Baoju; Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

    2016-01-01

    The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information for both Marmota models has strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to determine the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two Marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model.

  17. Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana

    PubMed Central

    Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

    2016-01-01

    The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information for both Marmota models has strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to determine the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two Marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model. PMID:27806133

  18. A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution

    PubMed Central

    Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L

    2007-01-01

    Background: As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results: We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprising 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Conclusion: Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168

  19. The nearly neutral and selection theories of molecular evolution under the Fisher geometrical framework: substitution rate, population size, and complexity.

    PubMed

    Razeto-Barry, Pablo; Díaz, Javier; Vásquez, Rodrigo A

    2012-06-01

    The general theories of molecular evolution depend on relatively arbitrary assumptions about the relative distribution and rate of advantageous, deleterious, neutral, and nearly neutral mutations. The Fisher geometrical model (FGM) has been used to make distributions of mutations biologically interpretable. We explored an FGM-based molecular model to represent molecular evolutionary processes typically studied by nearly neutral and selection models, but in which distributions and relative rates of mutations with different selection coefficients are a consequence of biologically interpretable parameters, such as the average size of the phenotypic effect of mutations and the number of traits (complexity) of organisms. A variant of the FGM-based model that we called the static regime (SR) represents evolution as a nearly neutral process in which substitution rates are determined by a dynamic substitution process in which the population's phenotype remains around a suboptimum equilibrium fitness produced by a balance between slightly deleterious and slightly advantageous compensatory substitutions. As in previous nearly neutral models, the SR predicts a negative relationship between molecular evolutionary rate and population size; however, SR does not have the unrealistic properties of previous nearly neutral models such as the narrow window of selection strengths in which they work. In addition, the SR suggests that compensatory mutations cannot explain the high rate of fixations driven by positive selection currently found in DNA sequences, contrary to what has been previously suggested. We also developed a generalization of SR in which the optimum phenotype can change stochastically due to environmental or physiological shifts, which we called the variable regime (VR). VR models evolution as an interplay between adaptive processes and nearly neutral steady-state processes. When strong environmental fluctuations are incorporated, the process becomes a selection model in which evolutionary rate does not depend on population size, but is critically dependent on the complexity of organisms and mutation size. For SR as well as VR we found that key parameters of molecular evolution are linked by biological factors, and we showed that they cannot be fixed independently by arbitrary criteria, as has usually been assumed in previous molecular evolutionary models.
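
    The dependence on mutation size and on the number of traits mentioned in this abstract can be illustrated with the classical geometric argument behind the FGM. The sketch below is not the authors' SR or VR model; it only assumes that fitness decreases with distance from a single optimum, so that a mutation is beneficial when it moves the phenotype closer to that optimum. The function name, parameter names, and default values are hypothetical.

```python
import math
import random


def fraction_beneficial(n_traits, mutation_size, distance=1.0,
                        trials=20000, seed=0):
    """Estimate the share of random mutations that move a phenotype
    closer to the optimum in an n_traits-dimensional trait space.

    Assumption: fitness is any decreasing function of the Euclidean
    distance to the optimum, which sits at the origin.
    """
    rng = random.Random(seed)
    beneficial = 0
    for _ in range(trials):
        # Draw an isotropic random direction by normalising a Gaussian vector.
        step = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        norm = math.sqrt(sum(x * x for x in step))
        # Mutant phenotype: current phenotype (on the first axis, at the
        # given distance from the optimum) plus a step of fixed length.
        mutant = [mutation_size * x / norm for x in step]
        mutant[0] += distance
        if math.sqrt(sum(x * x for x in mutant)) < distance:
            beneficial += 1
    return beneficial / trials


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, fraction_beneficial(n, mutation_size=0.1))
```

    Under these assumptions the estimated share of beneficial mutations falls as either the number of traits or the mutation size grows, which is the qualitative pattern Fisher's geometric argument predicts.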

  20. The Nearly Neutral and Selection Theories of Molecular Evolution Under the Fisher Geometrical Framework: Substitution Rate, Population Size, and Complexity

    PubMed Central

    Razeto-Barry, Pablo; Díaz, Javier; Vásquez, Rodrigo A.

    2012-01-01

    The general theories of molecular evolution depend on relatively arbitrary assumptions about the relative distribution and rate of advantageous, deleterious, neutral, and nearly neutral mutations. The Fisher geometrical model (FGM) has been used to make distributions of mutations biologically interpretable. We explored an FGM-based molecular model to represent molecular evolutionary processes typically studied by nearly neutral and selection models, but in which distributions and relative rates of mutations with different selection coefficients are a consequence of biologically interpretable parameters, such as the average size of the phenotypic effect of mutations and the number of traits (complexity) of organisms. A variant of the FGM-based model that we called the static regime (SR) represents evolution as a nearly neutral process in which substitution rates are determined by a dynamic substitution process in which the population’s phenotype remains around a suboptimum equilibrium fitness produced by a balance between slightly deleterious and slightly advantageous compensatory substitutions. As in previous nearly neutral models, the SR predicts a negative relationship between molecular evolutionary rate and population size; however, SR does not have the unrealistic properties of previous nearly neutral models such as the narrow window of selection strengths in which they work. In addition, the SR suggests that compensatory mutations cannot explain the high rate of fixations driven by positive selection currently found in DNA sequences, contrary to what has been previously suggested. We also developed a generalization of SR in which the optimum phenotype can change stochastically due to environmental or physiological shifts, which we called the variable regime (VR). VR models evolution as an interplay between adaptive processes and nearly neutral steady-state processes. When strong environmental fluctuations are incorporated, the process becomes a selection model in which evolutionary rate does not depend on population size, but is critically dependent on the complexity of organisms and mutation size. For SR as well as VR we found that key parameters of molecular evolution are linked by biological factors, and we showed that they cannot be fixed independently by arbitrary criteria, as has usually been assumed in previous molecular evolutionary models. PMID:22426879
