Science.gov

Sample records for large ancestral genomes

  1. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746
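    The abstract above rests on a birth-and-death process acting along the branches of a phylogeny. The sketch below is a minimal forward simulation of such a process. It is not the authors' likelihood algorithm: it only generates phylogenetic profiles. It assumes three hypothetical rates: per-copy duplication `dup`, per-copy loss `loss`, and family-level acquisition `gain`. The tree encoding `(name, branch_length, children)` is also an assumption made for illustration.

```python
import random

def simulate_branch(n0, t, dup, loss, gain, rng):
    """Gillespie simulation of a linear birth-and-death process with immigration.

    n0 -- family size at the start of the branch; t -- branch length.
    dup, loss -- hypothetical per-copy rates; gain -- hypothetical per-family
    acquisition rate (e.g. horizontal transfer), independent of family size.
    """
    n, clock = n0, 0.0
    while True:
        total = n * (dup + loss) + gain
        if total == 0.0:
            return n  # absorbing state: no event can occur any more
        clock += rng.expovariate(total)  # waiting time to the next event
        if clock > t:
            return n
        if rng.random() * total < n * loss:
            n -= 1  # one copy lost
        else:
            n += 1  # duplication or acquisition adds one copy

def simulate_profile(tree, n_root, rates, rng):
    """Return {leaf name: family size}; rates = (dup, loss, gain)."""
    profile = {}
    def walk(node, n_parent):
        name, length, children = node
        n = simulate_branch(n_parent, length, *rates, rng)
        if not children:
            profile[name] = n
        for child in children:
            walk(child, n)
    walk(tree, n_root)
    return profile
```

    Each call to `simulate_profile` with a fixed tree and rate triple draws one profile. Repeating the call gives an empirical profile distribution. A likelihood method like the one described above would compute that distribution exactly instead.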

  2. Understanding Brassicaceae evolution through ancestral genome reconstruction.

    PubMed

    Murat, Florent; Louis, Alexandra; Maumus, Florian; Armero, Alix; Cooke, Richard; Quesneville, Hadi; Roest Crollius, Hugues; Salse, Jerome

    2015-12-10

    Brassicaceae is a family of green plants of high scientific and economic interest, including thale cress (Arabidopsis thaliana), cruciferous vegetables (cabbages) and rapeseed. We reconstruct an evolutionary framework of Brassicaceae composed of high-resolution ancestral karyotypes using the genomes of modern A. thaliana, Arabidopsis lyrata, Capsella rubella, Brassica rapa and Thellungiella parvula. The ancestral Brassicaceae karyotype (Brassicaceae lineages I and II) is composed of eight protochromosomes and 20,037 ordered and oriented protogenes. After speciation, it evolved into the ancestral Camelineae karyotype (eight protochromosomes and 22,085 ordered protogenes) and the proto-Calepineae karyotype (seven protochromosomes and 21,035 ordered protogenes) genomes. The three inferred ancestral karyotype genomes are shown here to be powerful tools to unravel the reticulated evolutionary history of extant Brassicaceae genomes regarding the fate of ancestral genes and genomic compartments, particularly centromeres and evolutionary breakpoints. This new resource should accelerate research in comparative genomics and translational research by facilitating the transfer of genomic information from model systems to species of agronomic interest.

  3. Regulatory genes in the ancestral chordate genomes.

    PubMed

    Satou, Yutaka; Wada, Shuichi; Sasakura, Yasunori; Satoh, Nori

    2008-12-01

    Changes or innovations in gene regulatory networks for the developmental program in the ancestral chordate genome appear to be a major component in the evolutionary process in which tadpole-type larvae, a unique characteristic of chordates, arose. These alterations may include new genetic interactions as well as the acquisition of new regulatory genes. Previous analyses of the Ciona genome revealed that many genes may have emerged after the divergence of the tunicate and vertebrate lineages. In this paper, we tested this possibility by analyzing a second non-vertebrate chordate genome. We conclude from this analysis that the ancient chordate had almost the same repertoire of regulatory genes as extant vertebrates, but with less redundancy, and that approximately 10% of vertebrate regulatory genes were innovated after the emergence of vertebrates. Thus, refined regulatory networks arose during vertebrate evolution mainly through the multiplication of preexisting regulatory genes rather than through the generation of new ones. The inferred regulatory gene sets of the ancestral chordate should provide an important foundation for understanding how tadpole-type larvae, a unique characteristic of chordates, evolved.

  4. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates.

    PubMed

    Nakatani, Yoichiro; Takeda, Hiroyuki; Kohara, Yuji; Morishita, Shinichi

    2007-09-01

    Although several vertebrate genomes have been sequenced, little is known about the genome evolution of early vertebrates and how large-scale genomic changes such as the two rounds of whole-genome duplications (2R WGD) affected evolutionary complexity and novelty in vertebrates. Reconstructing the ancestral vertebrate genome is highly nontrivial because of the difficulty in identifying traces originating from the 2R WGD. To resolve this problem, we developed a novel method capable of pinning down remains of the 2R WGD in the human and medaka fish genomes using invertebrate tunicate and sea urchin genes to define ohnologs, i.e., paralogs produced by the 2R WGD. We validated the reconstruction using the chicken genome, which was not considered in the reconstruction step, and observed that many ancestral proto-chromosomes were retained in the chicken genome and had one-to-one correspondence to chicken microchromosomes, thereby confirming the reconstructed ancestral genomes. Our reconstruction revealed a contrast between the slow karyotype evolution after the second WGD and the rapid, lineage-specific genome reorganizations that occurred in the ancestral lineages of major taxonomic groups such as teleost fishes, amphibians, reptiles, and marsupials.

  5. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods

    NASA Astrophysics Data System (ADS)

    Tannier, Eric

    In 2006, a debate arose over how effectively bioinformatics methods can reconstruct mammalian ancestral genomes. Three years later, Gordon et al. (PLoS Genetics, 5(5), 2009) chose not to use automatic methods to reconstruct the genome of a 100-million-year-old Saccharomyces cerevisiae ancestor. Their manually constructed ancestor provides a reference genome for testing whether automatic methods are indeed unable to produce confident reconstructions. Adapting several methodological frameworks to the same yeast gene order data, I discuss the possibilities, differences and similarities of the available algorithms for ancestral genome reconstruction. The methods can be classified into two types: local and global. Studying the properties of both helps to clarify what we can expect from their use. Both types of method propose contiguous ancestral regions that come very close (> 95% identity) to the manually predicted ancestral yeast chromosomes, with good coverage of the extant genomes.
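    As a rough illustration of the "local" idea, the sketch below counts gene adjacencies shared by extant genomes. It then greedily chains the best-supported adjacencies into contiguous ancestral regions. The support threshold `min_support`, the greedy acceptance rule and all function names are assumptions. This is not one of the specific algorithms compared in the paper. Genes are assumed to be unique strings, each genome is a list of chromosomes, and orientation is ignored.

```python
from collections import Counter

def adjacencies(genome):
    """Unordered pairs of neighbouring genes on each chromosome (orientation ignored)."""
    found = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            found.add(frozenset((a, b)))
    return found

def supported_adjacencies(genomes, min_support):
    """Adjacencies seen in at least `min_support` extant genomes, with their counts."""
    counts = Counter(adj for g in genomes for adj in adjacencies(g))
    return {adj: c for adj, c in counts.items() if c >= min_support}

def contiguous_regions(adj_counts):
    """Greedily accept adjacencies by decreasing support, keeping every gene at
    <= 2 neighbours and never closing a cycle; return the resulting gene chains."""
    neighbours, parent = {}, {}

    def find(x):  # union-find root, used to detect cycles
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, _ in sorted(adj_counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(pair)
        if len(neighbours.get(a, [])) >= 2 or len(neighbours.get(b, [])) >= 2:
            continue  # a gene can have at most two ancestral neighbours
        if find(a) == find(b):
            continue  # would close a cycle
        parent[find(a)] = find(b)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    chains, seen = [], set()
    for g in sorted(neighbours):
        if g in seen or len(neighbours[g]) != 1:
            continue  # start each chain from one of its two ends
        chain, prev, cur = [g], None, g
        seen.add(g)
        while True:
            nxt = [n for n in neighbours[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            chain.append(cur)
            seen.add(cur)
        chains.append(chain)
    return chains
```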

  6. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa.

    PubMed

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A; Wang, Xiaowu

    2013-05-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers.

  7. Yeast ancestral genome reconstructions: the possibilities of computational methods II.

    PubMed

    Chauve, Cedric; Gavranovic, Haris; Ouangraoua, Aida; Tannier, Eric

    2010-09-01

    Since the availability of assembled eukaryotic genomes, the first one being a budding yeast, many computational methods for the reconstruction of ancestral karyotypes and gene orders have been developed. The difficulty has always been to assess their reliability, since we usually lack good knowledge both of the true ancestral genomes against which to compare their results and of the evolutionary mechanisms needed to test them on realistic simulated data. In this study, we propose some measures of reliability for several kinds of methods, and apply them to infer and analyse the architectures of two ancestral yeast genomes, based on the sequences of seven assembled extant ones. The pre-duplication common ancestor of S. cerevisiae and C. glabrata has been inferred manually by Gordon et al. (PLoS Genet. 2009). We show why, in this case, a good convergence of the methods is explained by some properties of the data, and why the results are reliable. In another study, Jean et al. (J. Comput Biol. 2009) proposed an ancestral architecture of the last common ancestor of S. kluyveri, K. thermotolerans, K. lactis, A. gossypii, and Z. rouxii inferred by a computational method. In this case, we show that the dataset does not seem to contain enough information to infer a reliable architecture, and we construct a higher-resolution dataset that yields a reliable new ancestral configuration.

  8. Fast ancestral gene order reconstruction of genomes with unequal gene content.

    PubMed

    Feijão, Pedro; Araujo, Eloi

    2016-11-11

    During evolution, genomes are modified by large-scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that a particular structure is present in each ancestral genome, and then finding a set of non-conflicting adjacencies that optimizes some given function, usually maximizing total weight while minimizing character changes in the tree. We call methods of this type homology-based. In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, which arise in DCJ rearrangement scenarios. This method achieved a higher rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content.
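    The intermediate-genome method itself is not reproduced here. As a related building block, the sketch below computes the DCJ distance between two genomes with equal gene content. It uses the standard adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. The genome encoding (signed integers plus a circular flag per chromosome) and the function names are assumptions.

```python
def adjacency_vertices(chromosomes):
    """Adjacencies (2 extremities) and telomeres (1 extremity) of a genome.

    chromosomes: list of (genes, is_circular); genes are signed ints, e.g. [1, -2, 3].
    Extremity (g, 't') is the tail of gene g and (g, 'h') its head.
    """
    vertices = []
    for genes, circular in chromosomes:
        # (left, right) extremity of each gene in reading order
        ends = [((abs(g), 't'), (abs(g), 'h')) if g > 0 else ((abs(g), 'h'), (abs(g), 't'))
                for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            vertices.append(frozenset((right, left)))
        if circular:
            vertices.append(frozenset((ends[-1][1], ends[0][0])))
        else:
            vertices.append(frozenset((ends[0][0],)))
            vertices.append(frozenset((ends[-1][1],)))
    return vertices


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2) for two genomes with the same gene content."""
    verts = {'a': adjacency_vertices(genome_a), 'b': adjacency_vertices(genome_b)}
    where = {side: {e: i for i, v in enumerate(vs) for e in v} for side, vs in verts.items()}
    seen = {'a': set(), 'b': set()}
    other = {'a': 'b', 'b': 'a'}
    n_genes = sum(len(genes) for genes, _ in genome_a)
    cycles = odd_paths = 0
    for start in range(len(verts['a'])):
        if start in seen['a']:
            continue
        edges, has_telomere = 0, False
        stack = [('a', start)]
        while stack:  # explore one connected component of the adjacency graph
            side, i = stack.pop()
            if i in seen[side]:
                continue
            seen[side].add(i)
            if len(verts[side][i]) == 1:
                has_telomere = True
            for e in verts[side][i]:
                if side == 'a':
                    edges += 1  # each shared extremity is one edge; count it once
                j = where[other[side]][e]
                if j not in seen[other[side]]:
                    stack.append((other[side], j))
        if has_telomere:
            odd_paths += edges % 2  # a component with a telomere is a path
        else:
            cycles += 1
    return n_genes - (cycles + odd_paths / 2)
```

    For example, the genomes `[([1, 2, 3], False)]` and `[([1, -2, 3], False)]` differ by one inversion, and their distance is 1.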

  9. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, representing the genome that existed prior to the proposed recent polyploidy event, was also used in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions were more conserved in the ancestral genome than in the present Arabidopsis region. The statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. Conclusion We

  10. Ancestral components of admixed genomes in a Mexican cohort.

    PubMed

    Johnson, Nicholas A; Coram, Marc A; Shriver, Mark D; Romieu, Isabelle; Barsh, Gregory S; London, Stephanie J; Tang, Hua

    2011-12-01

    For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study "virtual genomes" of admixed individuals. We apply this approach to a cohort of 492 parent-offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations (Africa, Europe, and America) vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10-15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease

  11. Ancestral Components of Admixed Genomes in a Mexican Cohort

    PubMed Central

    Johnson, Nicholas A.; Coram, Marc A.; Shriver, Mark D.; Romieu, Isabelle; Barsh, Gregory S.; London, Stephanie J.; Tang, Hua

    2011-01-01

    For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study “virtual genomes” of admixed individuals. We apply this approach to a cohort of 492 parent–offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations—Africa, Europe, and America—vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10–15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease

  12. DeCoSTAR: Reconstructing the ancestral organization of genes or genomes using reconciled phylogenies.

    PubMed

    Duchemin, Wandrille; Anselmetti, Yoann; Patterson, Murray; Ponty, Yann; Berard, Severine; Chauve, Cedric; Scornavacca, Celine; Daubin, Vincent; Tannier, Eric

    2017-04-08

    DeCoSTAR is a software package that aims to reconstruct the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionarily induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann-Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria.
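    DeCoSTAR samples adjacency scenarios from a Boltzmann-Gibbs distribution over its whole solution space. The toy sketch below shows only the underlying weighting, applied to an explicit, hypothetical list of candidate solutions and costs. Each solution gets probability proportional to exp(-cost/kT). An adjacency's support is then the total probability of the solutions that contain it. Enumerating solutions is a simplification that does not scale. The parameter `kT` and all function names are assumptions, not DeCoSTAR options.

```python
import math
import random

def boltzmann_weights(costs, kT):
    """Normalised Boltzmann-Gibbs weights exp(-cost/kT) for a list of costs."""
    best = min(costs)
    # shifting by the minimum cost avoids overflow and does not change the ratios
    raw = [math.exp(-(c - best) / kT) for c in costs]
    z = sum(raw)
    return [w / z for w in raw]

def adjacency_support(solutions, costs, kT):
    """Probability that each adjacency occurs, under the Boltzmann distribution."""
    support = {}
    for sol, w in zip(solutions, boltzmann_weights(costs, kT)):
        for adj in sol:
            support[adj] = support.get(adj, 0.0) + w
    return support

def sample_solution(solutions, costs, kT, rng):
    """Draw one solution with probability proportional to its Boltzmann weight."""
    return rng.choices(solutions, weights=boltzmann_weights(costs, kT), k=1)[0]
```

    Lowering `kT` concentrates the weight on the lowest-cost (most parsimonious) solutions. Raising it spreads support across near-optimal ones.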

  13. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies

    PubMed Central

    Anselmetti, Yoann; Patterson, Murray; Ponty, Yann; Bérard, Séverine; Chauve, Cedric; Scornavacca, Celine; Daubin, Vincent; Tannier, Eric

    2017-01-01

    DeCoSTAR is a software package that aims to reconstruct the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionarily induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017). PMID:28402423

  14. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n + 1 chromosomes conditional on an ARG of n chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  15. The "fossilized" mitochondrial genome of Liriodendron tulipifera: ancestral gene content and order, ancestral editing sites, and extraordinarily low mutation rate.

    PubMed

    Richardson, Aaron O; Rice, Danny W; Young, Gregory J; Alverson, Andrew J; Palmer, Jeffrey D

    2013-04-15

    The mitochondrial genomes of flowering plants vary greatly in size, gene content, gene order, mutation rate and level of RNA editing. However, the narrow phylogenetic breadth of available genomic data has limited our ability to reconstruct these traits in the ancestral flowering plant and, therefore, to infer subsequent patterns of evolution across angiosperms. We sequenced the mitochondrial genome of Liriodendron tulipifera, the first from outside the monocots or eudicots. This 553,721 bp mitochondrial genome has evolved remarkably slowly in virtually all respects, with an extraordinarily low genome-wide silent substitution rate, retention of genes frequently lost in other angiosperm lineages, and conservation of ancestral gene clusters. The mitochondrial protein genes in Liriodendron are the most heavily edited of any angiosperm characterized to date. Most of these sites are also edited in various other lineages, which allowed us to polarize losses of editing sites in other parts of the angiosperm phylogeny. Finally, we added comprehensive gene sequence data for two other magnoliids, Magnolia stellata and the more distantly related Calycanthus floridus, to measure rates of sequence evolution in Liriodendron with greater accuracy. The Magnolia genome has evolved at an even lower rate, revealing a roughly 5,000-fold range of synonymous-site divergence among angiosperms whose mitochondrial gene space has been comprehensively sequenced. Using Liriodendron as a guide, we estimate that the ancestral flowering plant mitochondrial genome contained 41 protein genes, 14 tRNA genes of mitochondrial origin, as many as 7 tRNA genes of chloroplast origin, >700 sites of RNA editing, and some 14 colinear gene clusters. Many of these gene clusters, genes and RNA editing sites have been variously lost in different lineages over the course of the ensuing ~200 million years of angiosperm evolution.

  16. Differential loss of ancestral gene families as a source of genomic divergence in animals.

    PubMed Central

    Hughes, Austin L; Friedman, Robert

    2004-01-01

    A phylogenetic approach was used to reconstruct the pattern of an apparent loss of 2106 ancestral gene families in four animal genomes (Caenorhabditis elegans, Drosophila melanogaster, human and fugu). Substantially higher rates of loss of ancestral gene families were found in the invertebrates than in the vertebrates. These results indicate that the differential loss of ancestral gene families can be a significant factor in the evolutionary diversification of organisms. PMID:15101434
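    One simple way to count losses of an ancestral family on a rooted tree is Dollo-style parsimony. Assume the family was present at the root and was never regained; then each maximal clade lacking it implies exactly one loss. The sketch below implements that rule. The tree topology and leaf names are hypothetical, and the paper's actual phylogenetic procedure may differ.

```python
def count_losses(tree, present):
    """Minimum losses of an ancestral family under Dollo parsimony (no regain).

    tree    -- a leaf name (str) or a tuple of subtrees
    present -- set of leaf names that still carry the family
    Assumes the family was present at the root; if it is absent from every
    leaf, this is counted as a single loss.
    """
    def walk(node):
        # returns (family absent from every leaf below, losses strictly inside)
        if isinstance(node, str):
            return node not in present, 0
        results = [walk(child) for child in node]
        if all(absent for absent, _ in results):
            return True, 0  # one loss on this clade's stem, counted by the parent
        inner = sum(losses for _, losses in results)
        stems = sum(1 for absent, _ in results if absent)
        return False, inner + stems

    absent, losses = walk(tree)
    return 1 if absent else losses
```

    For example, with the hypothetical tree `(("C_elegans", "D_melanogaster"), ("human", "fugu"))`, a family still present only in human and fugu implies one loss on the invertebrate branch.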

  17. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare opportunity is offered by an ancient sequence of the plague agent from the late Middle Ages. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. However, indels, duplications, insertion sequences and rearrangements have affected all genomes much more than the substitution process has, which makes the ancestral reconstruction task challenging. Results We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing a nearly complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. This comparison can, however, also call into question the quality of the sequencing and assembly of the ancient sequence. PMID:26450112

  18. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have until now been conducted by aligning genetic maps within the same genus, or closely related genera, using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Malus shared 784 markers with Prunus and 148 markers with Fragaria. The correspondence between marker positions was high, and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified between Malus and Prunus was much higher than in any reported genome comparison in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  19. Deciphering the Diploid Ancestral Genome of the Mesohexaploid Brassica rapa

    PubMed Central

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A.; Wang, Xiaowu

    2013-01-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers. PMID:23653472

  20. Mapping ancestral genomes with massive gene loss: A matrix sandwich problem

    PubMed Central

    Gavranović, Haris; Chauve, Cedric; Salse, Jérôme; Tannier, Eric

    2011-01-01

    Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible to these methods, such as the proto–monocotyledon whose evolution involved massive gene loss following a whole genome duplication. Results: We propose a mapping approach based on the combinatorial notion of ‘sandwich consecutive ones matrix’, which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. Availability: Upon request to the authors. Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr PMID:21685079

  1. Ancient hybridizations among the ancestral genomes of bread wheat.

    PubMed

    Marcussen, Thomas; Sandve, Simen R; Heier, Lise; Spannagl, Manuel; Pfeifer, Matthias; Jakobsen, Kjetill S; Wulff, Brande B H; Steuernagel, Burkhard; Mayer, Klaus F X; Olsen, Odd-Arne

    2014-07-18

    The allohexaploid bread wheat genome consists of three closely related subgenomes (A, B, and D), but a clear understanding of their phylogenetic history has been lacking. We used genome assemblies of bread wheat and five diploid relatives to analyze genome-wide samples of gene trees, as well as to estimate evolutionary relatedness and divergence times. We show that the A and B genomes diverged from a common ancestor ~7 million years ago and that these genomes gave rise to the D genome through homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the present-day bread wheat genome is a product of multiple rounds of hybrid speciation (homoploid and polyploid) and lay the foundation for a new framework for understanding the wheat genome as a multilevel phylogenetic mosaic.

  2. Genomic evolution in domestic cattle: ancestral haplotypes and healthy beef.

    PubMed

    Williamson, Joseph F; Steele, Edward J; Lester, Susan; Kalai, Oscar; Millman, John A; Wolrige, Lindsay; Bayard, Dominic; McLure, Craig; Dawkins, Roger L

    2011-05-01

    We have identified numerous Ancestral Haplotypes encoding a 14-Mb region of Bota C19. Three are frequent in Simmental, Angus and Wagyu and have been conserved since common progenitor populations. Others are more relevant to the differences between these 3 breeds including fat content and distribution in muscle. SREBF1 and Growth Hormone, which have been implicated in the production of healthy beef, are included within these haplotypes. However, we conclude that alleles at these 2 loci are less important than other sequences within the haplotypes. Identification of breeds and hybrids is improved by using haplotypes rather than individual alleles.

  3. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf.

    PubMed

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-09-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200,000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We determined the ancestral origin of the genome by annotating WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance.
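
    As a rough consistency check on the tobacco map figures above (assuming that "genome coverage" here means the summed length of all contigs, estimated as contig count times mean contig size), the map's total span works out to approximately the stated 4500 Mbp haploid genome size:

    $$9750 \times 462\ \text{kbp} = 4\,504\,500\ \text{kbp} \approx 4.5 \times 10^{9}\ \text{bp} \approx 4500\ \text{Mbp}$$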

  4. BAC libraries construction from the ancestral diploid genomes of the allotetraploid cultivated peanut

    PubMed Central

    Guimarães, Patricia M; Garsmeur, Olivier; Proite, Karina; Leal-Bertioli, Soraya CM; Seijo, Guilhermo; Chaine, Christian; Bertioli, David J; D'Hont, Angelique

    2008-01-01

    Background Cultivated peanut, Arachis hypogaea, is an allotetraploid of recent origin, with an AABB genome. In common with many other polyploids, it seems that a severe genetic bottleneck was imposed at the species origin, via hybridisation of two wild species and spontaneous chromosome duplication. Therefore, the study of the genome of peanut is hampered both by the crop's low genetic diversity and its polyploidy. In contrast to cultivated peanut, most wild Arachis species are diploid with high genetic diversity. The study of diploid Arachis genomes is therefore attractive, both to simplify the construction of genetic and physical maps, and for the isolation and characterization of wild alleles. The most probable wild ancestors of cultivated peanut are A. duranensis and A. ipaënsis with genome types AA and BB respectively. Results We constructed and characterized two large-insert libraries in Bacterial Artificial Chromosome (BAC) vector, one for each of the diploid ancestral species. The libraries (AA and BB) are respectively c. 7.4 and c. 5.3 genome equivalents with low organelle contamination and average insert sizes of 110 and 100 kb. Both libraries were used for the isolation of clones containing genetically mapped legume anchor markers (single copy genes), and resistance gene analogues. Conclusion These diploid BAC libraries are important tools for the isolation of wild alleles conferring resistances to biotic stresses, comparisons of orthologous regions of the AA and BB genomes with each other and with other legume species, and will facilitate the construction of a physical map. PMID:18230166

  5. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    PubMed Central

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  6. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

    PubMed

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2014-12-12

    To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. Copyright © 2014, American Association for the Advancement of Science.

  7. Ancestral alleles in the human genome based on population sequencing data.

    PubMed

    Park, Leeyoung

    2015-01-01

    Ancestral allele information is useful for genetic studies. Previously, ancestral alleles were identified primarily from sequence alignments between species. This study proposes alternative ways to identify ancestral alleles based on population sequencing data. The methods described here exploit the diversity between haplotypes harboring ancestral and newly emerged alleles. Simulations showed that these methods reliably identified ancestral alleles when the variants were not too old. Application to human genome sequencing data suggested a role for indels in maintaining the GC content of the human genome. Deletion-to-insertion ratios and GC proportions were correlated, depending on the sizes of insertions and deletions, in the direction of increasing GC content. Using the proposed methods, GC-biased fixation was found in single base-pair insertions and AT-biased fixation in single base-pair deletions. In the current study, GC-biased gene conversion in nucleotide substitutions was very slight or insignificant. In the variants of several quantitative trait loci (QTLs), slight GC-biased gene conversion was observed in nucleotide substitutions. For the QTL indels, insertions were observed more often than deletions, and deletion-biased fixation was observed, providing new insights into the evolution of functional genes.

  8. Ancestral Genomes, Sex, and the Population Structure of Trypanosoma cruzi

    PubMed Central

    Bastos-Rodrigues, Luciana; Gonçalves, Vanessa F; Teixeira, Santuza M. R; Chiari, Egler; Junqueira, Ângela C. V; Fernandes, Octavio; Macedo, Andréa M; Machado, Carlos Renato; Pena, Sérgio D. J

    2006-01-01

    Acquisition of detailed knowledge of the structure and evolution of Trypanosoma cruzi populations is essential for control of Chagas disease. We profiled 75 strains of the parasite with five nuclear microsatellite loci, 24Sα RNA genes, and sequence polymorphisms in the mitochondrial cytochrome oxidase subunit II gene. We also used sequences available in GenBank for the mitochondrial genes cytochrome B and NADH dehydrogenase subunit 1. A multidimensional scaling (MDS) plot based on microsatellite data divided the parasites into four clusters corresponding to T. cruzi I (MDS-cluster A), T. cruzi II (MDS-cluster C), a third group of T. cruzi strains (MDS-cluster B), and hybrid strains (MDS-cluster BH). The first two clusters matched mitochondrial clades A and C, respectively, while the other two belonged to mitochondrial clade B. The 24Sα rDNA and microsatellite profiling data were combined into multilocus genotypes that were analyzed by the haplotype reconstruction program PHASE. We identified 141 haplotypes that were clearly distributed into three haplogroups (X, Y, and Z). All strains belonging to T. cruzi I (MDS-cluster A) were Z/Z, the T. cruzi II strains (MDS-cluster C) were Y/Y, and those belonging to MDS-cluster B (unclassified T. cruzi) had X/X haplogroup genotypes. The strains grouped in MDS-cluster BH were X/Y, confirming their hybrid character. Based on these results we propose the following minimal scenario for T. cruzi evolution. In the distant past there were at least three ancestral lineages, which we may call T. cruzi I, T. cruzi II, and T. cruzi III. At least two hybridization events involving T. cruzi II and T. cruzi III produced evolutionarily viable progeny. In both events, the mitochondrial recipient (as identified by the mitochondrial clade of the hybrid strains) was T. cruzi II and the mitochondrial donor was T. cruzi III. PMID:16609729

  9. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    PubMed

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.

  10. High-density marker profiling confirms ancestral genomes of Avena species and identifies D-genome chromosomes of hexaploid oat.

    PubMed

    Yan, Honghai; Bekele, Wubishet A; Wight, Charlene P; Peng, Yuanying; Langdon, Tim; Latta, Robert G; Fu, Yong-Bi; Diederichsen, Axel; Howarth, Catherine J; Jellen, Eric N; Boyle, Brian; Wei, Yuming; Tinker, Nicholas A

    2016-11-01

    Genome analysis of 27 oat species identifies ancestral groups, delineates the D genome, and identifies ancestral origin of 21 mapped chromosomes in hexaploid oat. We investigated genomic relationships among 27 species of the genus Avena using high-density genetic markers revealed by genotyping-by-sequencing (GBS). Two methods of GBS analysis were used: one based on tag-level haplotypes that were previously mapped in cultivated hexaploid oat (A. sativa), and one intended to sample and enumerate tag-level haplotypes originating from all species under investigation. Qualitatively, both methods gave similar predictions regarding the clustering of species and shared ancestral genomes. Furthermore, results were consistent with previous phylogenies of the genus obtained with conventional approaches, supporting the robustness of whole genome GBS analysis. Evidence is presented to justify the final and definitive classification of the tetraploids A. insularis, A. maroccana (=A. magna), and A. murphyi as containing D-plus-C genomes, and not A-plus-C genomes, as is most often specified in past literature. Through electronic painting of the 21 chromosome representations in the hexaploid oat consensus map, we show how the relative frequency of matches between mapped hexaploid-derived haplotypes and AC (DC)-genome tetraploids vs. A- and C-genome diploids can accurately reveal the genome origin of all hexaploid chromosomes, including the approximate positions of inter-genome translocations. Evidence is provided that supports the continued classification of a diverged B genome in AB tetraploids, and it is confirmed that no extant A-genome diploids, including A. canariensis, are similar enough to the D genome of tetraploid and hexaploid oat to warrant consideration as a D-genome diploid.

  11. Accuracy of Genomic Prediction in Synthetic Populations Depending on the Number of Parents, Relatedness, and Ancestral Linkage Disequilibrium.

    PubMed

    Schopp, Pascal; Müller, Dominik; Technow, Frank; Melchinger, Albrecht E

    2017-01-01

    Synthetics play an important role in quantitative genetic research and plant breeding, but few studies have investigated the application of genomic prediction (GP) to these populations. Synthetics are generated by intermating a small number of parents ([Formula: see text]) and thereby possess unique genetic properties, which make them especially suited for systematic investigations of factors contributing to the accuracy of GP. We generated synthetics in silico from [Formula: see text]2 to 32 maize (Zea mays L.) lines taken from an ancestral population with either short- or long-range linkage disequilibrium (LD). In eight scenarios differing in the relatedness of the training and prediction sets and in the types of data used to calculate the relationship matrix (QTL, SNPs, tag markers, and pedigree), we investigated the prediction accuracy (PA) of genomic best linear unbiased prediction (GBLUP) and analyzed contributions from pedigree relationships captured by SNP markers, as well as from cosegregation and ancestral LD between QTL and SNPs. The effects of training set size [Formula: see text] and marker density were also studied. Sampling few parents ([Formula: see text]) generates substantial sample LD that carries over into synthetics through cosegregation of alleles at linked loci. For fixed [Formula: see text], [Formula: see text] influences PA most strongly. If the training and prediction sets are related, using [Formula: see text] parents yields high PA regardless of ancestral LD because SNPs capture pedigree relationships and Mendelian sampling through cosegregation. As [Formula: see text] increases, ancestral LD contributes more information, while other factors contribute less due to lower frequencies of closely related individuals. For unrelated prediction sets, only ancestral LD contributes information, and accuracies were poor and highly variable for [Formula: see text] due to large sample LD. For large [Formula: see text], achieving moderate accuracy requires

  12. Monotreme IGF2 expression and ancestral origin of genomic imprinting.

    PubMed

    Killian, J K; Nolan, C M; Stewart, N; Munday, B L; Andersen, N A; Nicol, S; Jirtle, R L

    2001-08-15

    IGF2 (insulin-like growth factor 2) and M6P/IGF2R (mannose 6-phosphate/insulin-like growth factor 2 receptor) are imprinted in marsupials and eutherians but not in birds. These results, along with the absence of M6P/IGF2R imprinting in the egg-laying monotremes, indicate that the parental imprinting of fetal growth-regulatory genes may be unique to viviparous mammals. In this investigation, we cloned IGF2 from two monotreme mammals, the platypus and echidna, to further investigate the origin of imprinting. We report herein that, like M6P/IGF2R, IGF2 is not imprinted in monotremes. Thus, although IGF2 encodes a highly conserved growth factor in chordates, it is only imprinted in therian mammals. These findings support a concurrent origin of IGF2 and M6P/IGF2R imprinting in the late Jurassic/early Cretaceous period. The absence of imprinting in monotremes, despite apparent interparental conflicts over maternal-offspring exchange, argues that a fortuitous congruency of genetic and epigenetic events may have limited the phylogenetic breadth of genomic imprinting to therian mammals. J. Exp. Zool. (Mol. Dev. Evol.) 291:205-212, 2001.

  13. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  14. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civaň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

  15. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    PubMed

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidence on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop, and its origin and phylogenetic position in the tribe Brassiceae are controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome consisted of repetitive sequences, and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidence that the current form of nine chromosomes in radish was rearranged from the chromosomes of the hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into the evolution of the mesohexaploid genomes in the tribe Brassiceae.

  16. The mitochondrial genome structure of Xenoturbella bocki (phylum Xenoturbellida) is ancestral within the deuterostomes

    PubMed Central

    Bourlat, Sarah J; Rota-Stabelli, Omar; Lanfear, Robert; Telford, Maximilian J

    2009-01-01

    Background Mitochondrial genome comparisons contribute in multiple ways when inferring animal relationships. As well as primary sequence data, rare genomic changes such as gene order, shared gene boundaries and genetic code changes, which are unlikely to have arisen through convergent evolution, are useful tools in resolving deep phylogenies. Xenoturbella bocki is a morphologically simple benthic marine worm recently found to belong among the deuterostomes. Here we present analyses comparing the Xenoturbella bocki mitochondrial gene order, genetic code and control region to those of other metazoan groups. Results The complete mitochondrial genome sequence of Xenoturbella bocki was determined. The gene order is most similar to that of the chordates and the hemichordates, indicating that this conserved mitochondrial gene order might be ancestral to the deuterostome clade. Using data from all phyla of deuterostomes, we infer the ancestral mitochondrial gene order for this clade. Using inversion and breakpoint analyses of metazoan mitochondrial genomes, we test conflicting hypotheses for the phylogenetic placement of Xenoturbella and find a closer affinity to the hemichordates than to other metazoan groups. Comparative analyses of the control region reveal similarities in the transcription initiation and termination sites and origin of replication of Xenoturbella with those of the vertebrates. Phylogenetic analyses of the mitochondrial sequence indicate a weakly supported placement as a basal deuterostome, a result that may be the effect of compositional bias. Conclusion The mitochondrial genome of Xenoturbella bocki has a very conserved gene arrangement in the deuterostome group, strikingly similar to that of the hemichordates and the chordates, and thus to the ancestral deuterostome gene order. Similarity to the hemichordates in particular is suggested by inversion and breakpoint analysis. Finally, while phylogenetic analyses of the mitochondrial sequences support a

  17. Mitochondrial Genome of Palpitomonas bilix: Derived Genome Structure and Ancestral System for Cytochrome c Maturation

    PubMed Central

    Nishimura, Yuki; Tanifuji, Goro; Kamikawa, Ryoma; Yabuki, Akinori; Hashimoto, Tetsuo; Inagaki, Yuji

    2016-01-01

    Here we report the mitochondrial (mt) genome of Palpitomonas bilix, one of the heterotrophic microeukaryotes related to cryptophytes. The P. bilix mt genome was found to be a linear molecule composed of a “single-copy region” (∼16 kb) and repeat regions (∼30 kb) arranged as inverted repeats at both ends of the genome. Linear mt genomes with large inverted repeats are known for three distantly related eukaryotes (including P. bilix), suggesting that this particular mt genome structure has emerged at least three times in the eukaryotic tree of life. The P. bilix mt genome contains 47 protein-coding genes, including ccmA, ccmB, ccmC, and ccmF, which encode protein subunits of the system for cytochrome c maturation inherited from a bacterium (System I). We present data indicating that the phylogenetic relatives of P. bilix, namely cryptophytes, goniomonads, and kathablepharids, use an alternative system for cytochrome c maturation, which most likely emerged during the evolution of eukaryotes (System III). Two scenarios could explain the distribution of Systems I and III in P. bilix and its phylogenetic relatives: (i) System I was replaced by System III on the branch leading to the common ancestor of cryptophytes, goniomonads, and kathablepharids, or (ii) the two systems co-existed in their common ancestor and were lost differentially among the four descendants. PMID:27604877

  18. Exploring the diploid wheat ancestral A genome through sequence comparison at the high-molecular-weight glutenin locus region.

    PubMed

    Dong, Lingli; Huo, Naxin; Wang, Yi; Deal, Karin; Luo, Ming-Cheng; Wang, Daowen; Anderson, Olin D; Gu, Yong Qiang

    2012-12-01

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often presents a great challenge in various aspects of research, including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species of polyploid wheat, sequence variation of T. urartu (A(u)A(u)) was analyzed by comparing a 277-kb genomic region carrying the important Glu-1 locus with the homologous regions from the A genomes of the diploid T. monococcum (A(m)A(m)), tetraploid T. turgidum (AABB), and hexaploid T. aestivum (AABBDD). Our results revealed that, in addition to a high degree of gene collinearity, nested retroelement structures were also considerably conserved between the A(u) genome and the A genomes of polyploid wheats, suggesting that the majority of the repetitive sequences in the A genomes of polyploid wheats originated from the diploid A(u) genome. The differences in the compared region between A(u) and A are mainly caused by four differential TE insertion events and two deletion events between these genomes. The divergence times of the A genomes, estimated from nucleotide substitution rates in both shared TEs and collinear genes, further support a closer evolutionary relationship of A to A(u) than to A(m). The structural conservation in the repetitive regions prompted us to develop repeat junction markers based on the A(u) sequence for mapping the A genome in hexaploid wheat. Eighty percent of these repeat junction markers were successfully mapped to the corresponding region in hexaploid wheat, suggesting that T. urartu could serve as a useful resource for developing molecular markers for genetic and breeding studies in hexaploid wheat.
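
    Divergence-time estimates of this kind rest on a molecular clock. A common form, assumed here because the abstract does not state the exact formula, distance correction, or rate the authors used, is T = K / (2r), where K is the number of substitutions per site between two sequences and r is the substitution rate per site per year; a raw proportion of differing sites p is often first corrected with the Jukes-Cantor formula K = -(3/4) ln(1 - (4/3)p). The function names and the rate value below are hypothetical placeholders, not values from the study.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Correct a raw proportion of differing sites p for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time(k: float, rate: float) -> float:
    """Molecular-clock estimate T = K / (2r).

    The factor 2 appears because substitutions accumulate independently
    along both lineages after they split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# Hypothetical example: 3% of aligned sites differ, and a placeholder
# rate of 1e-8 substitutions per site per year is assumed.
k = jukes_cantor_distance(0.03)
print(f"K = {k:.4f}, T = {divergence_time(k, 1e-8) / 1e6:.2f} million years")
```

    For a single LTR retrotransposon, the same relation is commonly applied to the divergence between its two LTRs to date the insertion.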

  19. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations.

    PubMed

    Omberg, Larsson; Salit, Jacqueline; Hackett, Neil; Fuller, Jennifer; Matthew, Rebecca; Chouchane, Lotfi; Rodriguez-Flores, Juan L; Bustamante, Carlos; Crystal, Ronald G; Mezey, Jason G

    2012-06-26

    Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations, including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes, and the colonization history of recent centuries. Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method, "SupportMix", to infer locus-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is also the first admixture inference method that can efficiently scale to the simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally traces the ancestry of the Persian group to populations sampled in Greater Persia rather than in China, and the ancestry of the African group to sub-Saharan origin and not to Southern African Bantu origin as previously thought.

  20. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral "Salicoid" Genome Duplication.

    PubMed

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-06-27

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the "salicoid" duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants.

  1. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral “Salicoid” Genome Duplication

    PubMed Central

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the “salicoid” duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  2. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites

    PubMed Central

    2012-01-01

    Background The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci. PMID:23281809

  3. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations

    PubMed Central

    2012-01-01

    Background Populations of the Arabian Peninsula have a complex genetic structure that reflects waves of migrations, including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes, and the colonization history of recent centuries. Results Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Since haplotypes of these individuals could have originated from many different populations across the world, we have developed a machine learning method, "SupportMix", to infer locus-specific genomic ancestry when simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is also the first admixture inference method that can efficiently scale to the simultaneous analysis of 50-100 putative ancestral populations while being independent of prior demographic information. Conclusions By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, providing many new observations concerning the ancestry of the region. For example, as well as recapitulating the three major sub-populations in Qatar, composed of mainly Arabic, Persian, and African ancestry, SupportMix additionally traces the ancestry of the Persian group to populations sampled in Greater Persia rather than in China, and the ancestry of the African group to sub-Saharan origin and not to Southern African Bantu origin as previously thought. PMID:22734698

  4. Comparative Genomics of Large Mitochondria in Placozoans

    PubMed Central

    Signorovitch, Ana Y; Buss, Leo W; Dellaporta, Stephen L

    2007-01-01

    The first sequenced mitochondrial genome of a placozoan, Trichoplax adhaerens, challenged the conventional wisdom that a compact mitochondrial genome is a common feature among all animals. Three additional placozoan mitochondrial genomes representing highly divergent clades have been sequenced to determine whether the large Trichoplax mtDNA is a shared feature among members of the phylum Placozoa or a uniquely derived condition. All three mitochondrial genomes were found to be very large, 32- to 37-kb, circular molecules, having the typical 12 respiratory chain genes, 24 tRNAs, rnS, and rnL. They share with the Trichoplax mitochondrial genome the absence of atp8, atp9, and all ribosomal protein genes, the presence of several cox1 introns, and a large open reading frame containing an intron group I LAGLIDADG endonuclease domain. The differences in mtDNA size within Placozoa are due to variation in intergenic spacer regions and the presence or absence of long open reading frames of unknown function. Phylogenetic analyses of the 12 respiratory chain genes support the monophyly of Placozoa. The similarities in composition and structure between the three mitochondrial genomes reported here and that of Trichoplax's mtDNA suggest that their uncompacted state is a shared ancestral feature to other nonmetazoans while their gene content is a derived feature shared only among the Metazoa. PMID:17222063

  5. Evolutionary convergence on highly-conserved 3' intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome.

    PubMed

    Irimia, Manuel; Roy, Scott William

    2008-08-08

    The presence of spliceosomal introns in eukaryotes raises a range of questions about genomic evolution. Along with the fundamental mysteries of introns' initial proliferation and persistence, the evolutionary forces acting on intron sequences remain largely mysterious. Intron number varies across species from a few introns per genome to several introns per gene, and the elements of intron sequences directly implicated in splicing vary from degenerate to strict consensus motifs. We report a 50-species comparative genomic study of intron sequences across most eukaryotic groups. We find two broad and striking patterns. First, we find that some highly intron-poor lineages have undergone evolutionary convergence to strong 3' consensus intron structures. This finding holds for both the branch point sequence and the distance between the branch point and the 3' splice site. Interestingly, this difference appears to exist within the genomes of green algae of the genus Ostreococcus, which exhibit highly constrained intron sequences through most of the intron-poor genome, but not in one much more intron-dense genomic region. Second, we find evidence that ancestral genomes contained highly variable branch point sequences, similar to more complex modern intron-rich eukaryotic lineages. In addition, ancestral structures are likely to have included polyT tails similar to those in metazoans and plants, which we found in a variety of protist lineages. Intriguingly, intron structure evolution appears to be quite different across lineages experiencing different types of genome reduction: whereas lineages with very few introns tend towards highly regular intronic sequences, lineages with very short introns tend towards highly degenerate sequences. Together, these results attest to the complex nature of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron structures across modern lineages, and the impressive evolutionary malleability of eukaryotic gene

  6. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure.

    PubMed

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P

    2016-02-09

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of the Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, predominantly among upper castes and Indo-European speakers. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.

  7. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure

    PubMed Central

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P.

    2016-01-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of the Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, predominantly among upper castes and Indo-European speakers. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  8. Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution.

    PubMed

    Murat, Florent; Xu, Jian-Hong; Tannier, Eric; Abrouk, Michael; Guilhot, Nicolas; Pont, Caroline; Messing, Joachim; Salse, Jérôme

    2010-11-01

    The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution, as well as molecular mechanisms leading to mosaic synteny blocks, have been lacking because relevant examples of evolutionary zooms from genomic sequences were unavailable. Such studies require genomes of species that belong to the same family but have diverged enough to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5-12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes.

  9. Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution

    PubMed Central

    Murat, Florent; Xu, Jian-Hong; Tannier, Eric; Abrouk, Michael; Guilhot, Nicolas; Pont, Caroline; Messing, Joachim; Salse, Jérôme

    2010-01-01

    The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution, as well as molecular mechanisms leading to mosaic synteny blocks, have been lacking because relevant examples of evolutionary zooms from genomic sequences were unavailable. Such studies require genomes of species that belong to the same family but have diverged enough to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5–12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes. PMID:20876790

  10. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms, and only a minimum of two Type I MADS-box genes were present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  11. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants.

    PubMed

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-11-01

    MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Type I MADS-box genes are under-represented in gymnosperms, and only a minimum of two Type I MADS-box genes were present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11-14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14-16 Type II MADS-box genes. The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS-box genes for the development of gymnosperms. This study is

  12. Exploring the diploid wheat ancestral A genome through sequence comparison at the High-Molecular-Weight glutenin locus region

    USDA-ARS's Scientific Manuscript database

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species o...

  13. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

    DOE PAGES

    van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; ...

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and for understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) genome shows that they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome-sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence. PG biosynthesis genes from the cyanobacterial endosymbiont that became the

  14. Genomic organization of the crested ibis MHC provides new insight into ancestral avian MHC structure

    PubMed Central

    Chen, Li-Cheng; Lan, Hong; Sun, Li; Deng, Yan-Li; Tang, Ke-Yi; Wan, Qiu-Hong

    2015-01-01

    The major histocompatibility complex (MHC) plays an important role in the immune response. Avian MHCs are poorly characterized: only the highly compact Galliformes MHCs and the extensively fragmented zebra finch MHC have been reported. We report the first genomic structure of an MHC from an endangered Pelecaniformes species (crested ibis), containing 54 genes in three regions spanning ~500 kb. In contrast to the loose BG (26 loci within 265 kb) and Class I (11 loci within 150 kb) genomic structures, the Core Region is condensed (17 loci within 85 kb). Furthermore, this region contains a COL11A2 gene, followed by four tandem MHC class II αβ dyads retaining two suites of anciently duplicated “αβ” lineages. Thus, the crested ibis MHC structure is entirely different from the known avian MHC architectures but similar to that of mammalian MHCs, suggesting that the fundamental structure of ancestral avian class II MHCs should be “COL11A2-IIαβ1-IIαβ2.” The gene structures, residue characteristics, and expression levels of the five class I genes reveal inter-locus functional divergence. However, phylogenetic analysis indicates that these five genes form a well-supported intra-species clade, showing evidence for recent duplications. Our analyses suggest dramatic structural variation among avian MHC lineages, help elucidate avian MHC evolution, and provide a foundation for future conservation studies. PMID:25608659

  15. Minimal Conflicting Sets for the Consecutive Ones Property in Ancestral Genome Reconstruction

    NASA Astrophysics Data System (ADS)

    Chauve, Cedric; Haus, Utz-Uwe; Stephen, Tamon; You, Vivija P.

    A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1s on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset of which has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure or about efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has polynomial time complexity when the number of 1s in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clauses and maximal false clauses for some monotone Boolean function. We use these methods in preliminary experiments on simulated data related to ancestral genome reconstruction.
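    To make the two definitions concrete, here is a minimal brute-force sketch (our own illustration, not the authors' polynomial-time algorithm or their Boolean-function reduction). It tests the C1P by trying every column order, which is exponential and only usable on tiny matrices. It relies on the fact that the C1P is inherited by every subset of rows, so a row set is a Minimal Conflicting Set exactly when it lacks the C1P and every one-row deletion has it. All function names are hypothetical; Python 3.9+ is assumed for the type hints.

```python
from itertools import combinations, permutations


def rows_are_consecutive(rows: list[list[int]], order: tuple[int, ...]) -> bool:
    """True if, with columns arranged in `order`, every row's 1s form one contiguous run."""
    for row in rows:
        # Positions (in the permuted column order) that hold a 1 in this row.
        ones = [pos for pos, col in enumerate(order) if row[col] == 1]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True


def has_c1p(rows: list[list[int]]) -> bool:
    """Brute force: try every column permutation (exponential; tiny matrices only)."""
    if not rows:
        return True
    return any(rows_are_consecutive(rows, order)
               for order in permutations(range(len(rows[0]))))


def is_minimal_conflicting_set(rows: list[list[int]]) -> bool:
    """Rows lack the C1P, but removing any single row restores it.

    Because the C1P is inherited by every subset of rows, checking the
    one-row deletions is enough to cover all proper subsets.
    """
    if has_c1p(rows):
        return False
    return all(has_c1p(rows[:i] + rows[i + 1:]) for i in range(len(rows)))


def minimal_conflicting_sets(matrix: list[list[int]]) -> list[tuple[int, ...]]:
    """Enumerate all Minimal Conflicting Sets as tuples of row indices."""
    found = []
    for size in range(1, len(matrix) + 1):
        for idx in combinations(range(len(matrix)), size):
            if is_minimal_conflicting_set([matrix[i] for i in idx]):
                found.append(idx)
    return found


# Three pairwise-overlapping rows over 3 columns form a "cycle" that no
# column order can linearize; the fourth row (a single 1) is harmless.
matrix = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
]
print(minimal_conflicting_sets(matrix))  # [(0, 1, 2)]
```

    The four-row set (0, 1, 2, 3) also lacks the C1P, but it is not minimal, because dropping row 3 still leaves a conflict.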

  16. Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria

    PubMed Central

    Oyserman, Ben O; Moya, Francisco; Lawson, Christopher E; Garcia, Antonio L; Vogt, Mark; Heffernen, Mitchell; Noguera, Daniel R; McMahon, Katherine D

    2016-01-01

    The evolution of complex traits is hypothesized to occur incrementally. Identifying the transitions that lead to extant complex traits may provide a better understanding of the genetic nature of the observed phenotype. A keystone functional group in wastewater treatment processes is the polyphosphate accumulating organisms (PAOs); however, the evolution of the PAO phenotype has yet to be explicitly investigated, and the specific metabolic traits that discriminate non-PAOs from PAOs are currently unknown. Here we perform the first comprehensive investigation on the evolution of the PAO phenotype using the model uncultured organism Candidatus Accumulibacter phosphatis (Accumulibacter) through ancestral genome reconstruction, identification of horizontal gene transfer, and a kinetic/stoichiometric characterization of Accumulibacter Clade IIA. The analysis of Accumulibacter's last common ancestor identified 135 laterally derived genes, including genes involved in glycogen, polyhydroxyalkanoate, pyruvate and NADH/NADPH metabolism, as well as inorganic ion transport and regulatory mechanisms. In contrast, pathways such as the TCA cycle and polyphosphate metabolism displayed minimal horizontal gene transfer. We show that the transition from non-PAO to PAO coincided with horizontal gene transfer within Accumulibacter's core metabolism, likely alleviating key kinetic and stoichiometric bottlenecks, such as anaerobically linking glycogen degradation to polyhydroxyalkanoate synthesis. These results demonstrate the utility of investigating the derived genome of a lineage to identify key transitions leading to an extant complex phenotype. PMID:27128993

  17. Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria.

    PubMed

    Oyserman, Ben O; Moya, Francisco; Lawson, Christopher E; Garcia, Antonio L; Vogt, Mark; Heffernen, Mitchell; Noguera, Daniel R; McMahon, Katherine D

    2016-12-01

    The evolution of complex traits is hypothesized to occur incrementally. Identifying the transitions that lead to extant complex traits may provide a better understanding of the genetic nature of the observed phenotype. A keystone functional group in wastewater treatment processes is the polyphosphate accumulating organisms (PAOs); however, the evolution of the PAO phenotype has yet to be explicitly investigated, and the specific metabolic traits that discriminate non-PAOs from PAOs are currently unknown. Here we perform the first comprehensive investigation on the evolution of the PAO phenotype using the model uncultured organism Candidatus Accumulibacter phosphatis (Accumulibacter) through ancestral genome reconstruction, identification of horizontal gene transfer, and a kinetic/stoichiometric characterization of Accumulibacter Clade IIA. The analysis of Accumulibacter's last common ancestor identified 135 laterally derived genes, including genes involved in glycogen, polyhydroxyalkanoate, pyruvate and NADH/NADPH metabolism, as well as inorganic ion transport and regulatory mechanisms. In contrast, pathways such as the TCA cycle and polyphosphate metabolism displayed minimal horizontal gene transfer. We show that the transition from non-PAO to PAO coincided with horizontal gene transfer within Accumulibacter's core metabolism, likely alleviating key kinetic and stoichiometric bottlenecks, such as anaerobically linking glycogen degradation to polyhydroxyalkanoate synthesis. These results demonstrate the utility of investigating the derived genome of a lineage to identify key transitions leading to an extant complex phenotype.

  18. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  19. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

    SciTech Connect

    van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; Purvine, Samuel O.; Grimwood, Jane; Sudek, Sebastian; Yu, Hang; Poirier, Camille; Deerinck, Thomas J.; Kuo, Alan; Grigoriev, Igor V.; Wong, Chee -Hong; Smith, Richard D.; Callister, Stephen J.; Wei, Chia -Lin; Schmutz, Jeremy; Worden, Alexandra Z.

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) shows they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in the PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence. PG biosynthesis genes from the

  20. The mitochondrial genome of the onychophoran Opisthopatus cinctipes (Peripatopsidae) reflects the ancestral mitochondrial gene arrangement of Panarthropoda and Ecdysozoa.

    PubMed

    Braband, Anke; Cameron, Stephen L; Podsiadlowski, Lars; Daniels, Savel R; Mayer, Georg

    2010-10-01

    The ancestral genome composition in Onychophora (velvet worms) is unknown since only a single species of Peripatidae has been studied thus far, which shows a highly derived gene order with numerous translocated genes. Due to this lack of information from Onychophora, it is difficult to infer the ancestral mitochondrial gene arrangement patterns for Panarthropoda and Ecdysozoa. Hence, we analyzed the complete mitochondrial genome of the onychophoran Opisthopatus cinctipes, a representative of Peripatopsidae. Our data show that O. cinctipes possesses a highly conserved gene order, similar to that found in various arthropods. By comparing our results to those from different outgroups, we reconstruct the ancestral gene arrangement in Panarthropoda and Ecdysozoa. Our phylogenetic analysis of protein-coding gene sequences from 60 protostome species (including outgroups) provides some support for the sister group relationship of Onychophora and Arthropoda, which was not recovered by using a single species of Peripatidae, Epiperipatus biolleyi, in a previous study. A comparison of the strand-specific bias between onychophorans, arthropods, and a priapulid suggests that the peripatid E. biolleyi is less suitable for phylogenetic analyses of Ecdysozoa using mitochondrial genomic data than the peripatopsid O. cinctipes.

  1. Genome Size and GC Content Evolution of Festuca: Ancestral Expansion and Subsequent Reduction

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie; Foggi, Bruno; Rossi, Graziano

    2008-01-01

    Background and Aims: Plant evolution is well known to be frequently associated with remarkable changes in genome size and composition; however, knowledge of the long-term evolutionary dynamics of these processes remains very limited. Here, a study is made of the fine dynamics of quantitative genome evolution in Festuca (fescue), the largest genus in Poaceae (grasses). Methods: Using flow cytometry (PI, DAPI), measurements were made of DNA content (2C-value), monoploid genome size (Cx-value), average chromosome size (C/n-value) and cytosine + guanine (GC) content of 101 Festuca taxa and 14 of their close relatives. The results were compared with the existing phylogeny based on ITS and trnL-F sequences. Key Results: The divergence of the fescue lineage from related Poeae was preceded by an approximately 2-fold enlargement of monoploid genome and chromosome size, and by apparent GC content enrichment. The subsequent reduction of these parameters, running in parallel in both main evolutionary lineages of fine-leaved and broad-leaved fescues, appears to diverge among the existing species groups. The most dramatic reductions are associated with the most recently and rapidly evolving groups which, in combination with recent intraspecific genome size variability, indicate that the reduction process is probably ongoing and evolutionarily young. These dynamics may be a consequence of GC-rich retrotransposon proliferation and removal. Polyploids derived from parents with a large genome size and high GC content (mostly allopolyploids) had smaller Cx- and C/n-values and only slightly deviated from parental GC content, whereas polyploids derived from parents with a small genome and low GC content (mostly autopolyploids) generally had a markedly increased GC content and slightly higher Cx- and C/n-values. Conclusions: The present study indicates the high potential of general quantitative characters of the genome for understanding the long-term processes of genome evolution, testing evolutionary

  2. Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome.

    PubMed

    Shedlock, Andrew M; Botka, Christopher W; Zhao, Shaying; Shetty, Jyoti; Zhang, Tingting; Liu, Jun S; Deschavanne, Patrick J; Edwards, Scott V

    2007-02-20

    We report results of a megabase-scale phylogenomic analysis of the Reptilia, the sister group of mammals. Large-scale end-sequence scanning of genomic clones of a turtle, alligator, and lizard reveals diverse, mammal-like landscapes of retroelements and simple sequence repeats (SSRs) not found in the chicken. Several global genomic traits, including distinctive phylogenetic lineages of CR1-like long interspersed elements (LINEs) and a paucity of A-T rich SSRs, characterize turtle and archosaur genomes, whereas higher frequencies of tandem repeats and a lower global GC content reveal mammal-like features in Anolis. Nonavian reptile genomes also possess a high frequency of diverse and novel 50-bp unit tandem duplications not found in chicken or mammals. The frequency distributions of approximately 65,000 8-mer oligonucleotides suggest that rates of DNA-word frequency change are an order of magnitude slower in reptiles than in mammals. These results suggest a diverse array of interspersed repeats and SSRs in the common ancestor of amniotes and a genomic conservatism and gradual loss of retroelements in reptiles that culminated in the minimalist chicken genome. The sequences reported in this paper have been deposited in the GenBank database (accession nos. CZ 250707-CZ 257443 and DX 390731-DX 389174).
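    The "approximately 65,000 8-mer oligonucleotides" figure follows from 4^8 = 65,536 possible DNA words of length 8. The sketch below shows one simple way to build such a word-frequency profile from a sequence and compare two profiles. The L1 distance is an illustrative choice of ours, not necessarily the statistic used in the study, and both function names are hypothetical.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int = 8) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer made only of A/C/G/T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows containing N etc.
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance between two k-mer frequency profiles (0 = identical, 2 = disjoint)."""
    words = set(p) | set(q)
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)


print(4 ** 8)  # 65536 possible 8-mers
```

    For example, `kmer_frequencies("ACGTACGTAC")` yields three distinct 8-mers, each with frequency 1/3. Real genomic clones would be far longer, so most of the 65,536 possible words would be observed.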

  3. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus.

    PubMed

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-07-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene losses outnumbered duplications and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework. © 2016 Vakirlis et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus

    PubMed Central

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-01-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene losses outnumbered duplications and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework. PMID:27247244

  5. Vertebrate codon bias indicates a highly GC-rich ancestral genome.

    PubMed

    Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei

    2013-04-25

    Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain the GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment of the nucleotide composition of coding and non-coding sequences in vertebrates and other taxa suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences.
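    The contrast in this abstract is between overall GC content (~40% in vertebrate genomes) and GC content at third codon positions (often called GC3, ~60%). A minimal sketch of both measures follows. It assumes an in-frame coding sequence that starts at its first base, and it ignores non-ACGT symbols. The function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases in `seq` (other symbols ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence.

    cds[2::3] takes the 3rd base of every complete codon; a trailing
    partial codon never contributes a third position.
    """
    return gc_fraction(cds[2::3])


cds = "ATGGCCGAGTAA"  # codons ATG GCC GAG TAA
print(gc_fraction(cds), gc3(cds))  # 0.5 0.75
```

    In this toy sequence the third positions (G, C, G, A) are richer in GC than the sequence as a whole. That is the kind of gap the abstract describes at the scale of vertebrate genomes.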

  6. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species.

    PubMed

    Pitkänen, Esa; Jouhten, Paula; Hou, Jian; Syed, Muhammad Fahad; Blomberg, Peter; Kludas, Jana; Oja, Merja; Holm, Liisa; Penttilä, Merja; Rousu, Juho; Arvas, Mikko

    2014-02-01

    We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.
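    A common way to make "gapless" concrete is reachability. Starting from a set of seed (nutrient) metabolites, a reaction can fire once all of its substrates are available, and its products then become available. Reactions that never fire mark gaps. The sketch below implements that network-expansion idea. It is our assumption about what "gapless" means here, not CoReCo's actual implementation, and the reaction and metabolite names are hypothetical.

```python
def network_expansion(
    reactions: dict[str, tuple[list[str], list[str]]],
    seeds: set[str],
) -> tuple[set[str], set[str]]:
    """Fire reactions whose substrates are all available until nothing changes.

    `reactions` maps a reaction name to (substrates, products).
    Returns (names of reactions that fired, metabolites that became available).
    """
    available = set(seeds)
    fired: set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and set(substrates) <= available:
                fired.add(name)
                available |= set(products)
                changed = True
    return fired, available


def gapped_reactions(reactions, seeds) -> list[str]:
    """Reactions that can never fire from the seeds, i.e. the network's gaps."""
    fired, _ = network_expansion(reactions, seeds)
    return sorted(set(reactions) - fired)


# Hypothetical toy network: R3 needs "x", which nothing produces from glucose.
toy = {
    "R2": (["g6p"], ["f6p"]),
    "R1": (["glc"], ["g6p"]),
    "R3": (["x"], ["y"]),
}
print(gapped_reactions(toy, {"glc"}))  # ['R3']
```

    Iterating to a fixed point makes the result independent of the order in which reactions are listed. Here R2 only fires on the second pass, after R1 has produced g6p.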

  7. A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia.

    PubMed

    Ogedengbe, Mosun E; Qvarnstrom, Yvonne; da Silva, Alexandre J; Arrowood, Michael J; Barta, John R

    2015-05-01

    The near complete mitochondrial genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mitochondrial genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5'-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mitochondrial genome 5'-terminus are found in all eimeriid mitochondrial genomes available and suggest this may be the ancestral start of eimeriid mitochondrial genomes. Copyright © 2015 Australian Society for Parasitology Inc. All rights reserved.

  8. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage.

    PubMed

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H; Hugenholtz, Philip

    2016-04-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0-1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms.

  9. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage

    PubMed Central

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H.; Hugenholtz, Philip

    2016-01-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0–1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  10. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing.

    PubMed

    Bodian, Dale L; McCutcheon, Justine N; Kothiyal, Prachi; Huddleston, Kathi C; Iyer, Ramaswamy K; Vockley, Joseph G; Niederhuber, John E

    2014-01-01

    Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
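    The observation that "the minor allele in one population is the major allele in another" reduces to a simple check on per-population allele frequencies. In the sketch below, a variant is flagged when its alternate-allele frequency is below 0.5 in at least one population and above 0.5 in at least another. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2). The population labels and frequencies are hypothetical and are not taken from the study.

```python
def allele_frequency(genotypes: list[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded as 0, 1 or 2 alt copies."""
    return sum(genotypes) / (2 * len(genotypes))


def flipped_variants(freqs_by_pop: dict[str, dict[str, float]]) -> list[str]:
    """Variants whose minor allele in one population is the major allele in another.

    `freqs_by_pop` maps variant -> {population: alternate-allele frequency}.
    A frequency of exactly 0.5 has no minor allele and counts for neither side.
    """
    flipped = []
    for variant, pops in freqs_by_pop.items():
        freqs = pops.values()
        if any(f < 0.5 for f in freqs) and any(f > 0.5 for f in freqs):
            flipped.append(variant)
    return flipped


# Hypothetical frequencies for two variants in two ancestry groups.
example = {
    "var1": {"popA": 0.20, "popB": 0.70},  # alt is minor in popA, major in popB
    "var2": {"popA": 0.10, "popB": 0.30},  # alt is minor everywhere
}
print(flipped_variants(example))  # ['var1']
```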

  11. Evaluation of the TREX1 gene in a large multi-ancestral lupus cohort

    PubMed Central

    Namjou, Bahram; Kothari, Parul H.; Kelly, Jennifer A.; Glenn, Stuart B.; Ojwang, Joshua O.; Adler, Adam; Alarcón-Riquelme, Marta E.; Gallant, Caroline J.; Boackle, Susan A.; Criswell, Lindsey A.; Kimberly, Robert P.; Brown, Elizabeth; Edberg, Jeffrey; Stevens, Anne M.; Jacob, Chaim O.; Tsao, Betty P.; Gilkeson, Gary S.; Kamen, Diane L.; Merrill, Joan T.; Petri, Michelle; Goldman, Rosalind Ramsey; Vila, Luis M.; Anaya, Juan-Manuel; Niewold, Timothy B.; Martin, Javier; Pons-Estel, Bernardo A.; Sabio, Jose M.; Callejas, Jose L.; Vyse, Timothy J.; Bae, Sang-Cheol; Perrino, Fred W.; Freedman, Barry I.; Scofield, R. Hal; Moser, Kathy L.; Gaffney, Patrick M.; James, Judith A.; Langefeld, Carl D.; Kaufman, Kenneth M.; Harley, John B.; Atkinson, John P.

    2011-01-01

    Systemic Lupus Erythematosus (SLE) is a prototypic autoimmune disorder with a complex pathogenesis in which genetic, hormonal and environmental factors play a role. Rare mutations in the TREX1 gene, the major mammalian 3′-5′ exonuclease, have been reported in sporadic SLE cases. Some of these mutations have also been identified in a rare pediatric neurologic condition featuring an inflammatory encephalopathy known as Aicardi-Goutières syndrome (AGS). We sought to investigate the frequency of these mutations in a large multi-ancestral cohort of SLE cases and controls. Methods: Forty single-nucleotide polymorphisms (SNPs), including both common and rare variants, across the TREX1 gene were evaluated in ∼8370 patients with SLE and ∼7490 control subjects. Stringent quality control procedures were applied, and principal components and admixture proportions were calculated to identify outliers for removal from analysis. Population-based case-control association analyses were performed. P values, false discovery rate q values, and odds ratios with 95% confidence intervals were calculated. Results: The estimated frequency of TREX1 mutations in our lupus cohort was 0.5%. Five heterozygous mutations were detected at the Y305C polymorphism in European lupus cases, but none were observed in European controls. Five African cases carried heterozygous mutations at the E266G polymorphism and, again, none were observed in the African controls. A rare homozygous R114H mutation was identified in one Asian SLE patient, whereas all genotypes at this mutation in previous reports for SLE were heterozygous. Analysis of common TREX1 SNPs (MAF >10%) revealed a relatively common risk haplotype in European SLE patients with neurologic manifestations, especially seizures, with a frequency of 58% in lupus cases compared to 45% in normal controls (p=0.0008, OR=1.73, 95% CI=1.25-2.39). Finally, the presence or absence of specific autoantibodies in certain populations produced significant
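    The odds ratios and 95% confidence intervals reported above come from case-control counts. A minimal sketch of the standard Woolf (log-scale) interval for a 2×2 table follows. The abstract does not say which interval method the authors used. The counts in the usage line are hypothetical and do not reproduce the study's OR=1.73.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log) confidence interval for a 2x2 table.

    a: cases carrying the allele/haplotype    b: cases without it
    c: controls carrying the allele/haplotype d: controls without it
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Woolf interval is undefined without a correction")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))  # OR = 4.0, interval roughly (1.37, 11.7)
```

    The interval is symmetric on the log scale, so the product of its bounds equals the square of the odds ratio. The zero-cell guard matters for rare variants such as those above, which were absent from the control groups.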

  12. Genomic structure and evolution of the ancestral chromosome fusion site in 2q13-2q14.1 and paralogous regions on other human chromosomes.

    PubMed

    Fan, Yuxin; Linardopoulou, Elena; Friedman, Cynthia; Williams, Eleanor; Trask, Barbara J

    2002-11-01

    Human chromosome 2 was formed by the head-to-head fusion of two ancestral chromosomes that remained separate in other primates. Sequences that once resided near the ends of the ancestral chromosomes are now interstitially located in 2q13-2q14.1. Portions of these sequences had duplicated to other locations prior to the fusion. Here we present analyses of the genomic structure and evolutionary history of >600 kb surrounding the fusion site and closely related sequences on other human chromosomes. Sequence blocks that closely flank the inverted arrays of degenerate telomere repeats marking the fusion site are duplicated at many, primarily subtelomeric, locations. In addition, large portions of a 168-kb centromere-proximal block are duplicated at 9pter, 9p11.2, and 9q13, with 98%-99% average sequence identity. A 67-kb block on the distal side of the fusion site is highly homologous to sequences at 22qter. A third ~100-kb segment is 96% identical to a region in 2q11.2. By integrating data on the extent and similarity of these paralogous blocks, including the presence of phylogenetically informative repetitive elements, with observations of their chromosomal distribution in nonhuman primates, we infer the order of the duplications that led to their current arrangement. Several of these duplicated blocks may be associated with breakpoints of inversions that occurred during primate evolution and of recurrent chromosome rearrangements in humans.

  13. Sequencing of Australian wild rice genomes reveals ancestral relationships with domesticated rice.

    PubMed

    Brozynska, Marta; Copetti, Dario; Furtado, Agnelo; Wing, Rod A; Crayn, Darren; Fox, Glen; Ishikawa, Ryuji; Henry, Robert J

    2016-11-27

    The related A genome species of the Oryza genus constitute the effective gene pool for rice. Here, we report draft genomes for two Australian wild A genome taxa: an O. rufipogon-like population, referred to as Taxon A, and an O. meridionalis-like population, referred to as Taxon B. These two taxa were sequenced and assembled by integrating short- and long-read next-generation sequencing (NGS) data to create a genomic platform for a wider rice gene pool. We show that, despite its distinct chloroplast genome, the nuclear genome of the Australian Taxon A is much closer in sequence to that of domesticated rice (O. sativa) than to those of the other Australian wild populations. Analysis of 4643 genes in the A genome clade showed that the Australian annual, O. meridionalis, and related perennial taxa have the genome sequences most divergent from domesticated rice (divergence around 3 million years ago). A test for admixture showed possible introgression into the Australian Taxon A (diverged around 1.6 million years ago), especially from the wild indica/O. nivara clade in Asia. These results indicate that northern Australia may be the centre of diversity of the A genome Oryza, and possibly also the centre of origin of this group, and that these wild taxa represent an important resource for rice improvement.

  14. Genomic Structure and Evolution of the Ancestral Chromosome Fusion Site in 2q13–2q14.1 and Paralogous Regions on Other Human Chromosomes

    PubMed Central

    Fan, Yuxin; Linardopoulou, Elena; Friedman, Cynthia; Williams, Eleanor; Trask, Barbara J.

    2002-01-01

    Human chromosome 2 was formed by the head-to-head fusion of two ancestral chromosomes that remained separate in other primates. Sequences that once resided near the ends of the ancestral chromosomes are now interstitially located in 2q13–2q14.1. Portions of these sequences had duplicated to other locations prior to the fusion. Here we present analyses of the genomic structure and evolutionary history of >600 kb surrounding the fusion site and closely related sequences on other human chromosomes. Sequence blocks that closely flank the inverted arrays of degenerate telomere repeats marking the fusion site are duplicated at many, primarily subtelomeric, locations. In addition, large portions of a 168-kb centromere-proximal block are duplicated at 9pter, 9p11.2, and 9q13, with 98%–99% average sequence identity. A 67-kb block on the distal side of the fusion site is highly homologous to sequences at 22qter. A third ∼100-kb segment is 96% identical to a region in 2q11.2. By integrating data on the extent and similarity of these paralogous blocks, including the presence of phylogenetically informative repetitive elements, with observations of their chromosomal distribution in nonhuman primates, we infer the order of the duplications that led to their current arrangement. Several of these duplicated blocks may be associated with breakpoints of inversions that occurred during primate evolution and of recurrent chromosome rearrangements in humans. [Supplemental material is available online at http://www.genome.org. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: T. Newman, C. Harris, and J. Young.] PMID:12421751

  15. Reconstructing ancestral gene content by coevolution.

    PubMed

    Tuller, Tamir; Birin, Hadas; Gophna, Uri; Kupiec, Martin; Ruppin, Eytan

    2010-01-01

    Inferring the gene content of ancestral genomes is a fundamental challenge in molecular evolution. Due to the statistical nature of this problem, ancestral genomes inferred by the maximum likelihood (ML) or maximum-parsimony (MP) methods are prone to considerable error rates. In general, these errors are difficult to eliminate by using longer genomic sequences or by analyzing more taxa. This study describes a new approach for improving ancestral genome reconstruction, the ancestral coevolver (ACE), which uses coevolutionary information to improve the accuracy of such reconstructions over previous approaches. The principal idea is to reduce the potentially large solution space by choosing a single optimal (or near-optimal) solution that is in accord with the coevolutionary relationships between protein families. Simulation experiments, on both artificial and real biological data, show that ACE yields a marked decrease in error rate compared with ML or MP. When ACE was applied to a large data set (95 organisms, 4873 protein families, and 10,000 coevolutionary relationships), some of the ancestral genomes it reconstructed were remarkably different in their gene content from those reconstructed by ML or MP alone (more than 10% in some nodes). These reconstructions, while having almost the same likelihood/parsimony scores as those obtained with ML/MP, had markedly higher concordance with the coevolutionary information. Specifically, when ACE was applied to improve the results of ML, it added a large number of proteins to those encoded by LUCA (last universal common ancestor), most of them ribosomal proteins and components of the F(0)F(1)-type ATP synthase/ATPases, complexes that are vital in most living organisms. Our analysis suggests that LUCA was bacterial-like and had a genome size similar to those of many extant organisms.
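The abstract contrasts ACE with maximum-parsimony (MP) reconstruction of ancestral gene content. The sketch below illustrates only that MP baseline, not ACE itself: Fitch's small-parsimony algorithm applied to the presence (1) or absence (0) of one protein family on a rooted binary tree. The nested-tuple tree encoding, the function name and the taxon names are hypothetical choices made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, presence: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up Fitch pass for one binary (presence/absence) character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of gain/loss events needed to explain the leaf states.
    """
    if isinstance(tree, str):
        return {presence[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, presence)
    right_set, right_cost = fitch(right, presence)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and charge one gain or loss.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical four-taxon tree and presence/absence profile of one family.
tree = (("A", "B"), ("C", "D"))
profile = {"A": 1, "B": 1, "C": 0, "D": 1}
root_states, changes = fitch(tree, profile)
print(root_states, changes)  # {1} 1
```

When the root set contains both 0 and 1, parsimony alone cannot decide whether the ancestor carried the family; ties of this kind are part of the large solution space that the abstract describes ACE as narrowing with coevolutionary information.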

  16. Phylogenomics of primates and their ancestral populations

    PubMed Central

    Siepel, Adam

    2009-01-01

    Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics. PMID:19801602
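The abstract notes that large ancestral effective population sizes, relative to the intervals between speciation events, let local genealogies differ from the species tree. A standard coalescent result (a textbook formula, not one derived in this abstract) makes this concrete for three species ((A,B),C): if the internal branch lasts t generations in an ancestral diploid population of effective size N, a single-locus gene tree matches the species tree with probability 1 − (2/3)·exp(−t/(2N)). The function name and parameter values below are hypothetical, and the sketch assumes neutrality, no recombination within the locus and no gene flow.

```python
import math

def p_gene_tree_matches_species_tree(t_generations: float, n_e: float) -> float:
    """Probability that a three-species gene tree matches the species tree
    ((A,B),C) under the neutral multispecies coalescent (hypothetical helper).

    t_generations: length of the internal branch, in generations
    n_e: diploid effective size of the ancestral population on that branch
    """
    tau = t_generations / (2.0 * n_e)   # branch length in coalescent units
    no_coalescence = math.exp(-tau)     # A and B lineages fail to coalesce on the branch
    # If they fail, all three gene-tree topologies are equally likely (1/3 each),
    # so only 2/3 of that probability mass is discordant.
    return 1.0 - (2.0 / 3.0) * no_coalescence

# Hypothetical values: the same branch length with a small vs. a large ancestral population.
for n_e in (10_000, 100_000):
    p = p_gene_tree_matches_species_tree(t_generations=20_000, n_e=n_e)
    print(n_e, round(p, 3))
```

A tenfold larger ancestral population shortens the branch in coalescent units and pushes the probability toward the 1/3 floor, which is the kind of genealogical variation across loci the abstract describes for the primates.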

  17. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa.

  18. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs

    PubMed Central

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-01-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  19. Genetic divergence and admixture of ancestral genome groups in the sugarcane variety 'RB867515' (Saccharum spp).

    PubMed

    Maranho, G B; Maranho, R C; Desordi, R; das Neves, A F; Mangolin, C A; Machado, M F P S

    2016-12-02

    We analyzed 80 plants of the sugarcane (Saccharum spp) variety 'RB867515' to investigate its diversity and genetic structure at the molecular level. Four simple sequence repeat (SSR) loci (UGSM51, SMC1237, SEGMS1069, and UGSM38) and five expressed sequence tag (EST)-SSR loci (ESTA68, ESTB92, ESTB145, ESTC66, and ESTC84) were used as molecular markers. The polymorphic loci rate was 66.6%. A total of 17 alleles and an average of 1.88 alleles/locus were detected. The EST-SSR loci carried fewer alleles than the SSR loci from non-expressed regions. The mean observed heterozygosity among the nine SSR loci was 0.3291. Genetic structure analysis showed that 'RB867515' contains alleles from three ancestral groups (K = 3), but there is little admixture of alleles within the same plant (from 0.8 to 17.3%); only 1.88% of the plants shared alleles from two or three groups. ESTB92, ESTC84, and UGSM38 were monomorphic, but there was evidence of polymorphism in ESTA68, ESTB145, ESTC66, UGSM51, SMC1237, and SEGMS1069, indicating that 'RB867515' has variability at the molecular level and the potential to be used as a parent in breeding programs. The molecular variability observed in 'RB867515' indicates that the 'clone' terminology used to identify this cultivar is inconsistent with the original meaning of "clone", a sample of genetically identical plants.
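The summary statistics in this abstract follow from simple counts: 6 polymorphic loci out of 9 gives a polymorphic-loci rate of 6/9 ≈ 66.7%, and 17 alleles over 9 loci gives 17/9 ≈ 1.89 alleles per locus. The sketch below computes such summaries from genotype data, assuming each plant's genotype at a locus is recorded as the set of SSR alleles (bands) observed and counting a plant as heterozygous when it shows more than one allele. That definition is a simplification for a polyploid crop like sugarcane, and the function name, locus names and data are hypothetical.

```python
from typing import Dict, List, Set

def ssr_summary(genotypes: Dict[str, List[Set[str]]]) -> Dict[str, float]:
    """Dataset-level summary from SSR genotypes (hypothetical helper).

    genotypes maps a locus name to one allele set per plant. A plant counts
    as heterozygous at a locus if it shows more than one allele (a simplified
    definition, assumed here for illustration).
    """
    n_loci = len(genotypes)
    total_alleles = 0
    polymorphic = 0
    ho_sum = 0.0
    for plants in genotypes.values():
        alleles = set().union(*plants)  # distinct alleles seen at this locus
        total_alleles += len(alleles)
        if len(alleles) > 1:
            polymorphic += 1
        # Observed heterozygosity at this locus: fraction of multi-allele plants.
        ho_sum += sum(1 for g in plants if len(g) > 1) / len(plants)
    return {
        "polymorphic_rate": polymorphic / n_loci,
        "mean_alleles_per_locus": total_alleles / n_loci,
        "mean_observed_heterozygosity": ho_sum / n_loci,
    }

# Hypothetical data: three loci scored in four plants.
data = {
    "locus1": [{"a"}, {"a", "b"}, {"b"}, {"a"}],
    "locus2": [{"c"}, {"c"}, {"c"}, {"c"}],
    "locus3": [{"d", "e"}, {"d", "f"}, {"e"}, {"f"}],
}
print(ssr_summary(data))
```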

  20. The Mitochondrial Genome of the Guanaco Louse, Microthoracius praelongiceps: Insights into the Ancestral Mitochondrial Karyotype of Sucking Lice (Anoplura, Insecta)

    PubMed Central

    Li, Hu; Barker, Stephen C.

    2017-01-01

    Fragmented mitochondrial (mt) genomes have been reported in 11 species of sucking lice (suborder Anoplura) that infest humans, chimpanzees, pigs, horses, and rodents. There is substantial variation among these lice in mt karyotype: the number of minichromosomes of a species ranges from 9 to 20; the number of genes in a minichromosome ranges from 1 to 8; gene arrangement in a minichromosome differs between species, even in the same genus. We sequenced the mt genome of the guanaco louse, Microthoracius praelongiceps, to help establish the ancestral mt karyotype for sucking lice and understand how fragmented mt genomes evolved. The guanaco louse has 12 mt minichromosomes; each minichromosome has 2–5 genes and a non-coding region. The guanaco louse shares many mt karyotype features with rodent lice, more than with other sucking lice. The guanaco louse, however, is more closely related phylogenetically to human lice, chimpanzee lice, pig lice, and horse lice than to rodent lice. By parsimony analysis of shared features in mt karyotype, we infer that the most recent common ancestor of sucking lice, which lived ∼75 Ma, had 11 minichromosomes; each minichromosome had 1–6 genes and a non-coding region. As sucking lice diverged, splits of mt minichromosomes occurred many times in the lineages leading to the lice of humans, chimpanzees, and rodents, whereas mergers of minichromosomes occurred in the lineage leading to the lice of pigs and horses. Together, splits and mergers of minichromosomes created a very complex and dynamic mt genome organization in the sucking lice. PMID:28164215

  1. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera

    PubMed Central

    2012-01-01

    Background Lepidoptera encompasses more than 160,000 described species that have been classified into 45–48 superfamilies. The previously determined Lepidoptera mitochondrial genomes (mitogenomes) are limited to six superfamilies of the lineage Ditrysia. Compared with the ancestral insect gene order, these mitogenomes all contain a tRNA rearrangement. To gain new insights into Lepidoptera mitogenome evolution, we sequenced the mitogenomes of two ghost moths that belong to the non-ditrysian lineage Hepialoidea and conducted a comparative mitogenomic analysis across Lepidoptera. Results The mitogenomes of Thitarodes renzhiensis and T. yunnanensis are 16,173 bp and 15,816 bp long, with A + T contents of 81.28% and 82.34%, respectively. Both mitogenomes include 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and the A + T-rich region. Differences in the tandem repeats of the A + T-rich region account for most of the size difference between the two mitogenomes. All the protein-coding genes start with typical mitochondrial initiation codons, except for cox1 (CGA) and nad1 (TTG) in both mitogenomes. The anticodon of trnS(AGN) in T. renzhiensis and T. yunnanensis is UCU instead of the GCU used in most other sequenced Lepidoptera mitogenomes. The 1,584-bp sequence from rrnS to nad2 was also determined for an unidentified ghost moth (Thitarodes sp.), which has no repetitive sequence in the A + T-rich region. All three Thitarodes species possess the ancestral gene order, with trnI-trnQ-trnM located between the A + T-rich region and nad2, which differs from the order trnM-trnI-trnQ found in all previously sequenced Lepidoptera species. The previously identified conserved elements of Lepidoptera mitogenomes (i.e. the motif ‘ATAGA’ and poly-T stretch in the A + T-rich region and the long intergenic spacer upstream of nad2) are absent in the Thitarodes mitogenomes. Conclusion The mitogenomes of T. renzhiensis and T

  2. The Demosponge Amphimedon queenslandica: Reconstructing the Ancestral Metazoan Genome and Deciphering the Origin of Animal Multicellularity.

    PubMed

    Degnan, Bernard M; Adamska, Maja; Craigie, Alina; Degnan, Sandie M; Fahey, Bryony; Gauthier, Marie; Hooper, John N A; Larroux, Claire; Leys, Sally P; Lovas, Erica; Richards, Gemma S

    2008-12-01

    INTRODUCTION Sponges are among the earliest-branching metazoans. In addition to undergoing complex development and differentiation, they can regenerate via stem cells and can discern self from nonself ("allorecognition"), making them a useful comparative model for a range of metazoan-specific processes. Molecular analyses of these processes have the potential to reveal ancient homologies shared among all living animals and critical genomic innovations that underpin metazoan multicellularity. Amphimedon queenslandica (Porifera, Demospongiae, Haplosclerida, Niphatidae) is the first poriferan representative to have its genome sequenced, assembled, and annotated. Amphimedon exemplifies many sessile and sedentary marine invertebrates (e.g., corals, ascidians, bryozoans): They disperse during a planktonic larval phase, settle in the vicinity of conspecifics, ward off potential competitors (including incompatible genotypes), and ensure that brooded eggs are fertilized by conspecific sperm. Using genomic and expressed sequence tag (EST) resources from Amphimedon, functional genomic approaches can be applied to a wide range of ecological and population genetic processes, including fertilization, dispersal, and colonization dynamics, host-symbiont interactions, and secondary metabolite production. Unlike most other sponges, Amphimedon produces hundreds of asynchronously developing embryos and larvae year-round in distinct, easily accessible brood chambers. Embryogenesis gives rise to larvae with at least a dozen cell types that are segregated into three layers and patterned along the body axis. In this article, we describe some of the methods currently available for studying A. queenslandica, focusing on the analysis of embryos, larvae, and post-larvae.

  3. Ancient human genomes suggest three ancestral populations for present-day Europeans.

    PubMed

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H; Schraiber, Joshua G; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M; Brisighelli, Francesca; Busby, George B J; Cali, Francesco; Churnosov, Mikhail; Cole, David E C; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A; Gallego Romero, Irene; Gubina, Marina; Hammer, Michael; Henn, Brenna M; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl A; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G; Ruiz-Linares, Andres; Tishkoff, Sarah A; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-09-18

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages.

  4. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663

  5. Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires

    PubMed Central

    2011-01-01

    Background Genome size and complexity, as measured by the number of genes or protein domains, are remarkably similar in most extant eukaryotes and generally show no correlation with morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes. Results We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, domain losses during eukaryotic evolution generally outnumber domain gains. Only at the base of the animal and vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering genomes according to their functional profiles yields an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely replaced by the metabolic capabilities of symbiotic organisms such as gut microbes. Conclusions While protein domain gains and losses are common throughout eukaryote evolution, losses often outweigh gains and lead to significant differences in functional profiles. The results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gains in regulatory complexity during the rise of animals. PMID:21241503

  6. Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events.

    PubMed

    Song, Xiaowei; Tang, Yezhong; Wang, Yajun

    2016-08-22

    The vertebrate FoxP subfamily genes play important roles in the construction of essential functional modules involved in physiological and developmental processes. To explore the adaptive evolution of functional modules associated with the FoxP subfamily member genes, it is necessary to study the gene duplication process. We detected four member genes of the FoxP subfamily in sea lampreys (a representative species of jawless vertebrates) through genome screenings and phylogenetic analyses. Reliable paralogons (i.e. paralogous chromosome segments) have rarely been detected in scaffolds of FoxP subfamily member genes in sea lampreys because of the abundance of HTH_Tnp_Tc3_2 transposases. However, these transposases did not alter the number of FoxP subfamily genes in sea lampreys. The coincidence between the "1-4" gene duplication pattern of FoxP subfamily genes from invertebrates to vertebrates and the two rounds of ancestral whole genome duplication (1R- and 2R-WGD) reveals that the FoxP subfamily of vertebrates was quadruplicated in the 1R- and 2R-WGD events. Furthermore, using phylogenetic analyses and mirror-dendrogram methods (i.e. algorithms to test protein-protein interactions), we deduced that a synchronous gene duplication process occurred for the FoxP subfamily and for three linked gene families/subfamilies (i.e. MIT family, mGluR group III and PLXNA subfamily) in the 1R- and 2R-WGD events. Specifically, the ancestor of FoxP1 and FoxP3 and the ancestor of FoxP2 and FoxP4 were generated in the 1R-WGD event. In the subsequent 2R-WGD event, these two ancestral genes gave rise to FoxP1, FoxP2, FoxP3 and FoxP4. The elucidation of these gene duplication processes sheds light on the phylogenetic relationships between functional modules of the FoxP subfamily member genes.

  7. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

    PubMed Central

    2011-01-01

    Background Mycoparasitism, a lifestyle in which one fungus parasitizes another, is especially relevant when the prey is a plant pathogen, because it provides a strategy for the biological control of plant pests. The most studied biocontrol agents are probably species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%) and a lack of active mobile elements, probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. Conclusions The data offer a better understanding of mycoparasitism and should thus support the development of improved biocontrol strains for efficient and environmentally friendly plant protection. PMID:21501500

  8. The vertebrate makorin ubiquitin ligase gene family has been shaped by large-scale duplication and retroposition from an ancestral gonad-specific, maternal-effect gene

    PubMed Central

    2010-01-01

    Background Members of the makorin (mkrn) gene family encode RING/C3H zinc finger proteins with E3 ubiquitin ligase activity. Although these proteins have been described in a variety of eukaryotes such as plants, fungi, invertebrates and vertebrates including human, almost nothing is known about their structural and functional evolution. Results Via partial sequencing of a testis cDNA library from the poeciliid fish Xiphophorus maculatus, we have identified a new member of the makorin gene family, which we called mkrn4. In addition to the already described mkrn1 and mkrn2, mkrn4 is the third example of a makorin gene present in both tetrapods and ray-finned fish. However, this gene was not detected in mouse and rat, suggesting its loss in the lineage leading to rodent murids. Mkrn2 and mkrn4 are located in large ancient duplicated regions in tetrapod and fish genomes, suggesting that the ancestral vertebrate-specific genome duplication may have been involved in the formation of these genes. Intriguingly, many intronless mkrn1 and mkrn2 retrocopies have been detected in mammals but not in other vertebrates, most of them corresponding to pseudogenes. The nature and number of zinc fingers were found to be conserved in Mkrn1 and Mkrn2 but much more variable in Mkrn4, with lineage-specific differences. RT-qPCR analysis demonstrated a highly gonad-biased expression pattern for makorin genes in medaka and zebrafish (ray-finned fishes) and amphibians, but a strong relaxation of this specificity in birds and mammals. All three mkrn genes were maternally expressed before zygotic genome activation in both medaka and zebrafish early embryos. Conclusion Our analysis demonstrates that the makorin gene family has evolved through large-scale duplication and subsequent lineage-specific retroposition-mediated duplications in vertebrates. Of the three major vertebrate mkrn genes, mkrn4 shows the highest evolutionary dynamics, with lineage-specific loss of zinc fingers and even complete

  9. Reconstruction of the ancestral marsupial karyotype from comparative gene maps

    PubMed Central

    2013-01-01

    Background The increasing number of assembled mammalian genomes makes it possible to compare genome organisation across mammalian lineages and to reconstruct the chromosomes of the ancestral marsupial and therian (marsupial and eutherian) mammals. However, reconstructing ancestral genomes requires genome assemblies to be anchored to chromosomes. The recently sequenced tammar wallaby (Macropus eugenii) genome was assembled into over 300,000 contigs. We previously devised an efficient strategy for mapping large evolutionarily conserved blocks in non-model mammals, and applied it to determine the arrangement of conserved blocks on all wallaby chromosomes, thereby permitting comparative maps to be constructed and the long-debated question of a 2n = 14 versus a 2n = 22 ancestral marsupial karyotype to be resolved. Results We identified large blocks of genes conserved between human and opossum, and mapped genes corresponding to the ends of these blocks by fluorescence in situ hybridization (FISH). A total of 242 genes were assigned to wallaby chromosomes in the present study, bringing the total number of genes mapped to 554 and making it the most densely cytogenetically mapped marsupial genome. We used these gene assignments to construct comparative maps between wallaby and opossum, which uncovered many intrachromosomal rearrangements, particularly for genes found on wallaby chromosomes X and 3. Expanding comparisons to include chicken and human permitted the putative ancestral marsupial (2n = 14) and therian mammal (2n = 19) karyotypes to be reconstructed. Conclusions Our physical mapping data for the tammar wallaby have uncovered the events shaping marsupial genomes and enabled us to predict the ancestral marsupial karyotype, supporting a 2n = 14 ancestor. Furthermore, our predicted therian ancestral karyotype has helped to clarify the evolution of the ancestral eutherian genome. PMID:24261750

  10. Exploring Population Admixture Dynamics via Empirical and Simulated Genome-wide Distribution of Ancestral Chromosomal Segments

    PubMed Central

    Jin, Wenfei; Wang, Sijia; Wang, Haifeng; Jin, Li; Xu, Shuhua

    2012-01-01

    The processes of genetic admixture determine the haplotype structure and linkage disequilibrium patterns of the admixed population, which is important for medical and evolutionary studies. However, most previous studies do not consider the inherent complexity of admixture processes. Here we propose two approaches to explore population admixture dynamics, and we demonstrate, by analyzing genome-wide empirical and simulated data, that the approach based on the distribution of chromosomal segments of distinct ancestry (CSDAs) is more powerful than that based on the distribution of individual ancestry proportions. Analysis of 1,890 African Americans showed that, among several putative models, a continuous gene flow model, in which the African American population continuously received gene flow from European populations over about 14 generations, best explained the admixture dynamics of African Americans. Interestingly, we observed that some African Americans had much more European ancestry than the simulated samples, indicating substructures of local ancestries in African Americans that could have been caused by individuals from some particular lineages having repeatedly admixed with people of European ancestry. In contrast, the admixture dynamics of Mexicans could be explained by a gradual admixture model in which the Mexican population continuously received gene flow from both European and Amerindian populations over about 24 generations. Our results also indicated that recent gene flow from Sub-Saharan Africans has contributed to the gene pool of Middle Eastern populations such as Mozabite, Bedouin, and Palestinian. In summary, this study not only provides approaches to explore population admixture dynamics, but also advances our understanding of the population history of African Americans, Mexicans, and Middle Eastern populations. PMID:23103229

  11. Genome-wide association study and ancestral origins of the slick-hair coat in tropically adapted cattle

    PubMed Central

    Huson, Heather J.; Kim, Eui-Soo; Godfrey, Robert W.; Olson, Timothy A.; McClure, Matthew C.; Chase, Chad C.; Rizzi, Rita; O'Brien, Ana M. P.; Van Tassell, Curt P.; Garcia, José F.; Sonstegard, Tad S.

    2014-01-01

    The slick hair coat (SLICK) is a dominantly inherited trait typically associated with tropically adapted cattle of Criollo descent, introduced into the New World through Spanish colonization. The trait is of interest in the context of climate change because of its association with improved thermotolerance and consequently increased productivity. Previous studies localized the SLICK locus to a 4 cM region on chromosome (BTA) 20 and identified signatures of selection in this region derived from Senepol cattle. The current study compares three slick-haired Criollo-derived breeds (Senepol, Carora, and Romosinuano) and three additional slick-haired cross-bred lineages to non-slick ancestral breeds. Genome-wide association (GWA), haplotype analysis, signatures of selection, runs of homozygosity (ROH), and identity by state (IBS) calculations were used to identify a 0.8 Mb (37.7–38.5 Mb) consensus region for the SLICK locus on BTA20, which contains SKP2 and SPEF2 as possible candidate genes. Three specific haplotype patterns were identified in slick individuals, all with zero frequency in non-slick individuals. Admixture analysis identified common genetic patterns between the three slick breeds at the SLICK locus. Principal component analysis (PCA) and admixture results show Senepol and Romosinuano sharing a higher degree of genetic similarity to one another and a much lower degree of similarity to Carora. Variation in GWA, haplotype analysis, and IBS calculations, together with the accompanying population structure information, supports potentially two mutations, one common to Senepol and Romosinuano and another in Carora, affecting genes contained within our refined location for the SLICK locus. PMID:24808908

  12. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species.

    PubMed

    Huguet-Tapia, Jose C; Lefebure, Tristan; Badger, Jonathan H; Guan, Dongli; Pettis, Gregg S; Stanhope, Michael J; Loria, Rosemary

    2016-01-29

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer.

  13. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species

    PubMed Central

    Huguet-Tapia, Jose C.; Lefebure, Tristan; Badger, Jonathan H.; Guan, Dongli; Stanhope, Michael J.

    2016-01-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  14. Beyond editing to writing large genomes.

    PubMed

    Chari, Raj; Church, George M

    2017-08-30

    Recent exponential advances in genome sequencing and engineering technologies have enabled an unprecedented level of interrogation into the impact of DNA variation (genotype) on cellular function (phenotype). Furthermore, these advances have also prompted realistic discussion of writing and radically re-writing complex genomes. In this Perspective, we detail the motivation for large-scale engineering, discuss the progress made from such projects in bacteria and yeast and describe how various genome-engineering technologies will contribute to this effort. Finally, we describe the features of an ideal platform and provide a roadmap to facilitate the efficient writing of large genomes.

  15. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    PubMed Central

    Liu, George E; Matukumalli, Lakshmi K; Sonstegard, Tad S; Shade, Larry L; Van Tassell, Curtis P

    2006-01-01

    Background Approximately 11 Mb of finished, high-quality genomic sequence was sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution than dog, and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies. PMID:16759380

  16. Ancestral Gene Flow and Parallel Organellar Genome Capture Result in Extreme Phylogenomic Discord in a Lineage of Angiosperms.

    PubMed

    Folk, Ryan A; Mandel, Jennifer R; Freudenstein, John V

    2016-09-16

    While hybridization has recently received renewed attention from systematists and evolutionary biologists, there remains a dearth of case studies on ancient, diversified hybrid lineages, that is, clades of organisms that originated through reticulation. Studies on these groups are valuable because they speak to the long-term phylogenetic success of lineages following gene flow between species. We present a phylogenomic view of Heuchera, long known for frequent hybridization, incorporating all three independent genomes: targeted nuclear (~400,000 bp), plastid (~160,000 bp), and mitochondrial (~470,000 bp) data. We analyze these data using multiple concatenation and coalescence strategies. The nuclear phylogeny is consistent with previous work and with morphology, confidently suggesting a monophyletic Heuchera. By contrast, analyses of both organellar genomes recover a grossly polyphyletic Heuchera, consisting of three primary clades, with relationships extensively rearranged within these as well. A minority of nuclear loci also exhibit phylogenetic discord; yet, remarkably, these topologies never resemble the pattern of organellar loci and largely present low levels of discord inter alia. Two independent estimates of the coalescent branch length of the ancestor of Heuchera using nuclear data suggest rare or nonexistent incomplete lineage sorting with related clades, inconsistent with the observed gross polyphyly of organellar genomes (confirmed by simulation of gene trees under the coalescent). These observations, in combination with previous work, strongly suggest hybridization as the cause of this phylogenetic discord. [Ancient hybridization; chloroplast capture; incongruence; phylogenomics; reticulation.]

  17. Analysis of simple sequence repeat (SSR) structure and sequence within Epichloë endophyte genomes reveals impacts on gene structure and insights into ancestral hybridization events.

    PubMed

    Clayton, William; Eaton, Carla Jane; Dupont, Pierre-Yves; Gillanders, Tim; Cameron, Nick; Saikia, Sanjay; Scott, Barry

    2017-01-01

    Epichloë grass endophytes comprise a group of filamentous fungi that includes both sexual and asexual species. Because these endophytes confer beneficial characteristics on their grass hosts, their identification has been of great agronomic and scientific interest. Simple sequence repeat loci and the variation in their repeat elements have been used to rapidly identify endophyte species and strains; however, little is known about how the structure of repeat elements changes between species and strains, or where these repeat elements are located in the fungal genome. We report an in-depth analysis of the structure and genomic location of the simple sequence repeat locus B10, commonly used for Epichloë endophyte species identification. The B10 repeat was found to be located within an exon of a putative bZIP transcription factor, suggesting possible impacts on polypeptide sequence and thus protein function. Analysis of this repeat in the asexual endophyte hybrid Epichloë uncinata revealed that the structure of B10 alleles reflects the ancestral species that hybridized to give rise to this species. Understanding the structure and sequence of these simple sequence repeats provides a useful set of tools for readily distinguishing strains and for gaining insights into the ancestral species that have undergone hybridization events.

  18. Anchoring genome sequence to chromosomes of the central bearded dragon (Pogona vitticeps) enables reconstruction of ancestral squamate macrochromosomes and identifies sequence content of the Z chromosome.

    PubMed

    Deakin, Janine E; Edwards, Melanie J; Patel, Hardip; O'Meally, Denis; Lian, Jinmin; Stenhouse, Rachael; Ryan, Sam; Livernois, Alexandra M; Azad, Bhumika; Holleley, Clare E; Li, Qiye; Georges, Arthur

    2016-06-10

    Squamates (lizards and snakes) are a speciose lineage of reptiles displaying considerable karyotypic diversity, particularly among lizards. Understanding the evolution of this diversity requires comparison of genome organisation between species. Although the genomes of several squamate species have now been sequenced, only the green anole lizard has any sequence anchored to chromosomes, and only limited gene mapping data are available for five other squamates. This makes it difficult to reconstruct the events that have led to extant squamate karyotypic diversity. The purpose of this study was to anchor the recently sequenced central bearded dragon (Pogona vitticeps) genome to chromosomes in order to trace the evolution of squamate chromosomes. Assigning sequence to sex chromosomes was of particular interest for identifying candidate sex determining genes. By using two different approaches to map conserved blocks of genes, we were able to anchor approximately 42% of the dragon genome sequence to chromosomes. We constructed detailed comparative maps between dragon, anole and chicken genomes and, where possible, made broader comparisons across Squamata using cytogenetic mapping information for five other species. We show that squamate macrochromosomes are relatively well conserved between species, supporting findings from previous molecular cytogenetic studies. Macrochromosome diversity between members of the Toxicofera clade has been generated by intrachromosomal, and a small number of interchromosomal, rearrangements. We reconstructed the ancestral squamate macrochromosomes by drawing upon comparative cytogenetic mapping data from seven squamate species and propose the events leading to the arrangements observed in representative species. In addition, we assigned over 8 Mbp of sequence containing 219 genes to the Z chromosome, providing a list of genes to begin testing as candidate sex determining genes. Anchoring of the dragon genome has provided substantial insight into

  19. Ancestral genomic duplication of the insulin gene in tilapia: An analysis of possible implications for clinical islet xenotransplantation using donor islets from transgenic tilapia expressing a humanized insulin gene

    PubMed Central

    Hrytsenko, Olga; Pohajdak, Bill; Wright, James R.

    2016-01-01

    Tilapia, a teleost fish, have multiple large, anatomically discrete islets that are easy to harvest and that, when transplanted into diabetic murine recipients, provide normoglycemia and mammalian-like glucose tolerance profiles. Tilapia insulin differs structurally from human insulin, which could preclude their use as islet donors for xenotransplantation. Therefore, we produced transgenic tilapia with islets expressing a humanized insulin gene. Because fish genomes may carry an ancestral genome duplication, tilapia may have a second insulin gene. We therefore cloned, sequenced, and characterized the tilapia insulin 2 transcript and found that its expression is negligible in islets, is not islet-specific, and would likely not need to be silenced in our transgenic fish. PMID:27222321

  20. Evidence for common ancestral origin of a recurring BRCA1 genomic rearrangement identified in high-risk Hispanic families.

    PubMed

    Weitzel, Jeffrey N; Lagos, Veronica I; Herzog, Josef S; Judkins, Thaddeus; Hendrickson, Brant; Ho, Jason S; Ricker, Charité N; Lowstuter, Katrina J; Blazer, Kathleen R; Tomlinson, Gail; Scholl, Tom

    2007-08-01

    Large rearrangements account for 8% to 15% of deleterious BRCA mutations, although none had previously been characterized in individuals of Mexican ancestry. DNA from 106 Hispanic patients without a BRCA mutation identifiable by exonic sequence analysis was subjected to multiplexed quantitative differential PCR. One case of Native American and African American ancestry was identified via multiplex ligation-dependent probe amplification. Long-range PCR was used to confirm deletion events and to clone and sequence genomic breakpoints. Splicing patterns were derived by sequencing cDNA from reverse transcription-PCR of lymphoblastoid cell line RNA. Haplotype analysis was conducted for recurrent mutations. The same deletion of BRCA1 exons 9 through 12 was identified in five unrelated families. Long-range PCR and sequencing indicated a 14.7-kb deletion. A 3-primer PCR assay was designed based on the deletion breakpoints, which were identified within an AluSp element in intron 8 and an AluSx element in intron 12. Haplotype analysis confirmed common ancestry. Analysis of cDNA showed direct splicing of exon 8 to exon 13, resulting in a frameshift mutation and predicted truncation of the BRCA1 protein. We identified and characterized a novel large BRCA1 deletion in five unrelated families (four of Mexican ancestry and one of African and Native American ancestry), suggesting the possibility of a founder effect of Amerindian or Mestizo origin. This BRCA1 rearrangement was detected in 3.8% (4 of 106) of BRCA sequence-negative Hispanic families. An assay for this mutation should be considered for sequence-negative high-risk Hispanic patients.

  1. Remnants of the Legume Ancestral Genome Preserved in Gene-Rich Regions: Insights from Lupinus angustifolius Physical, Genetic, and Comparative Mapping.

    PubMed

    Książkiewicz, Michał; Zielezinski, Andrzej; Wyrwa, Katarzyna; Szczepaniak, Anna; Rychel, Sandra; Karlowski, Wojciech; Wolko, Bogdan; Naganowska, Barbara

    The narrow-leafed lupin (Lupinus angustifolius) was recently proposed as a legume reference species. Genetic resources have been developed, including a draft genome sequence, linkage maps, nuclear DNA libraries, and cytogenetic chromosome-specific landmarks. Here, we used a combined approach, involving DNA fingerprinting, sequencing, genetic mapping, and molecular cytogenetics, to localize and analyze L. angustifolius gene-rich regions (GRRs). An L. angustifolius genomic bacterial artificial chromosome (BAC) library was screened with short sequence repeat (SSR)-based probes. Selected BACs were fingerprinted and assembled into contigs. BAC-end sequence (BES) annotation allowed us to choose clones for sequencing, targeting GRRs. Additionally, BESs were aligned to the scaffolds of the genome sequence. The genetic map was supplemented with 35 BES-derived markers, distributed in 14 linkage groups and tagging 37 scaffolds. The identified GRRs had an average gene density of 19.6 genes/100 kb and physical-to-genetic distance ratios of 11 to 109 kb/cM. Physical and genetic mapping was supported by multi-BAC fluorescence in situ hybridization (FISH), and five new linkage groups were assigned to chromosomes. Syntenic links to the genome sequences of five legume species (Medicago truncatula, Glycine max, Lotus japonicus, Phaseolus vulgaris, and Cajanus cajan) were identified. The comparative mapping of the two largest lupin GRRs provides novel evidence for ancient duplications in all of the studied species. These regions are conserved among representatives of the main clades of Papilionoideae. Furthermore, despite the complex evolution of legumes, some segments of the nuclear genome were not substantially modified and retained their quasi-ancestral structures. Cytogenetic markers anchored in these regions constitute a platform for heterologous mapping of legume genomes.

  2. Genome-Wide Identification of the Mutation Underlying Fleece Variation and Discriminating Ancestral Hairy Species from Modern Woolly Sheep

    PubMed Central

    Cano, Margarita; Drouilhet, Laurence; Plisson-Petit, Florence; Bardou, Philippe; Fabre, Stéphane; Servin, Bertrand; Sarry, Julien; Woloszyn, Florent; Mulsant, Philippe; Foulquier, Didier; Carrière, Fabien; Aletru, Mathias; Rodde, Nathalie; Cauet, Stéphane; Bouchez, Olivier; Pirson, Maarten; Tosser-Klopp, Gwenola; Allain, Daniel

    2017-01-01

    The composition and structure of fleece variation observed in mammals is a consequence of strong selective pressure for fiber production after domestication. In sheep, fleece variation discriminates ancestral species carrying a long and hairy fleece from modern domestic sheep (Ovis aries) carrying a short and woolly fleece. Here, we report that the “woolly” allele results from the insertion of an antisense EIF2S2 retrogene (called asEIF2S2) into the 3′ UTR of the IRF2BP2 gene, leading to an abnormal IRF2BP2 transcript. We provide evidence that this chimeric IRF2BP2/asEIF2S2 messenger 1) targets the genuine sense EIF2S2 RNA and 2) creates a long endogenous double-stranded RNA that alters the expression of both EIF2S2 and IRF2BP2 mRNA. This represents a unique example of a phenotype arising via an RNA-RNA hybrid, itself generated through a retroposition mechanism. By identifying the molecular origin of an evolutionary phenotypic variation, our results bring new insights into sheep population history. PMID:28379502

  3. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth.

    PubMed

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-12-22

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker.

  4. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker. PMID:26689832

  5. The Large Genome Constraint Hypothesis: Evolution, Ecology and Phenotype

    PubMed Central

    Knight, Charles A.; Molinari, Nicole A.; Petrov, Dmitri A.

    2005-01-01

    • Background and Aims If large genomes are truly saturated with unnecessary ‘junk’ DNA, it would seem natural that there would be costs associated with the accumulation and replication of this excess DNA. Here we examine the available evidence to support this hypothesis, which we term the ‘large genome constraint’. We examine the large genome constraint at three scales: evolution, ecology, and the plant phenotype. • Scope In evolution, we tested the hypothesis that plant lineages with large genomes are diversifying more slowly. We found that genera with large genomes are less likely to be highly speciose, suggesting a large genome constraint on speciation. In ecology, we found that species with large genomes are under-represented in extreme environments, again suggesting a large genome constraint on the distribution and abundance of species. Ultimately, if these ecological and evolutionary constraints are real, the genome size effect must be expressed in the phenotype and confer selective disadvantages. Therefore, for the phenotype, we review data on the physiological correlates of genome size and present new analyses involving maximum photosynthetic rate and specific leaf area. Most notably, we found that species with large genomes have reduced maximum photosynthetic rates, again suggesting a large genome constraint on plant performance. Finally, we discuss whether these phenotypic correlations may help explain why species with large genomes are trimmed from the evolutionary tree and have restricted ecological distributions. • Conclusion Our review tentatively supports the large genome constraint hypothesis. PMID:15596465

  6. Genome-wide Association Study Identifies HLA 8.1 Ancestral Haplotype Alleles as Major Genetic Risk Factors for Myositis Phenotypes

    PubMed Central

    Miller, Frederick W.; Chen, Wei; O’Hanlon, Terrance P.; Cooper, Robert G.; Vencovsky, Jiri; Rider, Lisa G.; Danko, Katalin; Wedderburn, Lucy R.; Lundberg, Ingrid E.; Pachman, Lauren M.; Reed, Ann M.; Ytterberg, Steven R.; Padyukov, Leonid; Selva-O’Callaghan, Albert; Radstake, Timothy R.; Isenberg, David A.; Chinoy, Hector; Ollier, William E.R.; Scheet, Paul; Peng, Bo; Lee, Annette; Byun, Jinyoung; Lamb, Janine A.; Gregersen, Peter K.; Amos, Christopher I.

    2016-01-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis; 473 juvenile dermatomyositis; 532 polymyositis; and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10⁻⁸) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 haplotype comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  7. Genome-wide association study identifies HLA 8.1 ancestral haplotype alleles as major genetic risk factors for myositis phenotypes.

    PubMed

    Miller, F W; Chen, W; O'Hanlon, T P; Cooper, R G; Vencovsky, J; Rider, L G; Danko, K; Wedderburn, L R; Lundberg, I E; Pachman, L M; Reed, A M; Ytterberg, S R; Padyukov, L; Selva-O'Callaghan, A; Radstake, T R; Isenberg, D A; Chinoy, H; Ollier, W E R; Scheet, P; Peng, B; Lee, A; Byun, J; Lamb, J A; Gregersen, P K; Amos, C I

    2015-10-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis, 473 juvenile dermatomyositis, 532 polymyositis and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl-tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10⁻⁸) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of AH8.1 comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations.

  8. The diversity of class II transposable elements in mammalian genomes has arisen from ancestral phylogenetic splits during ancient waves of proliferation through the genome.

    PubMed

    Hellen, Elizabeth H B; Brookfield, John F Y

    2013-01-01

    DNA transposons make up 3% of the human genome, approximately the same percentage as genes. However, because of their inactivity, they are often ignored in favor of the more abundant, active retroelements. Despite this relative ignominy, there are a number of interesting questions to be asked of these transposon families. One particular question relates to the timing of proliferation and inactivation of elements in a family. Does an ongoing process of turnover occur, or is the process more akin to a life cycle for the family, with elements proliferating rapidly before deactivation at a later date? We answer this question by tracing back to the most recent common ancestor (MRCA) of each modern transposon family, using two different methods. The first method identifies the MRCA of the species in which a family of transposon fossils can still be found, which we assume will have existed soon after the true origin date of the transposon family. The second method uses molecular dating techniques to predict the age of the MRCA element from which all elements found in a modern genome are descended. Independent data from five pairs of species are used in the molecular dating analysis: human-chimpanzee, human-orangutan, dog-panda, dog-cat, and cow-pig. Orthologous pairs of elements from host species pairs are included, and the divergence dates of these species are used to constrain the analysis. We discover that, in general, the times to element common ancestry for a given family are the same for the different species pairs, suggesting that there has been no order-specific process of turnover. Furthermore, for most families, the ages of the common ancestor of the host species and of that of the elements are similar, suggesting a life cycle model for the proliferation of transposons. Where these two ages differ, in families found only in Primates and Rodentia, for example, we find that the host species date is later than that of the common ancestor of the elements, implying

  9. The Diversity of Class II Transposable Elements in Mammalian Genomes Has Arisen from Ancestral Phylogenetic Splits during Ancient Waves of Proliferation through the Genome

    PubMed Central

    Hellen, Elizabeth H.B.; Brookfield, John F.Y.

    2013-01-01

    DNA transposons make up 3% of the human genome, approximately the same percentage as genes. However, because of their inactivity, they are often ignored in favor of the more abundant, active retroelements. Despite this relative ignominy, there are a number of interesting questions to be asked of these transposon families. One particular question relates to the timing of proliferation and inactivation of elements in a family. Does an ongoing process of turnover occur, or is the process more akin to a life cycle for the family, with elements proliferating rapidly before deactivation at a later date? We answer this question by tracing back to the most recent common ancestor (MRCA) of each modern transposon family, using two different methods. The first method identifies the MRCA of the species in which a family of transposon fossils can still be found, which we assume will have existed soon after the true origin date of the transposon family. The second method uses molecular dating techniques to predict the age of the MRCA element from which all elements found in a modern genome are descended. Independent data from five pairs of species are used in the molecular dating analysis: human–chimpanzee, human–orangutan, dog–panda, dog–cat, and cow–pig. Orthologous pairs of elements from host species pairs are included, and the divergence dates of these species are used to constrain the analysis. We discover that, in general, the times to element common ancestry for a given family are the same for the different species pairs, suggesting that there has been no order-specific process of turnover. Furthermore, for most families, the ages of the common ancestor of the host species and of that of the elements are similar, suggesting a life cycle model for the proliferation of transposons. Where these two ages differ, in families found only in Primates and Rodentia, for example, we find that the host species date is later than that of the common ancestor of the elements

  10. Large insert environmental genomic library production.

    PubMed

    Taupp, Marcus; Lee, Sangwon; Hawley, Alyse; Yang, Jinshu; Hallam, Steven J

    2009-09-23

    The vast majority of microbes in nature currently remain inaccessible to traditional cultivation methods. Over the past decade, culture-independent environmental genomic (i.e. metagenomic) approaches have emerged, enabling researchers to bridge this cultivation gap by capturing the genetic content of indigenous microbial communities directly from the environment. To this end, genomic DNA libraries are constructed using standard albeit artful laboratory cloning techniques. Here we describe the construction of a large insert environmental genomic fosmid library with DNA derived from the vertical depth continuum of a seasonally hypoxic fjord. This protocol is directly linked to a series of connected protocols including coastal marine water sampling [1], large volume filtration of microbial biomass [2] and a DNA extraction and purification protocol [3]. At the outset, high-quality genomic DNA is end-repaired with the creation of 5′-phosphorylated blunt ends. End-repaired DNA is subjected to pulsed-field gel electrophoresis (PFGE) for size selection and gel extraction is performed to recover DNA fragments between 30 and 60 thousand base pairs (kb) in length. Size selected DNA is purified away from the PFGE gel matrix and ligated to the phosphatase-treated blunt-end fosmid CopyControl vector pCC1 (EPICENTRE http://www.epibio.com/item.asp?ID=385). Linear concatemers of pCC1 and insert DNA are subsequently headful packaged into phage particles by lambda terminase, with subsequent infection of phage-resistant E. coli cells. Successfully transduced clones are recovered on LB agar plates under antibiotic selection and archived in 384-well plate format using an automated colony picking robot (Qpix2, GENETIX). The current protocol draws from various sources including the CopyControl Fosmid Library Production Kit from EPICENTRE and the published works of multiple research groups [4-7]. Each step is presented with best practice in mind. Whenever possible we highlight subtleties

  11. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration

    PubMed Central

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-01-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration—of a scope that is unprecedented in the field of traumatic stress—will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD. PMID:25904361

  12. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia Cm; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. But caution is required in the interpretation of apparently shared syntenic associations as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments.

  13. Towards the delineation of the ancestral eutherian genome organization: comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting.

    PubMed

    Frönicke, Lutz; Wienberg, Johannes; Stone, Gary; Adams, Lisa; Stanyon, Roscoe

    2003-07-07

    This study presents a whole-genome comparison of human and a representative of the Afrotherian clade, the African elephant, generated by reciprocal Zoo-FISH. An analysis of Afrotheria genomes is of special interest, because recent DNA sequence comparisons identify them as the oldest placental mammalian clade. Complete sets of whole-chromosome specific painting probes for the African elephant and human were constructed by degenerate oligonucleotide-primed PCR amplification of flow-sorted chromosomes. Comparative genome maps are presented based on their hybridization patterns. These maps show that the elephant has a moderately rearranged chromosome complement when compared to humans. The human paint probes identified 53 evolutionary conserved segments on the 27 autosomal elephant chromosomes and the X chromosome. Reciprocal experiments with elephant probes delineated 68 conserved segments in the human genome. The comparison with a recent aardvark and elephant Zoo-FISH study delineates new chromosomal traits which link the two Afrotherian species phylogenetically. In the absence of any morphological evidence the chromosome painting data offer the first non-DNA sequence support for an Afrotherian clade. The comparative human and elephant genome maps provide new insights into the karyotype organization of the proto-afrotherian, the ancestor of extant placental mammals, which most probably consisted of 2n=46 chromosomes.

  14. Speciation as a sieve for ancestral polymorphism.

    PubMed

    Guerrero, Rafael F; Hahn, Matthew W

    2017-08-09

    Because they are considered rare, balanced polymorphisms are often discounted as crucial constituents of genome-wide variation in sequence diversity. Despite its perceived rarity, however, long-term balancing selection can elevate genetic diversity and significantly affect observed divergence between species. Here, we discuss how ancestral balanced polymorphisms can be "sieved" by the speciation process, which sorts them unequally across descendant lineages. After speciation, ancestral balancing selection is revealed by genomic regions of high divergence between species. This signature, which resembles that of other evolutionary processes, can potentially confound genomic studies of population divergence and inferences of "islands of speciation." © 2017 John Wiley & Sons Ltd.

  15. Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

    PubMed

    Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

    2007-01-01

    The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna-specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon 1-C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence.

  16. Eukaryotic large nucleo-cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome evolution

    PubMed Central

    2009-01-01

    Background The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes. Results A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions. Conclusions The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses. PMID:20017929

  17. Simplified DGS procedure for large-scale genome structural study.

    PubMed

    Jung, Yong-Chul; Xu, Jia; Chen, Jun; Kim, Yeong; Winchester, David; Wang, San Ming

    2009-11-01

    Ditag genome scanning (DGS) uses next-generation DNA sequencing to sequence the ends of ditag fragments produced by restriction enzymes. These sequences are compared to known genome sequences to determine their structure. In order to use DGS for large-scale genome structural studies, we have substantially revised the original protocol by replacing the in vivo genomic DNA cloning with in vitro adaptor ligation, eliminating the ditag concatemerization steps, and replacing the 454 sequencer with Solexa or SOLiD sequencers for ditag sequence collection. This revised protocol further increases genome coverage and resolution and allows DGS to be used to analyze multiple genomes simultaneously.

  18. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial sub-genome rearrangement

    USDA-ARS's Scientific Manuscript database

    Hexaploid oat (Avena sativa, 2n = 6x = 42) is a member of the Poaceae family with a very large genome (~13 Gb) containing 21 chromosome pairs: seven from each of two similar ancestral diploids (A and D) and seven from a more diverged ancestral diploid (C). Physical rearrangements among ancestral oat...

  19. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has experienced a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics also will enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  20. GDC 2: Compression of large collections of genomes

    PubMed Central

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    Falling prices for high-throughput genome sequencing are changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes by a factor of about 9,500. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, processing the data at 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
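    The size figures quoted in this abstract can be cross-checked with simple arithmetic: 6.7 TB divided by a compression ratio of about 9,500 gives roughly 705 MB, in line with the stated "about 700 MB". The sketch below assumes decimal units (1 TB = 1,000,000 MB); the helper names are hypothetical and are not part of GDC 2.

    ```python
    # Back-of-envelope check of the sizes quoted in the GDC 2 abstract.
    # Assumption: decimal units, 1 TB = 1,000,000 MB.
    MB_PER_TB = 1_000_000


    def compressed_size_mb(uncompressed_tb: float, ratio: float) -> float:
        """Size after compression, in MB, for a given compression ratio."""
        return uncompressed_tb * MB_PER_TB / ratio


    def implied_ratio(uncompressed_tb: float, compressed_mb: float) -> float:
        """Compression ratio implied by an uncompressed size (TB) and a compressed size (MB)."""
        return uncompressed_tb * MB_PER_TB / compressed_mb


    if __name__ == "__main__":
        print(f"{compressed_size_mb(6.7, 9500):.0f} MB")  # 705 MB
        print(f"{implied_ratio(6.7, 700):.0f}x")          # 9571x
    ```

    Both directions agree to within rounding, so the "about 9,500 times" ratio and the "about 700 MB" result describe the same outcome.
    
    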

  1. GDC 2: Compression of large collections of genomes.

    PubMed

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-06-25

    Falling prices for high-throughput genome sequencing are changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes by a factor of about 9,500. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, processing the data at 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about.

  2. Identification of large-scale genomic variation in cancer genomes using in silico reference models

    PubMed Central

    Killcoyne, Sarah; del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  3. Exon capture optimization in amphibians with large genomes.

    PubMed

    McCartney-Melstad, Evan; Mount, Genevieve G; Shaffer, H Bradley

    2016-09-01

    Gathering genomic-scale data efficiently is challenging for nonmodel species with large, complex genomes. Transcriptome sequencing is accessible for organisms with large genomes, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in nonmodel organisms with large genomes. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for amphibians and other taxa with large, complex genomes. Increasing the amounts of C0t-1 repetitive sequence blocker and individual input DNA used in target enrichment reactions reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9% and rendering target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen vertebrates with large genomes, including amphibians. © 2016 John Wiley & Sons Ltd.

  4. The Mitochondrial Genomes of Nuttalliella namaqua (Ixodoidea: Nuttalliellidae) and Argas africolumbae (Ixodoidea: Argasidae): Estimation of Divergence Dates for the Major Tick Lineages and Reconstruction of Ancestral Blood-Feeding Characters

    PubMed Central

    Mans, Ben J.; de Klerk, Daniel; Pienaar, Ronel; de Castro, Minique H.; Latif, Abdalla A.

    2012-01-01

    Ixodida are composed of hard (Ixodidae), soft (Argasidae) and the monotypic Nuttalliellidae (Nuttalliella namaqua) tick families. Nuclear 18S rRNA analysis suggested that N. namaqua was the closest extant relative to the last common ancestral tick lineage. The mitochondrial genomes of N. namaqua and Argas africolumbae were determined using next generation sequencing and de novo assembly to investigate this further. The latter was included since previous estimates of the divergence times of argasids lacked data for this major genus. Mitochondrial gene order for both was identical to that of the Argasidae and Prostriata. Bayesian analysis of the COI, Cytb, ND1, ND2 and ND4 genes confirmed the monophyly of ticks, the basal position of N. namaqua to the other tick families and the accepted systematic relationships of the other tick genera. Molecular clock estimates were derived for the divergence of the major tick lineages and supported previous estimates of the origins of ticks in the Carboniferous. N. namaqua larvae fed successfully on lizards and mice in a prolonged manner similar to many argasids and all ixodids. Excess blood meal-derived water was secreted via the salivary glands, similar to ixodids. We propose that this prolonged larval feeding style eventually gave rise to the long feeding periods that typify the single larval, nymphal and adult stages of ixodid ticks and the associated secretion of water via the salivary glands. Ancestral reconstruction of characters involved in blood-feeding indicates that most of the characteristics unique to either hard or soft tick families were present in the ancestral tick lineage. PMID:23145176

  5. The mitochondrial genomes of Nuttalliella namaqua (Ixodoidea: Nuttalliellidae) and Argas africolumbae (Ixodoidea: Argasidae): estimation of divergence dates for the major tick lineages and reconstruction of ancestral blood-feeding characters.

    PubMed

    Mans, Ben J; de Klerk, Daniel; Pienaar, Ronel; de Castro, Minique H; Latif, Abdalla A

    2012-01-01

    Ixodida are composed of hard (Ixodidae), soft (Argasidae) and the monotypic Nuttalliellidae (Nuttalliella namaqua) tick families. Nuclear 18S rRNA analysis suggested that N. namaqua was the closest extant relative to the last common ancestral tick lineage. The mitochondrial genomes of N. namaqua and Argas africolumbae were determined using next generation sequencing and de novo assembly to investigate this further. The latter was included since previous estimates of the divergence times of argasids lacked data for this major genus. Mitochondrial gene order for both was identical to that of the Argasidae and Prostriata. Bayesian analysis of the COI, Cytb, ND1, ND2 and ND4 genes confirmed the monophyly of ticks, the basal position of N. namaqua to the other tick families and the accepted systematic relationships of the other tick genera. Molecular clock estimates were derived for the divergence of the major tick lineages and supported previous estimates of the origins of ticks in the Carboniferous. N. namaqua larvae fed successfully on lizards and mice in a prolonged manner similar to many argasids and all ixodids. Excess blood meal-derived water was secreted via the salivary glands, similar to ixodids. We propose that this prolonged larval feeding style eventually gave rise to the long feeding periods that typify the single larval, nymphal and adult stages of ixodid ticks and the associated secretion of water via the salivary glands. Ancestral reconstruction of characters involved in blood-feeding indicates that most of the characteristics unique to either hard or soft tick families were present in the ancestral tick lineage.

  6. Large-scale and global features of complex genomic signals

    NASA Astrophysics Data System (ADS)

    Cristea, Paul D.

    2003-10-01

    The paper briefly reviews the methodology for converting symbolic nucleotide sequences into genomic signals and presents large-scale and global features of the resulting genomic signals. Whole chromosomes or whole genomes are converted into complex signals, and phase analysis is performed. The phase, cumulated phase and unwrapped phase of genomic signals are studied as tools for revealing important features related to the first- and second-order statistics of nucleotide distribution along DNA strands. It is shown that the unwrapped phase displays an almost linear variation along whole chromosomes. The property holds for all the investigated genomes, being shared by both prokaryotes and eukaryotes, while the magnitude and sign of the unwrapped phase slope are specific to each taxon and chromosome. The comparison between the behavior of the cumulated phase and that of the unwrapped phase across the putative origins and termini of the replichores suggests a model of the 'patchy' structure of the chromosomes.
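The conversion and phase analysis described above can be sketched in Python. Several pieces are assumptions for illustration:

- the nucleotide-to-complex mapping (A = 1 + j, C = -1 - j, G = -1 + j, T = 1 - j);
- the unwrapping rule, which keeps each step between consecutive phases in [-pi, pi);
- the function names, which are hypothetical.

The sketch is not guaranteed to match the paper's exact conventions.

```python
import cmath
import math
from itertools import accumulate
from typing import List

# Assumed mapping of nucleotides to complex numbers (one per quadrant);
# illustrative only, not necessarily the exact convention of the paper.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def genomic_signal(seq: str) -> List[complex]:
    """Convert a DNA string into a complex-valued genomic signal."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in seq.upper()]


def phases(signal: List[complex]) -> List[float]:
    """Phase (argument) of each sample, in the interval (-pi, pi]."""
    return [cmath.phase(z) for z in signal]


def cumulated_phase(signal: List[complex]) -> List[float]:
    """Running sum of the per-nucleotide phases along the strand."""
    return list(accumulate(phases(signal)))


def unwrapped_phase(signal: List[complex]) -> List[float]:
    """Phase with 2*pi jumps removed, so each step lies in [-pi, pi)."""
    ph = phases(signal)
    if not ph:
        return []
    out = [ph[0]]
    for prev, cur in zip(ph, ph[1:]):
        # Wrap the raw difference back into [-pi, pi) before accumulating.
        step = (cur - prev + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + step)
    return out
```

For a whole chromosome, you would plot `unwrapped_phase(genomic_signal(seq))` against position and estimate its slope with an ordinary linear fit.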

  7. A high-density SNP genotyping array for Brassica napus and its ancestral diploid species based on optimised selection of single-locus markers in the allotetraploid genome.

    PubMed

    Clarke, Wayne E; Higgins, Erin E; Plieske, Joerg; Wieseke, Ralf; Sidebottom, Christine; Khedikar, Yogendra; Batley, Jacqueline; Edwards, Dave; Meng, Jinling; Li, Ruiyuan; Lawley, Cynthia Taylor; Pauquet, Jérôme; Laga, Benjamin; Cheung, Wing; Iniguez-Luy, Federico; Dyrszka, Emmanuelle; Rae, Stephen; Stich, Benjamin; Snowdon, Rod J; Sharpe, Andrew G; Ganal, Martin W; Parkin, Isobel A P

    2016-10-01

    The Brassica napus Illumina array provides genome-wide markers linked to the available genome sequence, a significant tool for genetic analyses of the allotetraploid B. napus and its progenitor diploid genomes. A high-density single nucleotide polymorphism (SNP) Illumina Infinium array, containing 52,157 markers, was developed for the allotetraploid Brassica napus. A stringent selection process employing the short probe sequence for each SNP assay was used to restrict the majority of the selected markers to those represented a minimal number of times across the highly replicated genome. As a result, approximately 60% of the SNP assays display genome specificity, resolving as three clearly separated clusters (AA, AB, and BB) when tested with a diverse range of B. napus material. This genome specificity was supported by the analysis of the diploid ancestors of B. napus, whereby 26,504 and 29,720 markers were scorable in B. oleracea and B. rapa, respectively. Forty-four percent of the assayed loci on the array were genetically mapped in a single doubled-haploid B. napus population, allowing alignment of their physical and genetic coordinates. Although strong conservation of the two positions was shown, at least 3% of the loci were genetically mapped to a homoeologous position compared to their presumed physical position in the respective genome, underlining the importance of genetic corroboration of locus identity. In addition, the alignments identified multiple rearrangements between the diploid and tetraploid Brassica genomes. Although mostly attributed to genome assembly errors, some are likely evidence of rearrangements that occurred since the hybridisation of the progenitor genomes in the B. napus nucleus. Based on estimates of linkage disequilibrium decay, the array is a valuable tool for genetic fine mapping and genome-wide association studies in B. napus and its progenitor genomes.
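Linkage disequilibrium (LD) decay is usually summarized by averaging a pairwise LD statistic, commonly r², over bins of physical distance between markers. The sketch below makes two assumptions. First, it uses phased haplotypes coded 0/1 at two biallelic loci; array genotypes from an allotetraploid would first need phasing and subgenome assignment, which is not shown. Second, the function names and the bin width are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic loci; each haplotype is (allele1, allele2) in {0, 1}."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus: r^2 is undefined, reported as 0 here
    return d * d / denom


def mean_r2_by_distance(pairs: Sequence[Tuple[int, float]], bin_width: int) -> Dict[int, float]:
    """Average r^2 per distance bin; pairs are (distance_bp, r2), keys are bin starts."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for dist, r2 in pairs:
        b = dist // bin_width
        sums[b] += r2
        counts[b] += 1
    return {b * bin_width: sums[b] / counts[b] for b in sorted(sums)}
```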

  8. Ancestral hierarchy and conflict.

    PubMed

    Boehm, Christopher

    2012-05-18

    Ancestral Pan, the shared predecessor of humans, bonobos, and chimpanzees, lived in social dominance hierarchies that created conflict through individual and coalitional competition. This ancestor had male and female mediators, but individuals often reconciled independently. An evolutionary trajectory is traced from this ancestor to extant hunter-gatherers, whose coalitional behavior results in suppressed dominance and competition, except in mate competition. A territorial ancestral Pan would not have engaged in intensive warfare if we consider bonobo behavior, but modern human foragers have the potential for full-scale war. Although hunter-gatherers are able to resolve conflicts preemptively, they also use mechanisms, such as truces and peace pacts, to mitigate conflict when the costs become too high. Today, humans retain the genetic underpinnings of both conflict and conflict management; thus, we retain the potential for both war and peace.

  9. Territorial Polymers and Large Scale Genome Organization

    NASA Astrophysics Data System (ADS)

    Grosberg, Alexander

    2012-02-01

    Chromatin fiber in the interphase nucleus is effectively a very long polymer packed into a restricted volume. Although polymer models of chromatin organization have been considered, most of them disregard the fact that DNA has to remain relatively unentangled in order to function properly. One polymer model with no entanglements is the melt of unknotted, unconcatenated rings. Extensive simulations indicate that rings in the melt at large length N (number of monomers) approach the compact state, with the gyration radius scaling as N^(1/3), suggesting that every ring is compact and segregated from the surrounding rings. The segregation is consistent with the known phenomenon of chromosome territories. The surface exponent β (describing how the number of contacts between neighboring rings scales, as N^β) appears to be only slightly below unity, β ≈ 0.95. This suggests that the loop factor (the probability that two monomers a linear distance s apart meet) should decay as s^(-γ), where γ = 2 - β is slightly above one. The latter result is consistent with Hi-C data on real human interphase chromosomes and does not contradict the older FISH data. The dynamics of rings in the melt indicate that the motion of one ring remains subdiffusive on time scales well above the stress relaxation time.
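The exponent relation in this abstract is simple arithmetic: with β ≈ 0.95, γ = 2 - β ≈ 1.05. The sketch below does two things:

- it computes γ from β;
- for comparison with contact-probability data such as Hi-C, it estimates γ as the negative slope of a least-squares line through log P(s) versus log s.

The function names and the plain log-log regression are illustrative assumptions, not the fitting procedure used in the work.

```python
import math
from typing import Sequence


def loop_exponent(beta: float) -> float:
    """Loop-factor exponent gamma = 2 - beta from the surface exponent beta."""
    return 2.0 - beta


def fit_loop_exponent(s: Sequence[float], p: Sequence[float]) -> float:
    """Estimate gamma in P(s) ~ s**(-gamma) by least squares on log-log data."""
    if len(s) != len(p) or len(s) < 2:
        raise ValueError("need at least two (s, P(s)) points")
    xs = [math.log(v) for v in s]
    ys = [math.log(v) for v in p]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx  # the negative slope is the decay exponent
```

On noise-free synthetic data generated as `s ** -loop_exponent(0.95)`, the fit recovers γ ≈ 1.05. Real contact maps would need binning and a chosen range of s before fitting.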

  10. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    PubMed

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated with organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. Here we study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route by which choosy females could gain indirect benefits.

  11. Large-scale data mining pilot project in human genome

    SciTech Connect

    Musick, R.; Fidelis, R.; Slezak, T.

    1997-05-01

    This whitepaper briefly describes a new, aggressive effort in large-scale data mining at Livermore National Labs. The implications of 'large-scale' will be clarified in a later section. In the short term, this effort will focus on several mission-critical questions of the Genome Project. We will adapt current data mining techniques to the genome domain, quantify the accuracy of inference results, and lay the groundwork for a more extensive effort in large-scale data mining. A major aspect of the approach is that we will be fully-staffed data warehousing effort in the human Genome area. The long-term goal is a strong, applications-oriented research program in large-scale data mining. The tools and skill set gained will be directly applicable to a wide spectrum of tasks involving a for large spatial and multidimensional data. This includes applications in ensuring non-proliferation, stockpile stewardship, enabling Global Ecology (Materials Database Industrial Ecology), advancing the Biosciences (Human Genome Project), and supporting data for others (Battlefield Management, Health Care).

  12. Large-scale investigation of genomic markers for severe periodontitis.

    PubMed

    Suzuki, Asami; Ji, Guijin; Numabe, Yukihiro; Ishii, Keisuke; Muramatsu, Masaaki; Kamoi, Kyuichi

    2004-09-01

    The purpose of the present study was to investigate the genomic markers for periodontitis, using large-scale single-nucleotide polymorphism (SNP) association studies comparing healthy volunteers and patients with periodontitis. Genomic DNA was obtained from 19 healthy volunteers and 22 patients with severe periodontitis, all of whom were Japanese. The subjects were genotyped at 637 SNPs in 244 genes on a large scale, using the TaqMan polymerase chain reaction (PCR) system. Statistically significant differences in allele and genotype frequencies were analyzed with Fisher's exact test. We found statistically significant differences (P < 0.01) between the healthy volunteers and patients with severe periodontitis in the following genes; gonadotropin-releasing hormone 1 (GNRH1), phosphatidylinositol 3-kinase regulatory 1 (PIK3R1), dipeptidylpeptidase 4 (DPP4), fibrinogen-like 2 (FGL2), and calcitonin receptor (CALCR). These results suggest that SNPs in the GNRH1, PIK3R1, DPP4, FGL2, and CALCR genes are genomic markers for severe periodontitis. Our findings indicate the necessity of analyzing SNPs in genes on a large scale (i.e., genome-wide approach), to identify genomic markers for periodontitis.
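Fisher's exact test works on a 2x2 table, for example counts of the two alleles of one SNP in patients versus healthy volunteers. It sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. Below is a dependency-free sketch of the two-sided version; SciPy's `scipy.stats.fisher_exact` provides the same test. The counts in the usage note are made up, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one
    # from being dropped because of floating-point rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the made-up table [[3, 1], [1, 3]], the tables at least as extreme have probabilities 1, 16, 16 and 1 out of 70. `fisher_exact_two_sided(3, 1, 1, 3)` therefore returns 34/70 ≈ 0.486.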

  13. EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

    PubMed

    Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

    2017-08-01

    Pan-genome analyses are routinely carried out for bacteria to interpret within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexity of their genomes. Here we propose EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to current re-sequencing projects that primarily focus on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C++. It is supported under Linux and is best run on a computer cluster with an LSF or SLURM job scheduling system. EUPAN, together with its standard operating procedure (SOP), is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html . Contact: ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
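A gene presence/absence (PAV) analysis turns per-sample gene coverage into a binary matrix. It then separates core genes (present in every sample) from distributed ones. The sketch below shows only this final step. The coverage input, the 0.95 threshold and all names are hypothetical; they do not reproduce EUPAN's own criteria or commands.

```python
from typing import Dict, List, Tuple

Coverage = Dict[str, Dict[str, float]]  # gene -> sample -> covered fraction (hypothetical input)


def presence_absence(coverage: Coverage, min_cov: float = 0.95) -> Dict[str, Dict[str, int]]:
    """Call a gene present (1) in a sample if its covered fraction is >= min_cov."""
    return {
        gene: {sample: int(frac >= min_cov) for sample, frac in per_sample.items()}
        for gene, per_sample in coverage.items()
    }


def classify_genes(pav: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in all samples) and distributed (absent somewhere)."""
    core: List[str] = []
    distributed: List[str] = []
    for gene, calls in pav.items():
        (core if all(calls.values()) else distributed).append(gene)
    return sorted(core), sorted(distributed)
```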

  14. Asymptotic Distributions of Coalescence Times and Ancestral Lineage Numbers for Populations with Temporally Varying Size

    PubMed Central

    Chen, Hua; Chen, Kun

    2013-01-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. The exact distributions of both coalescence times and ancestral lineage numbers are expressed as sums of alternating series, and the terms in these series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size $n$, denote by $T_m$ the $m$th coalescence time, when $m + 1$ lineages coalesce into $m$ lineages, and by $A_n(t)$ the number of ancestral lineages at time $t$ back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, $A_n(t)$, and the coalescence times, $T_m$, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, $N(t)$. At the very early stage of the coalescent, when $t \to 0$, the number of coalesced lineages $n - A_n(t)$ follows a Poisson distribution, and as $m \to n$, $n(n-1)T_m/2N(0)$ follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing them with both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference. PMID:23666939
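The quantities defined in this abstract can be made concrete with a small simulation of the standard constant-size coalescent. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k - 1)/2, in time units where each pair of lineages coalesces at rate 1. The sketch does three things:

- records T_m (the time at which m + 1 lineages become m);
- evaluates A_n(t);
- gives the exact means E[TMRCA] = 2(1 - 1/n) and E[TBL] = 2 × (1 + 1/2 + ... + 1/(n - 1)) for comparison.

It covers only the constant-size case. The time-varying N(t) treated in the paper would additionally require rescaling time, which is not shown. Function names are hypothetical.

```python
import random
from typing import Dict


def simulate_coalescence_times(n: int, rng: random.Random) -> Dict[int, float]:
    """Return {m: T_m}, where T_m is the time at which m + 1 lineages coalesce into m."""
    times: Dict[int, float] = {}
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)  # rate C(k, 2) while k lineages remain
        times[k - 1] = t
    return times


def lineages_at(times: Dict[int, float], n: int, t: float) -> int:
    """A_n(t): n minus the number of coalescences that occurred by time t."""
    return n - sum(1 for tm in times.values() if tm <= t)


def expected_tmrca(n: int) -> float:
    """Exact mean time to the most recent common ancestor (T_1), constant size."""
    return 2.0 * (1.0 - 1.0 / n)


def expected_total_branch_length(n: int) -> float:
    """Exact mean total branch length, constant size."""
    return 2.0 * sum(1.0 / i for i in range(1, n))


def mean_simulated_tmrca(n: int, reps: int, seed: int = 1) -> float:
    """Monte Carlo estimate of E[T_1] for comparison with expected_tmrca."""
    rng = random.Random(seed)
    return sum(simulate_coalescence_times(n, rng)[1] for _ in range(reps)) / reps
```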

  15. Asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size.

    PubMed

    Chen, Hua; Chen, Kun

    2013-07-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. The exact distributions of both coalescence times and ancestral lineage numbers are expressed as sums of alternating series, and the terms in these series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size $n$, denote by $T_m$ the $m$th coalescence time, when $m + 1$ lineages coalesce into $m$ lineages, and by $A_n(t)$ the number of ancestral lineages at time $t$ back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, $A_n(t)$, and the coalescence times, $T_m$, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, $N(t)$. At the very early stage of the coalescent, when $t \to 0$, the number of coalesced lineages $n - A_n(t)$ follows a Poisson distribution, and as $m \to n$, $n(n-1)T_m/2N(0)$ follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing them with both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.

  16. The maize genome as a model for efficient sequence analysis of large plant genomes.

    PubMed

    Rabinowicz, Pablo D; Bennetzen, Jeffrey L

    2006-04-01

    The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of repetitive elements in intergenic regions. High-quality sequences of the relatively small genomes of Arabidopsis (0.14 Gbp) and rice (0.4 Gbp) have now been largely completed. The sequencing of plant genomes that have a more representative size (the mean for flowering plant genomes is 5.6 Gbp) has been seen as a daunting task, partly because of their size and partly because of the numerous highly conserved repeats. Nevertheless, creative strategies and powerful new tools have been generated recently in the plant genetics community, so that sequencing large plant genomes is now a realistic possibility. Maize (2.4-2.7 Gbp) will be the first gigabase-size plant genome to be sequenced using these novel approaches. Pilot studies on maize indicate that the new gene-enrichment, gene-finishing and gene-orientation technologies are efficient, robust and comprehensive. These strategies will succeed in sequencing the gene-space of large genome plants, and in locating all of these genes and adjacent sequences on the genetic and physical maps.

  17. Types and rates of sequence evolution at the high-molecular-weight glutenin locus in hexaploid wheat and its ancestral genomes.

    PubMed

    Gu, Yong Qiang; Salse, Jérôme; Coleman-Derr, Devin; Dupin, Adeline; Crossman, Curt; Lazo, Gerard R; Huo, Naxin; Belcram, Harry; Ravel, Catherine; Charmet, Gilles; Charles, Mathieu; Anderson, Olin D; Chalhoub, Boulos

    2006-11-01

    The Glu-1 locus, encoding the high-molecular-weight glutenin protein subunits, controls bread-making quality in hexaploid wheat (Triticum aestivum) and represents a recently evolved region unique to Triticeae genomes. To understand the molecular evolution of this locus region, three orthologous Glu-1 regions from the three subgenomes of a single hexaploid wheat species were sequenced, totaling 729 kb of sequence. Comparing each Glu-1 region with its corresponding homologous region from the D genome of diploid wheat, Aegilops tauschii, and the A and B genomes of tetraploid wheat, Triticum turgidum, revealed that, in addition to the conservation of microsynteny in the genic regions, sequences in the intergenic regions, composed of blocks of nested retroelements, are also generally conserved, although a few nonshared retroelements that differentiate the homologous Glu-1 regions were detected in each pair of the A and D genomes. Analysis of the indel frequency and the rate of nucleotide substitution, which represent the most frequent types of sequence changes in the Glu-1 regions, demonstrated that the two A genomes are significantly more divergent than the two B genomes, further supporting the hypothesis that hexaploid wheat may have more than one tetraploid ancestor.
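As a rough illustration of the two kinds of sequence change compared in this abstract, the sketch below takes a pairwise alignment and computes two things:

- the proportion of differing ungapped sites, with the Jukes-Cantor correction;
- the number of indel events, counting each maximal run of gap columns once.

Both are simplifying assumptions for illustration, and the function names are hypothetical. The study's own substitution-rate and indel-frequency estimates may have been computed differently.

```python
import math
from typing import Tuple


def substitution_distances(seq1: str, seq2: str) -> Tuple[float, float]:
    """Return (p-distance, Jukes-Cantor distance) over ungapped aligned columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    p = sum(1 for x, y in sites if x != y) / len(sites)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    jc = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    return p, jc


def count_indel_events(seq1: str, seq2: str) -> int:
    """Count maximal runs of alignment columns with a gap in either sequence."""
    events = 0
    in_gap = False
    for x, y in zip(seq1, seq2):
        gap = x == "-" or y == "-"
        if gap and not in_gap:
            events += 1  # a new run of gap columns starts here
        in_gap = gap
    return events
```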

  18. Ancestral gene synteny reconstruction improves extant species scaffolding.

    PubMed

    Anselmetti, Yoann; Berry, Vincent; Chauve, Cedric; Chateau, Annie; Tannier, Eric; Bérard, Sèverine

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, proposing partial ancestral genome structures as well as a more complete scaffolding for many partially assembled genomes across 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method and show that they are indeed real links in the extant genomes that were missing from the current assemblies. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well-assembled genomes and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
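The evaluation protocol in this abstract first breaks up well-assembled genomes, then checks how many of the removed adjacencies are proposed again. This reduces to set comparisons. The sketch below makes three assumptions:

- adjacencies are unordered pairs of gene or contig names; real adjacencies also record which extremity of each gene is joined, which is ignored here;
- only the removed adjacencies count as true;
- the names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Adjacency = FrozenSet[str]


def as_adjacencies(pairs: Iterable[Tuple[str, str]]) -> Set[Adjacency]:
    """Normalize (a, b) pairs so that (a, b) and (b, a) are the same adjacency."""
    return {frozenset(pair) for pair in pairs}


def precision_sensitivity(
    predicted: Iterable[Tuple[str, str]], removed: Iterable[Tuple[str, str]]
) -> Tuple[float, float]:
    """Precision = recovered / predicted; sensitivity = recovered / removed."""
    pred = as_adjacencies(predicted)
    truth = as_adjacencies(removed)
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity
```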

  19. Ancestral gene synteny reconstruction improves extant species scaffolding

    PubMed Central

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, proposing partial ancestral genome structures as well as a more complete scaffolding for many partially assembled genomes across 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method and show that they are indeed real links in the extant genomes that were missing from the current assemblies. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well-assembled genomes and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes. PMID:26450761

  20. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses

    PubMed Central

    2013-01-01

    Background: Giant viruses in the genus Chlorovirus (family Phycodnaviridae) infect eukaryotic green microalgae. The prototype member of the genus, Paramecium bursaria chlorella virus 1, was sequenced more than 15 years ago, and to date there are only 6 fully sequenced chloroviruses in public databases. Presented here are the draft genome sequences of 35 additional chloroviruses (287–348 kb; 319–381 predicted protein-encoding genes) collected across the globe; they infect one of three different green algal species. These new data allowed us to analyze the genomic landscape of 41 chloroviruses, which revealed some remarkable features of these viruses. Results: Genome colinearity, nucleotide conservation and phylogenetic affinity were limited to chloroviruses infecting the same host, confirming the validity of the three previously known subgenera. Clues to the existence of a fourth, new subgenus indicate that the boundaries of chlorovirus diversity are not completely determined. Comparison of the chlorovirus phylogeny with that of the algal hosts indicates that chloroviruses have changed hosts in their evolutionary history. Reconstruction of the ancestral genome suggests that the last common chlorovirus ancestor had a slightly more diverse protein repertoire than modern chloroviruses. However, more than half of the defined chlorovirus gene families have a potential recent origin (after Chlorovirus divergence), among which a portion shows compositional evidence of horizontal gene transfer. Only a few of the putative acquired proteins had close homologs in databases, raising the question of the true donor organism(s). Phylogenomic analysis identified only seven proteins whose genes were potentially exchanged between the algal host and the chloroviruses. Conclusion: The present evaluation suggests that the pattern of genomic evolution in chloroviruses differs from that described in the related Poxviridae and Mimiviridae. Our study shows that the fixation of algal host

  1. Large-scale genomic analysis of ovarian carcinomas.

    PubMed

    Gorringe, Kylie L; Campbell, Ian G

    2009-04-01

    Epithelial ovarian cancers are typified by frequent genomic aberrations that have been difficult to unravel. Recently, high-resolution array technologies have provided the first glimpse of the remarkable complexity of these aberrations, with some ovarian cancers containing hundreds of copy number breakpoints, micro-deletions and amplifications. Many of these alterations contain cancer-related genes, suggesting that most of them are disease-associated and not just the product of random genomic instability. Future developments such as next-generation sequencing and integrated analysis of data from multiple array platforms on large numbers of samples are poised to revolutionize our understanding of this complex disease.

  2. Sequence Evidence in the Archaeal Genomes that tRNAs Emerged Through the Combination of Ancestral Genes as 5′ and 3′ tRNA Halves

    PubMed Central

    Fujishima, Kosuke; Sugahara, Junichi; Tomita, Masaru; Kanai, Akio

    2008-01-01

    The discovery of separate 5′ and 3′ halves of transfer RNA (tRNA) molecules—so-called split tRNA—in the archaeal parasite Nanoarchaeum equitans made us wonder whether ancestral tRNA was encoded on 1 or 2 genes. We performed a comprehensive phylogenetic analysis of tRNAs in 45 archaeal species to explore the relationship between the three types of tRNAs (nonintronic, intronic and split). We classified 1953 mature tRNA sequences into 22 clusters. All split tRNAs have shown phylogenetic relationships with other tRNAs possessing the same anticodon. We also mimicked split tRNA by artificially separating the tRNA sequences of 7 primitive archaeal species at the anticodon and analyzed the sequence similarity and diversity of the 5′ and 3′ tRNA halves. Network analysis revealed specific characteristics of and topological differences between the 5′ and 3′ tRNA halves: the 5′ half sequences were categorized into 6 distinct groups with a sequence similarity of >80%, while the 3′ half sequences were categorized into 9 groups with a higher sequence similarity of >88%, suggesting different evolutionary backgrounds of the 2 halves. Furthermore, the combinations of 5′ and 3′ halves corresponded with the variation of amino acids in the codon table. We found not only universally conserved combinations of 5′–3′ tRNA halves in tRNAiMet, tRNAThr, tRNAIle, tRNAGly, tRNAGln, tRNAGlu, tRNAAsp, tRNALys, tRNAArg and tRNALeu but also phylum-specific combinations in tRNAPro, tRNAAla, and tRNATrp. Our results support the idea that tRNA emerged through the combination of separate genes and explain the sequence diversity that arose during archaeal tRNA evolution. PMID:18286179
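The split-and-compare procedure in this abstract has two steps. First, cut each tRNA at a chosen position in the anticodon loop. Second, group the resulting halves by pairwise identity above a threshold such as the 80% quoted for 5′ halves. The sketch rests on three assumptions:

- the halves are already aligned to equal length;
- the caller supplies the split position;
- a simple greedy clustering stands in for the network analysis the study actually used.

All names are hypothetical.

```python
from typing import List, Tuple


def split_at(seq: str, split_pos: int) -> Tuple[str, str]:
    """Return (5' half, 3' half) of a tRNA cut before 0-based index split_pos."""
    if not 0 < split_pos < len(seq):
        raise ValueError("split position must fall inside the sequence")
    return seq[:split_pos], seq[split_pos:]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[str]]:
    """Put each sequence in the first cluster whose representative it matches above threshold."""
    clusters: List[List[str]] = []
    for s in seqs:
        for cluster in clusters:
            if identity(s, cluster[0]) > threshold:  # cluster[0] is the representative
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no match: s founds a new cluster
    return clusters
```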

  3. Next-generation sequencing and large genome assemblies.

    PubMed

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-06-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
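The abstract does not name specific metrics for testing assembly performance. One widely used contiguity summary is N50: the largest length L such that contigs of length at least L contain at least half of the assembled bases. A short sketch follows; the function name is illustrative.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Largest L such that contigs of length >= L cover at least half the total length."""
    lengths = sorted((x for x in contig_lengths if x > 0), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no contigs with positive length")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # half of the assembly is now covered
            return length
    return lengths[-1]  # not reached: the loop returns once running == total
```

For contigs of 100, 80, 50, 30 and 20 bp (280 bp in total), the two longest already cover 180 bp. `n50` therefore returns 80.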

  4. Next-generation sequencing and large genome assemblies

    PubMed Central

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed. PMID:22676195

  5. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    SciTech Connect

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan; DiFazio, Steven P; Tuskan, Gerald A

    2014-01-01

    To date, Populus is one of the few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, besides being at the forefront of basic tree molecular biology and ecological research, Populus is at the forefront of addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving the quality and quantity of Populus as a raw material will likely drive more targeted and deeper research in order to unlock the economic potential tied to the molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing of individual genotypes, which in turn facilitates large-scale SNP discovery and the identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss the implications of genome sequence-enabled technologies for Populus genomic and genetic studies of complex and specialized traits.

  6. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships.

    PubMed

    Legarra, Andres; Christensen, Ole F; Vitezica, Zulma G; Aguilar, Ignacio; Misztal, Ignacy

    2015-06-01

    Recent use of genomic (marker-based) relationships shows that relationships exist within and across base populations (breeds or lines). However, the current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to the finite size of the ancestral population and connections between populations. This complicates the reconciliation of the two approaches and, in particular, the combination of pedigree with genomic relationships. We present a coherent theoretical framework to consider base populations in pedigree relationships. We suggest a conceptual framework that considers each ancestral population as a finite-sized pool of gametes. This generates across-individual relationships and contrasts with the classical view in which each population is considered an infinite, unrelated pool. Several ancestral populations may be connected and therefore related. Each ancestral population can be represented as a "metafounder," a pseudo-individual included as a founder of the pedigree and similar to an "unknown parent group." Metafounders have self- and across relationships according to a set of parameters, which measure ancestral relationships, i.e., homozygosities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees. Equivalences of genetic variance and variance components between the classical and this new parameterization are shown. Segregation variance on crosses of populations is modeled. Efficient algorithms for computation of relationship matrices, their inverses, and inbreeding coefficients are presented. Use of metafounders leads to compatibility of genomic and pedigree relationship matrices and to simple computing algorithms. Examples and code are given. Copyright © 2015 by the Genetics Society of America.
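The metafounder idea can be illustrated with the classical tabular method for pedigree relationships. The metafounder enters as the first pseudo-individual, with self-relationship γ, and is recorded as both parents of every base animal. As a result, base animals are related to one another by γ and each has a self-relationship of 1 + γ/2; setting γ = 0 recovers the classical matrix. The sketch assumes:

- a single metafounder, labelled `MF` (a hypothetical label);
- parents listed before their offspring;
- no other unknown parents.

It is not the authors' released code.

```python
from typing import Dict, List, Tuple

Pedigree = List[Tuple[str, str, str]]  # (animal, sire, dam); "MF" marks the metafounder


def relationship_matrix(pedigree: Pedigree, gamma: float) -> Dict[str, Dict[str, float]]:
    """Tabular-method relationships with one metafounder of self-relationship gamma."""
    order = ["MF"]
    a: Dict[str, Dict[str, float]] = {"MF": {"MF": gamma}}
    for animal, sire, dam in pedigree:
        if sire not in a or dam not in a:
            raise ValueError(f"parents of {animal} must appear earlier in the pedigree")
        a[animal] = {}
        for other in order:
            # Relationship with an earlier animal: mean of its relationships with the parents.
            value = 0.5 * (a[other][sire] + a[other][dam])
            a[animal][other] = value
            a[other][animal] = value
        # Self-relationship: 1 plus half the relationship between the parents.
        a[animal][animal] = 1.0 + 0.5 * a[sire][dam]
        order.append(animal)
    return a
```

Take two base animals F1 and F2 (both with parents `MF`), their offspring O, and γ = 0.2. The sketch gives A[F1][F1] = 1.1, A[F1][F2] = 0.2, A[O][O] = 1.1 and A[F1][O] = 0.65. With γ = 0 these become the familiar 1, 0, 1 and 0.5.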

  7. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships

    PubMed Central

    Legarra, Andres; Christensen, Ole F.; Vitezica, Zulma G.; Aguilar, Ignacio; Misztal, Ignacy

    2015-01-01

    Recent use of genomic (marker-based) relationships shows that relationships exist within and across base populations (breeds or lines). However, current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to the finite size of the ancestral population and connections between populations. This complicates the reconciliation of both approaches and, in particular, the combination of pedigree with genomic relationships. We present a coherent theoretical framework to consider base populations in pedigree relationships. We suggest a conceptual framework that considers each ancestral population as a finite-sized pool of gametes. This generates across-individual relationships and contrasts with the classical view in which each population is considered as an infinite, unrelated pool. Several ancestral populations may be connected and therefore related. Each ancestral population can be represented as a “metafounder,” a pseudo-individual included as founder of the pedigree and similar to an “unknown parent group.” Metafounders have self- and across relationships according to a set of parameters, which measure ancestral relationships, i.e., homozygosities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees. Equivalences of genetic variance and variance components between the classical and this new parameterization are shown. Segregation variance on crosses of populations is modeled. Efficient algorithms for computation of relationship matrices, their inverses, and inbreeding coefficients are presented. Use of metafounders leads to compatibility of genomic and pedigree relationship matrices and to simple computing algorithms. Examples and code are given. PMID:25873631
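    The abstract describes metafounders as pseudo-individuals placed at the top of the pedigree, with self- and across relationships given by a set of parameters. As a minimal sketch, assuming those parameters are collected in a matrix Γ and that the usual recursive (tabular) method is then applied unchanged below it, a relationship matrix including metafounders could be built as follows. The function name, argument layout and the value γ = 0.4 are hypothetical illustrations, not the authors' published code.

```python
def relationship_matrix_with_metafounders(gamma, parents):
    """Pedigree relationship matrix with metafounders (illustrative sketch).

    gamma   -- k x k list of lists: relationships among the k metafounders
               (the Gamma parameters; values used below are hypothetical).
    parents -- (sire, dam) index pairs for ordinary individuals, listed so
               that parents precede offspring. Indices 0..k-1 are
               metafounders; index k + t is individual t. A base animal of
               population p is given both parents equal to p.
    Returns a (k + n) x (k + n) matrix as a list of lists.
    """
    k, n = len(gamma), len(parents)
    size = k + n
    a = [[0.0] * size for _ in range(size)]
    # Seed the metafounder block with Gamma.
    for i in range(k):
        for j in range(k):
            a[i][j] = float(gamma[i][j])
    # Tabular method: off-diagonals average the parents' relationships;
    # the diagonal is 1 plus half the relationship between the parents.
    for t, (sire, dam) in enumerate(parents):
        j = k + t
        if not (0 <= sire < j and 0 <= dam < j):
            raise ValueError("parents must precede offspring")
        for i in range(j):
            a[i][j] = a[j][i] = 0.5 * (a[i][sire] + a[i][dam])
        a[j][j] = 1.0 + 0.5 * a[sire][dam]
    return a


# One hypothetical metafounder (gamma = 0.4), two base animals drawn from it,
# and one offspring of those two base animals.
A = relationship_matrix_with_metafounders([[0.4]], [(0, 0), (0, 0), (1, 2)])
print(A[1][1], A[1][2], A[3][3])  # → 1.2 0.4 1.2
```

    Under this assumption, two unrelated-looking base animals of the same population are related by γ and each has self-relationship 1 + γ/2, which is how a finite ancestral population enters the pedigree.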

  8. Distinguishing Recent Admixture from Ancestral Population Structure

    PubMed Central

    Slatkin, Montgomery

    2017-01-01

    We develop and test two methods for distinguishing between recent admixture and ancestral population structure as explanations for greater similarity of one of two populations to an outgroup population. This problem arose when Neanderthals were found to be slightly more similar to non-African than to African populations. The excess similarity is consistent with both recent admixture from Neanderthals into the ancestors of non-Africans and subdivision in the ancestral population. Although later studies showed that there had been recent admixture, distinguishing between these two classes of models will be important in other situations, particularly when high-coverage genomes cannot be obtained for all populations. One of our two methods is based on the properties of the doubly conditioned frequency spectrum combined with the unconditional frequency spectrum. This method does not require a linkage map and can be used when there is relatively low coverage. The second method uses the extent of linkage disequilibrium among closely linked markers. PMID:28186554

  9. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    PubMed

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because the introduction of a Rotavirus A vaccine into Vietnam is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced; this was the last period during which virtually no rotavirus vaccine was used in the country. The two strains were >99.9% identical in sequence and had a typical DS-1-like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains together with >184 G2P[4] strains retrieved from the GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains, for every gene except NSP2 and NSP4, in one of the lineages commonly found among contemporary strains. The NSP2 genes belonged to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, sharing 95% nucleotide identity with the Chinese rotavirus vaccine strain LLR; their most recent common ancestor was dated to 1975. The NSP4 genes belonged, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to the ever-diversifying NSP4 lineages. Thus, surveillance of locally circulating strains from both children and animals needs to be enhanced at the whole-genome level to assess the effect of rotavirus vaccines on changing strain distribution.

  10. Ancestral polyploidy in seed plants and angiosperms.

    PubMed

    Jiao, Yuannian; Wickett, Norman J; Ayyampalayam, Saravanaraj; Chanderbali, André S; Landherr, Lena; Ralph, Paula E; Tomsho, Lynn P; Hu, Yi; Liang, Haiying; Soltis, Pamela S; Soltis, Douglas E; Clifton, Sandra W; Schlarbaum, Scott E; Schuster, Stephan C; Ma, Hong; Leebens-Mack, Jim; dePamphilis, Claude W

    2011-05-05

    Whole-genome duplication (WGD), or polyploidy, followed by gene loss and diploidization has long been recognized as an important evolutionary force in animals, fungi and other organisms, especially plants. The success of angiosperms has been attributed, in part, to innovations associated with gene or whole-genome duplications, but evidence for proposed ancient genome duplications pre-dating the divergence of monocots and eudicots remains equivocal in analyses of conserved gene order. Here we use comprehensive phylogenomic analyses of sequenced plant genomes and more than 12.6 million new expressed-sequence-tag sequences from phylogenetically pivotal lineages to elucidate two groups of ancient gene duplications: one in the common ancestor of extant seed plants and the other in the common ancestor of extant angiosperms. Gene duplication events were intensely concentrated around 319 and 192 million years ago, implicating two WGDs in ancestral lineages shortly before the diversification of extant seed plants and extant angiosperms, respectively. Significantly, these ancestral WGDs resulted in the diversification of regulatory genes important to seed and flower development, suggesting that they were involved in major innovations that ultimately contributed to the rise and eventual dominance of seed plants and angiosperms.

  11. Are palaeoscolecids ancestral ecdysozoans?

    PubMed

    Harvey, Thomas H P; Dong, Xiping; Donoghue, Philip C J

    2010-01-01

    The reconstruction of ancestors is a central aim of comparative anatomy and evolutionary developmental biology, not least in attempts to understand the relationship between developmental and organismal evolution. Inferences based on living taxa can and should be tested against the fossil record, which provides an independent and direct view onto historical character combinations. Here, we consider the nature of the last common ancestor of living ecdysozoans through a detailed analysis of palaeoscolecids, an early and extinct group of introvert-bearing worms that have been proposed to be ancestral ecdysozoans. In a review of palaeoscolecid anatomy, including newly resolved details of the internal and external cuticle structure, we identify specific characters shared with various living nematoid and scalidophoran worms, but not with panarthropods. Considered within a formal cladistic context, these characters provide most overall support for a stem-priapulid affinity, meaning that palaeoscolecids are far-removed from the ecdysozoan ancestor. We conclude that previous interpretations in which palaeoscolecids occupy a deeper position in the ecdysozoan tree lack particular morphological support and rely instead on a paucity of preserved characters. This bears out a more general point that fossil taxa may appear plesiomorphic merely because they preserve only plesiomorphies, rather than the mélange of primitive and derived characters anticipated of organisms properly allocated to a position deep within animal phylogeny.

  12. Genomic evidence for a large-Z effect.

    PubMed

    Ellegren, Hans

    2009-01-22

    The 'large-X effect' suggests that sex chromosomes play a disproportionate role in adaptive evolution. Theoretical work indicates that this effect may be most pronounced in genetic systems with female heterogamety (males ZZ, females ZW) under both good-genes and Fisher's runaway models of sexual selection. Here, I use a comparative genomic approach (alignments of several thousand chicken-zebra finch-human-mouse-opossum orthologues) to show that avian Z-linked genes are highly overrepresented among those bird-mammalian orthologues that show evidence of an accelerated rate of functional evolution in birds relative to mammals; the data suggest a twofold excess of such genes on the Z chromosome. A reciprocal analysis of genes accelerated in mammals found no evidence for an excess of X-linkage. This would be compatible with theoretical expectations for differential selection on sex-linked genes under male and female heterogamety, although the power in this case was not sufficient to show statistically that 'large-Z' was more pronounced than 'large-X'. Accelerated Z-linked genes include a variety of functional categories and are characterized by higher non-synonymous to synonymous substitution rate ratios than both accelerated autosomal and non-accelerated genes. This points to a genomic 'large-Z effect', which is widespread and of general significance for adaptive divergence in birds.

  13. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes (including bacteria, archaea, viruses, and microbial eukaryotes) provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and sequencing the genomes of a significant fraction of microbial life may become possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  14. The Mitochondrial Genome of the Leaf-Cutter Ant Atta laevigata: A Mitogenome with a Large Number of Intergenic Spacers

    PubMed Central

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to those inferred for the ancestral Pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed throughout the mitogenome. This pattern was also found in other hymenopterans, in particular the monophyletic Apocrita. These spacers of unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans. PMID:24828084

  15. The mitochondrial genome of the leaf-cutter ant Atta laevigata: a mitogenome with a large number of intergenic spacers.

    PubMed

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to those inferred for the ancestral Pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed throughout the mitogenome. This pattern was also found in other hymenopterans, in particular the monophyletic Apocrita. These spacers of unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans.

  16. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    USDA-ARS's Scientific Manuscript database

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  17. Trans-ethnic fine-mapping of genetic loci for body mass index in the diverse ancestral populations of the Population Architecture using Genomics and Epidemiology (PAGE) Study reveals evidence for multiple signals at established loci.

    PubMed

    Fernández-Rhodes, Lindsay; Gong, Jian; Haessler, Jeffrey; Franceschini, Nora; Graff, Mariaelisa; Nishimura, Katherine K; Wang, Yujie; Highland, Heather M; Yoneyama, Sachiko; Bush, William S; Goodloe, Robert; Ritchie, Marylyn D; Crawford, Dana; Gross, Myron; Fornage, Myriam; Buzkova, Petra; Tao, Ran; Isasi, Carmen; Avilés-Santa, Larissa; Daviglus, Martha; Mackey, Rachel H; Houston, Denise; Gu, C Charles; Ehret, Georg; Nguyen, Khanh-Dung H; Lewis, Cora E; Leppert, Mark; Irvin, Marguerite R; Lim, Unhee; Haiman, Christopher A; Le Marchand, Loic; Schumacher, Fredrick; Wilkens, Lynne; Lu, Yingchang; Bottinger, Erwin P; Loos, Ruth J L; Sheu, Wayne H-H; Guo, Xiuqing; Lee, Wen-Jane; Hai, Yang; Hung, Yi-Jen; Absher, Devin; Wu, I-Chien; Taylor, Kent D; Lee, I-Te; Liu, Yeheng; Wang, Tzung-Dau; Quertermous, Thomas; Juang, Jyh-Ming J; Rotter, Jerome I; Assimes, Themistocles; Hsiung, Chao A; Chen, Yii-Der Ida; Prentice, Ross; Kuller, Lewis H; Manson, JoAnn E; Kooperberg, Charles; Smokowski, Paul; Robinson, Whitney R; Gordon-Larsen, Penny; Li, Rongling; Hindorff, Lucia; Buyske, Steven; Matise, Tara C; Peters, Ulrike; North, Kari E

    2017-06-01

    Most body mass index (BMI) genetic loci have been identified in studies of primarily European ancestries. The effect of these loci in other racial/ethnic groups is less clear. Thus, we aimed to characterize the generalizability of 170 established BMI variants, or their proxies, to diverse US populations and to trans-ethnically fine-map 36 BMI loci using a sample of >102,000 adults of African, Hispanic/Latino, Asian, European and American Indian/Alaskan Native descent from the Population Architecture using Genomics and Epidemiology Study. We performed linear regression of the natural log of BMI (18.5-70 kg/m²) on the additive single nucleotide polymorphisms (SNPs) at BMI loci on the MetaboChip (Illumina, Inc.), adjusting for age, sex, population stratification, study site, or relatedness. We then performed fixed-effect meta-analyses and a Bayesian trans-ethnic meta-analysis to empirically cluster by allele frequency differences. Finally, we approximated conditional and joint associations to test for the presence of secondary signals. We noted directional consistency with the previously reported risk alleles beyond what would have been expected by chance (binomial p < 0.05). Nearly a quarter of the previously described BMI index SNPs and 29 of 36 densely genotyped BMI loci on the MetaboChip replicated/generalized in trans-ethnic analyses. We observed multiple signals at nine loci, including seven loci with novel multiple signals. This study supports the generalization of most common genetic loci to diverse ancestral populations and emphasizes the importance of dense multiethnic genomic data in refining the functional variation at genetic loci of interest and describing several loci with multiple underlying genetic variants.
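    The abstract mentions fixed-effect meta-analyses without stating the weighting scheme. A common choice is inverse-variance weighting, sketched below as an assumption rather than as the study's exact pipeline; the function name and the per-study estimates are hypothetical and serve only to show how per-population effects on log BMI would be pooled.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis (sketch).

    betas -- per-study effect estimates (e.g. change in log BMI per allele)
    ses   -- the matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, z, p


# Two hypothetical populations with equal precision.
beta, se, z, p = fixed_effect_meta([0.010, 0.020], [0.004, 0.004])
print(round(beta, 4), round(se, 5))
```

    With equal standard errors the pooled estimate is the simple mean of the inputs, and the pooled standard error shrinks by a factor of √2 relative to each study.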

  18. A large and diverse collection of bovine genome sequences from the Canadian Cattle Genome Project.

    PubMed

    Stothard, Paul; Liao, Xiaoping; Arantes, Adriano S; De Pauw, Mary; Coros, Colin; Plastow, Graham S; Sargolzaei, Mehdi; Crowley, John J; Basarab, John A; Schenkel, Flavio; Moore, Stephen; Miller, Stephen P

    2015-01-01

    The Canadian Cattle Genome Project is a large-scale international project that aims to develop genomics-based tools to enhance the efficiency and sustainability of beef and dairy production. Obtaining DNA sequence information is an important part of achieving this goal as it facilitates efforts to associate specific DNA differences with phenotypic variation. These associations can be used to guide breeding decisions and provide valuable insight into the molecular basis of traits. We describe a dataset of 379 whole-genome sequences, taken primarily from key historic Bos taurus animals, along with the analyses that were performed to assess data quality. The sequenced animals represent ten populations relevant to beef or dairy production. Animal information (name, breed, population), sequence data metrics (mapping rate, depth, concordance), and sequence repository identifiers (NCBI BioProject and BioSample IDs) are provided to enable others to access and exploit this sequence information. The large number of whole-genome sequences generated as a result of this project will contribute to ongoing work aiming to catalogue the variation that exists in cattle as well as efforts to improve traits through genotype-guided selection. Studies of gene function, population structure, and sequence evolution are also likely to benefit from the availability of this resource.

  19. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications

    PubMed Central

    2013-01-01

    Background: Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Results: Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. Conclusions: We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. The combined data imply that the early vertebrate WGD events contributed to the

  20. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications.

    PubMed

    Lagman, David; Ocampo Daza, Daniel; Widmark, Jenny; Abalo, Xesús M; Sundström, Görel; Larhammar, Dan

    2013-11-02

    Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. The combined data imply that the early vertebrate WGD events contributed to the evolution of vision and the

  1. The Ancestral Gene for Transcribed, Low-Copy Repeats in the Prader-Willi/Angelman Region Encodes a Large Protein Implicated in Protein Trafficking that is Deficient in Mice with Neuromuscular and

    SciTech Connect

    Ji, Y.

    1999-01-01

    Transcribed, low-copy repeat elements are associated with the breakpoint regions of common deletions in Prader-Willi and Angelman syndromes. We report here the identification of the ancestral gene (HERC2) and a family of duplicated, truncated copies that comprise these low-copy repeats. This gene encodes a highly conserved giant protein, HERC2, that is distantly related to p532 (HERC1), a guanine nucleotide exchange factor (GEF) implicated in vesicular trafficking. The mouse genome contains a single Herc2 locus, located in the jdf2 (juvenile development and fertility-2) interval of chromosome 7C. We have identified single nucleotide splice junction mutations in Herc2 in three independent N-ethyl-N-nitrosourea-induced jdf2 mutant alleles, each leading to exon skipping with premature termination of translation and/or deletion of conserved amino acids. Therefore, mutations in Herc2 lead to the neuromuscular secretory vesicle and sperm acrosome defects, other developmental abnormalities and juvenile lethality of jdf2 mice. Combined, these findings suggest that HERC2 is an important gene encoding a GEF involved in protein trafficking and degradation pathways in the cell.

  2. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine

    PubMed Central

    Biesecker, Leslie G.; Mullikin, James C.; Facio, Flavia M.; Turner, Clesson; Cherukuri, Praveen F.; Blakesley, Robert W.; Bouffard, Gerard G.; Chines, Peter S.; Cruz, Pedro; Hansen, Nancy F.; Teer, Jamie K.; Maskeri, Baishali; Young, Alice C.; Manolio, Teri A.; Wilson, Alexander F.; Finkel, Toren; Hwang, Paul; Arai, Andrew; Remaley, Alan T.; Sachdev, Vandana; Shamburek, Robert; Cannon, Richard O.; Green, Eric D.

    2009-01-01

    ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow family members to be contacted for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine. PMID:19602640

  3. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects involve a large investment of effort involving many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called 'contigs.' These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.
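    The abstract says the pairwise overlap probability is used to assemble overlapping clones into islands (contigs), but does not give the assembly procedure. As a simple illustration only, assuming pairwise overlap probabilities are already available and that any pair above a hypothetical cut-off is treated as overlapping, clones can be grouped into islands by single linkage with a union-find structure. This groups clones but does not order them within an island.

```python
def group_into_contigs(n_clones, overlap_prob, threshold=0.99):
    """Group clones into islands by single linkage (illustrative sketch).

    n_clones     -- number of library members, labelled 0..n_clones-1
    overlap_prob -- dict mapping (i, j) pairs to an estimated probability
                    that clones i and j overlap
    threshold    -- hypothetical cut-off at or above which a pair is linked
    Returns a sorted list of islands, each a sorted list of clone labels.
    """
    parent = list(range(n_clones))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), p in overlap_prob.items():
        if p >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two islands

    islands = {}
    for clone in range(n_clones):
        islands.setdefault(find(clone), []).append(clone)
    return sorted(islands.values())


# Hypothetical probabilities: clones 0-1-2 chain together; the rest stay apart.
probs = {(0, 1): 0.999, (1, 2): 0.995, (3, 4): 0.2, (2, 5): 0.5}
print(group_into_contigs(6, probs))  # → [[0, 1, 2], [3], [4], [5]]
```

    Single linkage is chosen here only for brevity: one false-positive link merges two islands, which is why a well-calibrated overlap probability matters.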

  4. The detection of large deletions or duplications in genomic DNA.

    PubMed

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for the detection of point mutations and small insertions or deletions in genomic DNA are well established, the detection of larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. Gene dosage methods based on PCR need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some sort of quantitation, heterozygous deletions and duplications may be overlooked and therefore be under-ascertained. Gene dosage methods provide the additional benefit of reporting allele drop-out in the PCR. This could affect SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for the detection of this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions, deletions, and duplications. Copyright 2002 Wiley-Liss, Inc.
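    The abstract distinguishes semi-quantitative methods that report gene dosage relative to an internal standard. A minimal sketch of that normalization, assuming a target and a reference amplicon are measured in both a test sample and a control of known (two-copy) dosage: a heterozygous deletion should give a ratio near 0.5 and a heterozygous duplication a ratio near 1.5. The function names, the signal values and the 0.75/1.25 cut-offs are hypothetical.

```python
def dosage_ratio(test_target, test_ref, ctrl_target, ctrl_ref):
    """Target/reference signal in the test sample, normalised to the same
    ratio in a control sample of known copy number (sketch)."""
    if min(test_ref, ctrl_target, ctrl_ref) <= 0:
        raise ValueError("signals must be positive")
    return (test_target / test_ref) / (ctrl_target / ctrl_ref)


def call_copy_state(ratio, low=0.75, high=1.25):
    """Classify a dosage ratio; the cut-offs are hypothetical choices."""
    if ratio <= low:
        return "deletion"      # expected near 0.5 for a heterozygous loss
    if ratio >= high:
        return "duplication"   # expected near 1.5 for a heterozygous gain
    return "normal"


# Hypothetical signals: test target at about half the control level.
r = dosage_ratio(510.0, 1000.0, 1000.0, 1000.0)
print(round(r, 2), call_copy_state(r))  # → 0.51 deletion
```

    Normalizing to the internal reference cancels differences in the amount of input DNA between samples, which is what allows a simple ratio to reveal a one-copy change.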

  5. Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly

    PubMed Central

    Altemose, Nicolas; Miga, Karen H.; Maggioni, Mauro; Willard, Huntington F.

    2014-01-01

    The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations. PMID:24831296

  6. Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system.

    PubMed

    Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong

    2015-09-03

    Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East Asia. Annual aquaculture production of the species is about 90 thousand tons. Despite its economic importance, genetic studies of economic traits and genomic selection in the species are hindered by a lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing, and the traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique based on nanochannel arrays in the BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQI was chosen for genome map generation. In total, 645,305 DNA molecules with a combined length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Using the IrysView package and signature patterns in raw DNA molecules, the whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with the large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping to super-scaffolding of draft genome sequences. The genome map data of large yellow croaker are accessible at lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in the Irys system, we constructed the first whole-genome map for large yellow croaker, which will greatly facilitate ongoing genomic and evolutionary studies of the species. To our knowledge, this is the first public report of genome map construction by whole-genome mapping for aquatic organisms. Our study demonstrates a promising application of the whole-genome
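
    The N50 statistic reported above follows a standard definition: sort the map lengths in decreasing order and take the length at which the running total first reaches half of the summed length. The Python sketch below illustrates that definition only; it is not the IrysView implementation, the function name is hypothetical, and the companion count it returns (the number of maps needed to reach that point, often called L50) is included for illustration.

```python
def n50_and_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of contig or map lengths.

    N50 is the length L such that pieces of length >= L together cover at
    least half of the total length; L50 is how many such pieces are needed.
    """
    if not lengths:
        raise ValueError("need at least one length")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest piece down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: running total reaches the full sum")


print(n50_and_l50([2, 3, 4, 5, 6]))
```

    For the lengths 2, 3, 4, 5 and 6 (total 20), the two longest pieces already sum to 11, so N50 is 5 and L50 is 2.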

  7. ProCARs: Progressive Reconstruction of Ancestral Gene Orders

    PubMed Central

    2015-01-01

    Background In the context of ancestral gene order reconstruction from extant genomes, there are two main computational approaches: rearrangement-based and homology-based methods. Rearrangement-based methods minimize a total rearrangement distance on the branches of a species tree. Homology-based methods detect a set of potential ancestral contiguity features and then assemble these features into Contiguous Ancestral Regions (CARs). Results In this paper, we present a new homology-based method that uses a progressive approach for both the detection and the assembly of ancestral contiguity features into CARs. The method iteratively detects a set of potential ancestral adjacencies using the current set of CARs at each step, and constructs CARs progressively using a two-phase assembly method. Conclusion We show the usefulness of the method through a reconstruction of the boreoeutherian ancestral gene order and a comparison with three other homology-based methods: AnGeS, InferCARs and GapAdj. The program, written in Python, and the dataset used in this paper are available at http://bioinfo.lifl.fr/procars/. PMID:26040958
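
    As a rough illustration of assembling adjacencies into CARs in general, the Python sketch below greedily chains marker adjacencies into linear regions. It is not the ProCARs algorithm: it ignores marker orientation and the two-phase assembly step, and it assumes, hypothetically, that adjacencies arrive sorted from most to least supported. An adjacency is skipped when it would give a marker more than two neighbours or close a cycle, so every resulting region is a simple path. The function name is hypothetical.

```python
def assemble_cars(markers: list, adjacencies: list[tuple]) -> list[list]:
    """Greedily chain unsigned marker adjacencies into linear regions."""
    # Union-find parent pointers, used to detect adjacencies that close a cycle.
    parent = {m: m for m in markers}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbours = {m: [] for m in markers}
    for a, b in adjacencies:
        if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue  # conflicts with better-supported adjacencies already kept
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # would close a cycle, which a linear region cannot contain
        parent[root_a] = root_b
        neighbours[a].append(b)
        neighbours[b].append(a)

    cars, seen = [], set()
    for start in markers:
        # Each kept component is a path, so walks begin only at its endpoints.
        if start in seen or len(neighbours[start]) == 2:
            continue
        path, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [n for n in neighbours[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            seen.add(cur)
        cars.append(path)
    return cars


print(assemble_cars([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 1), (4, 5), (2, 5)]))
```

    In this example, (3, 1) is rejected because it would close a cycle, and (2, 5) is rejected because marker 2 already has two neighbours. The result is the regions [1, 2, 3] and [4, 5].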

  8. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have recently been developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. In addition, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that estimates the evolution of effective population size through time from a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. As shown by simulations under a wide range of demographic scenarios, our approach provides accurate estimates of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided that summary statistics are computed from SNPs with common alleles. PMID:26943927
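
    The folded allele frequency spectrum needs neither phasing nor knowledge of the ancestral allele. The Python sketch below tallies SNPs by minor allele count. Its function name and input layout (one row per SNP, one 0/1/2 allele count per diploid individual) are hypothetical. It illustrates the summary statistic only, not the PopSizeABC software.

```python
def folded_sfs(genotypes: list[list[int]]) -> list[int]:
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes[s][i] is the count (0, 1 or 2) of one arbitrarily chosen allele
    at SNP s in individual i; the counted allele need not be ancestral.
    Returns sfs, where sfs[k] is the number of SNPs whose minor allele occurs
    k times among the 2n sampled chromosomes, for k = 0..n.
    """
    if not genotypes:
        return []
    n = len(genotypes[0])  # number of diploid individuals
    sfs = [0] * (n + 1)
    for snp in genotypes:
        count = sum(snp)                   # copies of the counted allele
        minor = min(count, 2 * n - count)  # folding: keep the rarer allele
        sfs[minor] += 1
    return sfs


print(folded_sfs([[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]))
```

    Only the minor allele count is kept, so switching which allele is counted (each genotype g becomes 2 - g) leaves the spectrum unchanged. This is why unpolarized data suffice.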

  9. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    PubMed Central

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Background The genomes of Streptomyces coelicolor and Streptomyces lividans share a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively regarded as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole-genome microarrays as a comparative genomics tool to identify the subtle differences between these two chromosomes. Results We identified five large S. coelicolor genomic islands (larger than 25 kb) and 18 smaller islets absent from the S. lividans chromosome. Many of these regions show anomalous GC bias and codon usage patterns. Six of them lie close to tRNA genes, while nine are flanked by near-perfect repeat sequences, indicating that these are probably recent evolutionary acquisitions by S. coelicolor. Embedded within these segments are at least four DNA methylases and two probable methyl-sensing restriction endonucleases. Comparison with S. coelicolor transcriptome and proteome data revealed that some of the missing genes are active during growth and differentiation in S. coelicolor. In particular, this category includes a pair of methylmalonyl CoA mutase (mcm) genes involved in polyketide precursor biosynthesis, an acyl-CoA dehydrogenase implicated in the timing of actinorhodin synthesis, and bldB, a developmentally significant regulator whose mutation causes complete abrogation of antibiotic synthesis. Conclusion Our findings provide tangible hints for elucidating the genetic basis of important phenotypic differences between these two streptomycetes. Importantly, the absence of certain genes in S. lividans identified here could potentially explain the relative ease of DNA transformation and the conditional lack of actinorhodin synthesis in S. lividans. PMID:17623098

  10. Volume visualization of multiple alignment of large genomic DNA

    SciTech Connect

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more sequence data sets become available, and as the challenge of comparing these massive "billion-basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences are inherently one-dimensional. One-dimensional (1D) representations of these data sets do not use screen real estate effectively. As a result, tools using 1D representations cannot provide an informative overview of extremely large data sets. We present a technique that arranges 1D data in 3D space so that state-of-the-art interactive volume visualization techniques can be applied for data exploration. We demonstrate our technique using aligned DNA sequence data spanning multiple millions of base pairs and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Coupled with 1D line plots, our technique yields effective multi-resolution visualization of very large aligned sequence data sets.

  11. Genomic analysis of regulatory network dynamics reveals large topological changes

    NASA Astrophysics Data System (ADS)

    Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark

    2004-09-01

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  12. Genomic analysis of regulatory network dynamics reveals large topological changes.

    PubMed

    Luscombe, Nicholas M; Babu, M Madan; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A; Gerstein, Mark

    2004-09-16

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  13. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from the British Columbia Cancer Agency used whole-genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in gaps in the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.

  14. A practical approach to detect ancestral haplotypes in livestock populations.

    PubMed

    Sánchez-Molano, Enrique; Tsiokos, Dimitrios; Chatziplis, Dimitrios; Jorjani, Hossein; Degano, Lorenzo; Diaz, Clara; Rossoni, Attilio; Schwarzenbacher, Hermann; Seefried, Franz; Varona, Luis; Vicario, Daniele; Nicolazzi, Ezequiel L; Banos, Georgios

    2016-06-24

    The effects of different evolutionary forces are expected to lead to the conservation, over many generations, of particular genomic regions (haplotypes) due to the development of linkage disequilibrium (LD). The detection and identification of early (ancestral) haplotypes can be used to clarify the evolutionary dynamics of different populations as well as to identify selection signatures and genomic regions of interest for use in both conservation and breeding programs. The aim of this study was to develop a simple procedure, based on whole-genome scanning, to identify ancestral haplotypes segregating across several generations both within and between genetically linked populations. This procedure was tested with simulated data and then applied to real data from different genotyped populations of Spanish, Fleckvieh, Simmental and Brown-Swiss cattle. The identification of ancestral haplotypes revealed coincident patterns of selection across different breeds, allowing the detection of common regions of interest on different bovine chromosomes and mirroring the evolutionary dynamics of the studied populations. These regions, mainly located on chromosomes BTA5, BTA6, BTA7 and BTA21, are related to animal traits such as coat colour and milk protein and fat content. In agreement with previous studies, the detection of ancestral haplotypes provides useful information for the development and comparison of breeding and conservation programs, both through the identification of selection signatures and other regions of interest and as an indicator of the general genetic status of the populations.

  15. Large-scale parallel genome assembler over cloud computing environment.

    PubMed

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, both developing scalable data-intensive bioinformatics applications with this model and understanding the hardware environment these applications need for good performance require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure instead of a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that on 512 cores of a traditional HPC cluster.
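
    For readers unfamiliar with the de Bruijn graph at the core of GiGA, the single-machine Python sketch below shows how such a graph is built from reads. It makes simplifying assumptions (no reverse complements, no error correction, no graph simplification) and does not reflect how GiGA partitions the graph across Giraph workers. The function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Each k-mer occurring in a read adds one edge from its (k-1)-prefix to its
    (k-1)-suffix. Repeated k-mers add parallel edges, preserving multiplicity.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGTC"], 3))
```

    For the single read ACGTC with k = 3, the k-mers ACG, CGT and GTC yield the edges AC->CG, CG->GT and GT->TC. An assembler then reconstructs contigs by walking paths through such a graph.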

  16. Ancestral vertebrate complexity of the opioid system.

    PubMed

    Larhammar, Dan; Bergqvist, Christina; Sundström, Görel

    2015-01-01

    The evolution of the opioid peptides and nociceptin/orphanin, as well as their receptors, has been difficult to resolve due to variable evolutionary rates. By combining sequence comparisons with information on the chromosomal locations of the genes, we have deduced the following evolutionary scenario: the vertebrate predecessor had one opioid precursor gene and one receptor gene. The two genome doublings before the vertebrate radiation resulted in three peptide precursor genes, whereupon a fourth copy arose by a local gene duplication. These four precursors diverged to become the prepropeptides for endorphin (POMC), enkephalins, dynorphins, and nociceptin, respectively. The ancestral receptor gene was quadrupled in the genome doublings, giving rise to the delta, kappa, and mu receptors and the nociceptin/orphanin receptor. This scenario is corroborated by new data presented here for coelacanth and spotted gar, representing two basal branches in the vertebrate tree. A third genome doubling in the ancestor of teleost fishes generated additional gene copies. These results show that the opioid system was already quite complex in the first vertebrates and that it has more components in teleost fishes than in mammals. From an evolutionary point of view, nociceptin and its receptor can be considered full-fledged members of the opioid system.

  17. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist. © 2015 John Wiley & Sons Ltd.

  18. An unexpectedly large and loosely packed mitochondrial genome in the charophycean green alga Chlorokybus atmophyticus

    PubMed Central

    Turmel, Monique; Otis, Christian; Lemieux, Claude

    2007-01-01

    Background The Streptophyta comprises all land plants and six groups of charophycean green algae. The scaly biflagellate Mesostigma viride (Mesostigmatales) and the sarcinoid Chlorokybus atmophyticus (Chlorokybales) represent the earliest diverging lineages of this phylum. In trees based on chloroplast genome data, these two charophycean green algae are nested in the same clade. To validate this relationship and gain insight into the ancestral state of the mitochondrial genome in the Charophyceae, we sequenced the mitochondrial DNA (mtDNA) of Chlorokybus and compared this genome sequence with those of three other charophycean green algae and the bryophytes Marchantia polymorpha and Physcomitrella patens. Results The Chlorokybus genome differs radically from its 42,424-bp Mesostigma counterpart in size, gene order, intron content and density of repeated elements. At 201,763 bp, it is the largest mtDNA yet reported for a green alga. The 70 conserved genes represent 41.4% of the genome sequence and include nad10 and trnL(gag), two genes reported for the first time in a streptophyte mtDNA. At the gene order level, the Chlorokybus genome shares with its Chara, Chaetosphaeridium and bryophyte homologues eight to ten gene clusters including about 20 genes. Notably, some of these clusters exhibit gene linkages not previously found outside the Streptophyta, suggesting that they originated early during streptophyte evolution. In addition to six group I and 14 group II introns, short repeated sequences accounting for 7.5% of the genome were identified. Mitochondrial trees were unable to resolve the correct position of Mesostigma, due to analytical problems arising from accelerated sequence evolution in this lineage. Conclusion The Chlorokybus and Mesostigma mtDNAs exemplify the marked fluidity of the mitochondrial genome in charophycean green algae. The notion that the mitochondrial genome was constrained to remain compact during charophycean evolution is no longer tenable.

  19. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole-genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While a number of excellent tools have been developed for this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes because they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly because homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed.
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .

  20. Genomic evidence for large, long-lived ancestors to placental mammals.

    PubMed

    Romiguier, J; Ranwez, V; Douzery, E J P; Galtier, N

    2013-01-01

    It is widely assumed that our mammalian ancestors, which lived in the Cretaceous era, were tiny animals that survived massive asteroid impacts in shelters and evolved into modern forms after dinosaurs went extinct, 65 Ma. The small size of most Mesozoic mammalian fossils essentially supports this view. Paleontology, however, is not conclusive regarding the ancestry of extant mammals, because Cretaceous and Paleocene fossils are not easily linked to modern lineages. Here, we use full-genome data to estimate the longevity and body mass of early placental mammals. Analyzing 36 fully sequenced mammalian genomes, we reconstruct two aspects of the ancestral genome dynamics, namely GC-content evolution and the ratio of nonsynonymous to synonymous substitution rates. Linking these molecular evolutionary processes to life-history traits in modern species, we estimate that early placental mammals had a life span above 25 years and a body mass above 1 kg. This is similar to current primates, cetartiodactyls, or carnivores, but markedly different from mice or shrews, challenging the dominant view about mammalian origin and evolution. Our results imply that long-lived mammals existed in the Cretaceous era and were the most successful in evolution, opening new perspectives on the conditions for surviving the Cretaceous-Tertiary crisis.

  1. Fast and sensitive multiple alignment of large genomic sequences

    PubMed Central

    Brudno, Michael; Chapman, Michael; Göttgens, Berthold; Batzoglou, Serafim; Morgenstern, Burkhard

    2003-01-01

    Background Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method. Results Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure. Conclusion We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues. PMID:14693042
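
    The anchored-alignment strategy described above depends on selecting a consistent chain of local matches. The Python sketch below shows the textbook quadratic dynamic program that finds the highest-scoring chain of non-overlapping anchors collinear in both sequences. It is a generic illustration, not the CHAOS algorithm, and the Anchor type and function name are hypothetical. In the strategy described above, the regions between consecutive anchors of the returned chain would then be aligned by a slower, more sensitive tool such as DIALIGN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    x: int        # start position in sequence 1
    y: int        # start position in sequence 2
    length: int   # length of the ungapped local match
    score: float  # similarity score of the match


def best_chain(anchors: list[Anchor]) -> list[Anchor]:
    """Highest-scoring chain of non-overlapping, collinear anchors (O(n^2) DP)."""
    if not anchors:
        return []
    order = sorted(anchors, key=lambda a: (a.x, a.y))
    best = [a.score for a in order]  # best chain score ending at each anchor
    back = [-1] * len(order)         # predecessor index, -1 if none
    for j, aj in enumerate(order):
        for i in range(j):
            ai = order[i]
            # ai must end before aj starts in both sequences.
            if ai.x + ai.length <= aj.x and ai.y + ai.length <= aj.y:
                if best[i] + aj.score > best[j]:
                    best[j] = best[i] + aj.score
                    back[j] = i
    j = max(range(len(order)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = back[j]
    return chain[::-1]
```

    For example, given anchors A = Anchor(0, 0, 5, 5.0), B = Anchor(10, 12, 5, 5.0) and C = Anchor(3, 20, 5, 9.0), C overlaps A in the first sequence and is out of order with B in the second. The best chain is therefore A followed by B, with score 10.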

  2. What was the ancestral sex-determining mechanism in amniote vertebrates?

    PubMed

    Johnson Pokorná, Martina; Kratochvíl, Lukáš

    2016-02-01

    Amniote vertebrates, the group consisting of mammals and reptiles including birds, possess various mechanisms of sex determination. Under environmental sex determination (ESD), the sex of individuals depends on the environmental conditions occurring during their development, and therefore there are no sexual differences present in their genotypes. Alternatively, under genotypic sex determination (GSD), sex is determined by a sex-specific genotype, i.e. by the combination of sex chromosomes at various stages of differentiation at conception. As well as influencing sex determination, sex-specific parts of genomes may, and often do, develop specific reproductive or ecological roles in their bearers. Accordingly, an individual with a mismatch between phenotypic (gonadal) and genotypic sex, for example an individual sex-reversed by environmental effects, should have lower fitness due to the lack of specialized, sex-specific parts of its genome. In this case, evolutionary transitions from GSD to ESD should be less likely than transitions in the opposite direction. This prediction contrasts with the view that GSD was the ancestral sex-determining mechanism for amniote vertebrates. Ancestral GSD would require several transitions from GSD to ESD associated with an independent dedifferentiation of sex chromosomes, at least in the ancestors of crocodiles, turtles, and lepidosaurs (tuataras and squamate reptiles). In this review, we argue that the alternative theory postulating ESD as ancestral in amniotes is more parsimonious and is largely concordant with the theoretical expectations and current knowledge of the phylogenetic distribution and homology of sex-determining mechanisms. © 2014 Cambridge Philosophical Society.

  3. Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases.

    PubMed

    Standage-Beier, Kylie; Zhang, Qi; Wang, Xiao

    2015-11-20

    Programmable CRISPR-Cas systems have augmented our ability to produce precise genome manipulations. Here we demonstrate and characterize the ability of CRISPR-Cas-derived nickases to direct targeted recombination of both small and large genomic regions flanked by repetitive elements in Escherichia coli. While CRISPR-directed double-stranded DNA breaks are highly lethal in many bacteria, we show that CRISPR-guided nickase systems can be programmed to make precise, nonlethal, single-stranded incisions in targeted genomic regions. This induces recombination events and leads to targeted deletion. We demonstrate that dual-targeted nicking enables deletion of 36 and 97 kb of the genome. Furthermore, multiplex targeting enables deletion of 133 kb, accounting for approximately 3% of the entire E. coli genome. This technology provides a framework for methods to manipulate bacterial genomes using CRISPR-nickase systems. We envision this system working synergistically with preexisting bacterial genome engineering methods.

  4. The common ancestral core of vertebrate and fungal telomerase RNAs

    PubMed Central

    Qi, Xiaodong; Li, Yang; Honda, Shinji; Hoffmann, Steve; Marz, Manja; Mosig, Axel; Podlevsky, Joshua D.; Stadler, Peter F.; Selker, Eric U.; Chen, Julian J.-L.

    2013-01-01

    Telomerase is a ribonucleoprotein with an intrinsic telomerase RNA (TER) component. Within yeasts, TER is remarkably large and presents little similarity in secondary structure to vertebrate or ciliate TERs. To better understand the evolution of fungal telomerase, we identified 74 TERs from Pezizomycotina and Taphrinomycotina subphyla, sister clades to budding yeasts. We initially identified TER from Neurospora crassa using a novel deep-sequencing–based approach, and homologous TER sequences from available fungal genome databases by computational searches. Remarkably, TERs from these non-yeast fungi have many attributes in common with vertebrate TERs. Comparative phylogenetic analysis of highly conserved regions within Pezizomycotina TERs revealed two core domains nearly identical in secondary structure to the pseudoknot and CR4/5 within vertebrate TERs. We then analyzed N. crassa and Schizosaccharomyces pombe telomerase reconstituted in vitro, and showed that the two RNA core domains in both systems can reconstitute activity in trans as two separate RNA fragments. Furthermore, the primer-extension pulse-chase analysis affirmed that the reconstituted N. crassa telomerase synthesizes TTAGGG repeats with high processivity, a common attribute of vertebrate telomerase. Overall, this study reveals the common ancestral cores of vertebrate and fungal TERs, and provides insights into the molecular evolution of fungal TER structure and function. PMID:23093598

  5. Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing

    PubMed Central

    2013-01-01

    Background Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. Results Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large-insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. Conclusions We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes. PMID:23590730
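
N50, the scaffolding metric quoted above, is conventionally defined as the length L such that contigs or scaffolds of length L or longer together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the example lengths are illustrative assumptions, and tools differ slightly in tie-breaking and in whether they use the assembly length or an estimated genome size as the denominator (the latter is usually reported as NG50).

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid working with fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # prints 300
```

Because the lengths are walked from longest to shortest, merging contigs into longer scaffolds raises N50 even when the total assembly length is unchanged, which is why it is a common summary of scaffolding gains.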

  6. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    PubMed Central

    2011-01-01

    Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was

  7. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes

    NASA Astrophysics Data System (ADS)

    Peng, Qian; Alekseyev, Max A.; Tesler, Glenn; Pevzner, Pavel A.

    The existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared over all genomes to construct the synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to address the need to construct synteny blocks in many mammalian genomes that are currently being sequenced. The problem is that the number of anchors shared among all genomes quickly decreases with the increase in the number of genomes. Another problem is that many genomes (plant genomes in particular) had extensive duplications, which makes decoding of genomic architecture and rearrangement analysis in plants difficult. The existing synteny block generation algorithms in plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolution history of duplications. We present a new algorithm based on the A-Bruijn graph framework that overcomes these difficulties and provides a unified approach to synteny block reconstruction for multiple genomes, and for genomes with large duplications.

  8. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  9. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    PubMed

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.

  10. GEnomes Management Application (GEM.app): A new software tool for large-scale collaborative genome analysis

    PubMed Central

    Gonzalez, Michael A.; Acosta Lebrigio, Rafael F.; Van Booven, Derek; Ulloa, Rick H.; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schule, Rebecca; Zuchner, Stephan

    2015-01-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ~1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for non-bioinformaticians to make NGS data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 seconds across ~1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. PMID:23463597

  11. Small genomes and large seeds: chromosome numbers, genome size and seed mass in diploid Aesculus species (Sapindaceae).

    PubMed

    Krahulcová, Anna; Trávnícek, Pavel; Krahulec, František; Rejmánek, Marcel

    2017-04-01

    Aesculus L. (horse chestnut, buckeye) is a genus of 12-19 extant woody species native to the temperate Northern Hemisphere. This genus is known for unusually large seeds among angiosperms. While chromosome counts are available for many Aesculus species, only one has had its genome size measured. The aim of this study is to provide more genome size data and analyse the relationship between genome size and seed mass in this genus. Chromosome numbers in root tip cuttings were confirmed for four species and reported for the first time for three additional species. Flow cytometric measurements of 2C nuclear DNA values were conducted on eight species, and mean seed mass values were estimated for the same taxa. The same chromosome number, 2n = 40, was determined in all investigated taxa. Original measurements of 2C values for seven Aesculus species (eight taxa), added to just one reliable datum for A. hippocastanum, confirmed the notion that the genome size in this genus with relatively large seeds is surprisingly low, ranging from 0.955 pg 2C⁻¹ in A. parviflora to 1.275 pg 2C⁻¹ in A. glabra var. glabra. The chromosome number of 2n = 40 seems to be conclusively the universal 2n number for non-hybrid species in this genus. Aesculus genome sizes are relatively small, not only within its own family, Sapindaceae, but also within woody angiosperms. The genome sizes seem to be distinct and non-overlapping among the four major Aesculus clades. These results provide additional support for the most recent reconstruction of Aesculus phylogeny. The correlation between the 2C values and seed masses in examined Aesculus species is slightly negative and not significant. However, when the four major clades are treated separately, there is a consistent positive association between larger genome size and larger seed mass within individual lineages.

  12. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment.

    PubMed

    Kim, Jonghwan; Bhinge, Akshay A; Morgan, Xochitl C; Iyer, Vishwanath R

    2005-01-01

    Identifying the chromosomal targets of transcription factors is important for reconstructing the transcriptional regulatory networks underlying global gene expression programs. We have developed an unbiased genomic method called sequence tag analysis of genomic enrichment (STAGE) to identify the direct binding targets of transcription factors in vivo. STAGE is based on high-throughput sequencing of concatemerized tags derived from target DNA enriched by chromatin immunoprecipitation. We first used STAGE in yeast to confirm that RNA polymerase III genes are the most prominent targets of the TATA-box binding protein. We optimized the STAGE protocol and developed analysis methods to allow the identification of transcription factor targets in human cells. We used STAGE to identify several previously unknown binding targets of human transcription factor E2F4 that we independently validated by promoter-specific PCR and microarray hybridization. STAGE provides a means of identifying the chromosomal targets of DNA-associated proteins in any sequenced genome.

  13. Estimation of ancestral inbreeding effects on stillbirth, calving ease and birthweight in German Holstein dairy cattle.

    PubMed

    Hinrichs, D; Bennewitz, J; Wellmann, R; Thaller, G

    2015-02-01

    In this study, the effect of different measurements of ancestral inbreeding on birthweight, calving ease and stillbirth was analysed. Three models were used to estimate the effect of ancestral inbreeding, and the estimated regression coefficient of phenotypic data on different measurements of ancestral inbreeding was used to quantify the effect of ancestral inbreeding. The first model included only one measurement of inbreeding, whereas the second model included the classical inbreeding coefficients and one alternative inbreeding coefficient. The third model included the classical inbreeding coefficients, the interaction between classical inbreeding and ancestral inbreeding, and the classical inbreeding coefficients of the dam. Phenotypic data for this study were collected from February 1998 to December 2008 on three large commercial milk farms. During this time, 36,477 calving events were recorded. All calves were weighed after birth, and 8.08% of the calves died within 48 h after calving. Calving ease was recorded on a scale between 1 and 4 (1 = easy birth, 4 = surgery), and 69.95, 20.91, 8.92 and 0.21% of the calvings were scored with 1, 2, 3 and 4, respectively. The average inbreeding coefficient of inbred animals was 0.03, and average ancestral inbreeding coefficients were 0.08 and 0.01, depending on how ancestral inbreeding was calculated. Approximately 26% of classically non-inbred animals showed ancestral inbreeding. Correlations between different inbreeding coefficients ranged between 0.46 and 0.99. No significant effect of ancestral inbreeding was found for calving ease, because the number of animals with a reasonably high level of ancestral inbreeding was too low. Significant effects of ancestral inbreeding were estimated for birthweight and stillbirth. Unfavourable effects of ancestral inbreeding were observed for birthweight. However, favourable purging effects were estimated for stillbirth, indicating that purging could be partly beneficial for genetic

  14. Shrinking genomes? Evidence from genome size variation in Crepis (Compositae).

    PubMed

    Enke, N; Fuchs, J; Gemeinholzer, B

    2011-01-01

    Large-scale surveys of genome size evolution in angiosperms show that the ancestral genome was most likely small, with a tendency towards an increase in DNA content during evolution. Due to polyploidisation and self-replicating DNA elements, angiosperm genomes were considered to have a 'one-way ticket to obesity' (Bennetzen & Kellogg 1997). New findings on how organisms can lose DNA challenged the hypothesis of unidirectional evolution of genome size. The present study is based on the classical work of Babcock (1947a) on karyotype evolution within Crepis and analyses karyotypic diversification within the genus in a phylogenetic context. Genome size of 21 Crepis species was estimated using flow cytometry. Additional data for 17 further species were taken from the literature. Within 30 diploid Crepis species there is a striking trend towards genome contraction. The direction of genome size evolution was analysed by reconstructing ancestral character states on a molecular phylogeny based on ITS sequence data. DNA content is correlated with distributional aspects as well as life form. Genome size is significantly higher in perennials than in annuals. Within sampled species, very small genomes are only present in Mediterranean or European species, whereas their Central and East Asian relatives have larger 1C values.

  15. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

    PubMed

    Sarovich, Derek S; Price, Erin P

    2014-09-08

    Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.

  16. FastML: a web server for probabilistic reconstruction of ancestral sequences

    PubMed Central

    Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal

    2012-01-01

    Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/. PMID:22661579
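
FastML codes each gap as a binary character before reconstructing ancestral indel states. The Python sketch below shows one simple way to build such a presence/absence matrix from an alignment; it is an illustration under stated assumptions, not FastML's code. Here a character is a maximal run of '-' identified by its start and end columns, and a sequence scores 1 only if it carries a gap with exactly those boundaries. Published schemes such as simple indel coding additionally mark a sequence as missing ('?') when a longer gap overlaps the character, which this simplified sketch does not do. The names `gap_runs` and `indel_matrix` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gap = Tuple[int, int]  # half-open (start, end) column interval of a gap


def gap_runs(seq: str) -> List[Gap]:
    """Return the maximal runs of '-' in an aligned sequence."""
    runs: List[Gap] = []
    start: Optional[int] = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i                      # a gap opens
        elif ch != "-" and start is not None:
            runs.append((start, i))        # the gap closes before column i
            start = None
    if start is not None:                  # gap running to the alignment end
        runs.append((start, len(seq)))
    return runs


def indel_matrix(alignment: Dict[str, str]) -> Tuple[List[Gap], Dict[str, List[int]]]:
    """Code every distinct gap as a binary character (1 = gap present)."""
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    per_seq = {name: set(gap_runs(seq)) for name, seq in alignment.items()}
    characters = sorted(set().union(*per_seq.values()))
    matrix = {name: [1 if g in gaps else 0 for g in characters]
              for name, gaps in per_seq.items()}
    return characters, matrix


if __name__ == "__main__":
    chars, m = indel_matrix({"s1": "AC--GT", "s2": "ACT-GT", "s3": "ACTAGT"})
    print(chars)  # prints [(2, 4), (3, 4)]
    print(m)      # prints {'s1': [1, 0], 's2': [0, 1], 's3': [0, 0]}
```

The resulting 0/1 columns can be treated as a separate binary character set, which is what allows ancestral indel states to be reconstructed under a two-state continuous-time Markov model alongside the residue characters.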

  17. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  18. BactoGeNIE: A large-scale comparative genome visualization for big displays

    SciTech Connect

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  19. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  20. Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila.

    PubMed

    Zhou, Qi; Bachtrog, Doris

    2015-06-01

    Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of 'neo-sex' chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration.

  1. Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila

    PubMed Central

    Zhou, Qi; Bachtrog, Doris

    2015-01-01

    Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of ‘neo-sex’ chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration. PMID

  2. Engineering large viral DNA genomes using the CRISPR-Cas9 system.

    PubMed

    Suenaga, Tadahiro; Kohyama, Masako; Hirayasu, Kouyuki; Arase, Hisashi

    2014-09-01

    Manipulation of viral genomes is essential for studying viral gene function and utilizing viruses for therapy. Several techniques for viral genome engineering have been developed. Homologous recombination in virus-infected cells has traditionally been used to edit viral genomes; however, the frequency of the expected recombination is quite low. Alternatively, large viral genomes have been edited using a bacterial artificial chromosome (BAC) plasmid system. However, cloning of large viral genomes into BAC plasmids is both laborious and time-consuming. In addition, because it is possible for insertion into the viral genome of drug selection markers or parts of BAC plasmids to affect viral function, artificial genes sometimes need to be removed from edited viruses. Herpes simplex virus (HSV), a common DNA virus with a genome length of 152 kbp, causes herpes labialis, genital herpes and encephalitis. Mutant HSV is a candidate for oncotherapy, in which HSV is used to kill tumor cells. In this study, the clustered regularly interspaced short palindromic repeat-Cas9 system was used to very efficiently engineer HSV without inserting artificial genes into viral genomes. Not only gene-ablated HSV but also gene knock-in HSV were generated using this method. Furthermore, selection with phenotypes of edited genes promotes the isolation efficiencies of expectedly mutated viral clones. Because our method can be applied to other DNA viruses such as Epstein-Barr virus, cytomegaloviruses, vaccinia virus and baculovirus, our system will be useful for studying various types of viruses, including clinical isolates.

  3. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis

    PubMed Central

    Wheeler, Eleanor; Leong, Aaron; Yao, Jie; Hong, Jaeyoung; Chu, Audrey Y.; Zhang, Weihua; Wang, Xu; Maruthur, Nisa M.; Porneala, Bianca C.; Jia, Yucheng; Kabagambe, Edmond K.; Chang, Li-Ching; Chen, Wei-Min; Elks, Cathy E.; Fan, Qiao; Giulianini, Franco; Go, Min Jin; Hottenga, Jouke-Jan; Hu, Yao; Jackson, Anne U.; Kanoni, Stavroula; Kleber, Marcus E.; Lu, Yingchang; Mahajan, Anubha; Marzi, Carola; Nalls, Mike A.; Nolte, Ilja M.; Rose, Lynda M.; Rybin, Denis V.; Shi, Yuan; Stram, Daniel O.; Tan, Shu Pei; Zhao, Wanting; Goel, Anuj; Martinez Larrad, Maria Teresa; Radke, Dörte; Salo, Perttu; van Iperen, Erik P. A.; Abecasis, Goncalo; Afaq, Saima; Bertoni, Alain G.; Bonnefond, Amelie; Böttcher, Yvonne; Chen, Chien-Hsiun; Cho, Yoon Shin; Garvey, W. Timothy; Gieger, Christian; Goodarzi, Mark O.; Grallert, Harald; Hamsten, Anders; Hartman, Catharina A.; Hsiung, Chao Agnes; Igase, Michiya; Isono, Masato; Khor, Chiea-Chuen; Kiess, Wieland; Kohara, Katsuhiko; Lee, Juyoung; Lehne, Benjamin; Li, Huaixing; Liu, Jianjun; Lobbens, Stephane; Luan, Jian'an; Lyssenko, Valeriya; Meitinger, Thomas; Miki, Tetsuro; Moon, Sanghoon; Mulas, Antonella; Müller-Nurasyid, Martina; Nagaraja, Ramaiah; Nauck, Matthias; Pankow, James S.; Polasek, Ozren; Prokopenko, Inga; Rasmussen-Torvik, Laura; Rathmann, Wolfgang; Rich, Stephen S.; Robertson, Neil R.; Roden, Michael; Roussel, Ronan; Rudan, Igor; Scott, Robert A.; Scott, William R.; Sennblad, Bengt; Siscovick, David S.; Strauch, Konstantin; Sun, Liang; Taylor, Kent D.; Teo, Yik-Ying; Tham, Yih Chung; Tönjes, Anke; Willemsen, Gonneke; Wilsgaard, Tom; Egan, Josephine; Hovingh, G. Kees; Jula, Antti; Kumari, Meena; Njølstad, Inger; Serrano Ríos, Manuel; Stumvoll, Michael; Watkins, Hugh; Aung, Tin; Blüher, Matthias; Boehnke, Michael; Bornstein, Stefan R.; Chambers, John C.; Chasman, Daniel I.; Chen, Yii-Der Ida; Chen, Yduan-Tsong; Cheng, Ching-Yu; Deloukas, Panos; Evans, Michele K.; Fornage, Myriam; Froguel, Philippe; Groop, Leif; Gross, Myron D.; Harris, Tamara B.; Hayward, Caroline; Ingelsson, Erik; Kato, Norihiro; Kim, Bong-Jo; Koh, Woon-Puay; Kooner, Jaspal S.; Körner, Antje; Kuh, Diana; Kuusisto, Johanna; Laakso, Markku; Lin, Xu; Liu, Yongmei; Loos, Ruth J. F.; März, Winfried; Pedersen, Nancy L.; Ridker, Paul M.; Saleheen, Danish; Saltevo, Juha; Schwarz, Peter EH.; Sheu, Wayne H. H.; Snieder, Harold; Spector, Timothy D.; Tabara, Yasuharu; Tuomilehto, Jaakko; Wilson, James G.; Wolffenbuttel, Bruce H. R.; Wu, Jer-Yuarn; Zonderman, Alan B.; Soranzo, Nicole; Guo, Xiuqing; Roberts, David J.; Florez, Jose C.; Tai, E-Shyong; Selvin, Elizabeth; Rotter, Jerome I.

    2017-01-01

    Background: Glycated hemoglobin (HbA1c) is used to diagnose type 2 diabetes (T2D) and assess glycemic control in patients with diabetes. Previous genome-wide association studies (GWAS) have identified 18 HbA1c-associated genetic variants. These variants proved to be classifiable by their likely biological action as erythrocytic (also associated with erythrocyte traits) or glycemic (associated with other glucose-related traits). In this study, we tested the hypotheses that, in a very large-scale GWAS, we would identify more genetic variants associated with HbA1c and that HbA1c variants implicated in erythrocytic biology would affect the diagnostic accuracy of HbA1c. We therefore expanded the number of HbA1c-associated loci and tested the effect of genetic risk scores composed of erythrocytic or glycemic variants on incident diabetes prediction and on prevalent diabetes screening performance. Throughout this multiancestry study, we kept a focus on interancestry differences in HbA1c genetics performance that might influence race-ancestry differences in health outcomes. Methods and findings: Using genome-wide association meta-analyses in up to 159,940 individuals from 82 cohorts of European, African, East Asian, and South Asian ancestry, we identified 60 common genetic variants associated with HbA1c. We classified variants as implicated in glycemic, erythrocytic, or unclassified biology and tested whether additive genetic scores of erythrocytic variants (GS-E) or glycemic variants (GS-G) were associated with higher T2D incidence in multiethnic longitudinal cohorts (N = 33,241). Nineteen glycemic and 22 erythrocytic variants were associated with HbA1c at genome-wide significance. GS-G was associated with higher T2D risk (incidence OR = 1.05, 95% CI 1.04–1.06, per HbA1c-raising allele, p = 3 × 10^−29), whereas GS-E was not (OR = 1.00, 95% CI 0.99–1.01, p = 0.60). In Europeans and Asians, erythrocytic variants in aggregate had only modest effects on the diagnostic

  4. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis.

    PubMed

    Wheeler, Eleanor; Leong, Aaron; Liu, Ching-Ti; Hivert, Marie-France; Strawbridge, Rona J; Podmore, Clara; Li, Man; Yao, Jie; Sim, Xueling; Hong, Jaeyoung; Chu, Audrey Y; Zhang, Weihua; Wang, Xu; Chen, Peng; Maruthur, Nisa M; Porneala, Bianca C; Sharp, Stephen J; Jia, Yucheng; Kabagambe, Edmond K; Chang, Li-Ching; Chen, Wei-Min; Elks, Cathy E; Evans, Daniel S; Fan, Qiao; Giulianini, Franco; Go, Min Jin; Hottenga, Jouke-Jan; Hu, Yao; Jackson, Anne U; Kanoni, Stavroula; Kim, Young Jin; Kleber, Marcus E; Ladenvall, Claes; Lecoeur, Cecile; Lim, Sing-Hui; Lu, Yingchang; Mahajan, Anubha; Marzi, Carola; Nalls, Mike A; Navarro, Pau; Nolte, Ilja M; Rose, Lynda M; Rybin, Denis V; Sanna, Serena; Shi, Yuan; Stram, Daniel O; Takeuchi, Fumihiko; Tan, Shu Pei; van der Most, Peter J; Van Vliet-Ostaptchouk, Jana V; Wong, Andrew; Yengo, Loic; Zhao, Wanting; Goel, Anuj; Martinez Larrad, Maria Teresa; Radke, Dörte; Salo, Perttu; Tanaka, Toshiko; van Iperen, Erik P A; Abecasis, Goncalo; Afaq, Saima; Alizadeh, Behrooz Z; Bertoni, Alain G; Bonnefond, Amelie; Böttcher, Yvonne; Bottinger, Erwin P; Campbell, Harry; Carlson, Olga D; Chen, Chien-Hsiun; Cho, Yoon Shin; Garvey, W Timothy; Gieger, Christian; Goodarzi, Mark O; Grallert, Harald; Hamsten, Anders; Hartman, Catharina A; Herder, Christian; Hsiung, Chao Agnes; Huang, Jie; Igase, Michiya; Isono, Masato; Katsuya, Tomohiro; Khor, Chiea-Chuen; Kiess, Wieland; Kohara, Katsuhiko; Kovacs, Peter; Lee, Juyoung; Lee, Wen-Jane; Lehne, Benjamin; Li, Huaixing; Liu, Jianjun; Lobbens, Stephane; Luan, Jian'an; Lyssenko, Valeriya; Meitinger, Thomas; Miki, Tetsuro; Miljkovic, Iva; Moon, Sanghoon; Mulas, Antonella; Müller, Gabriele; Müller-Nurasyid, Martina; Nagaraja, Ramaiah; Nauck, Matthias; Pankow, James S; Polasek, Ozren; Prokopenko, Inga; Ramos, Paula S; Rasmussen-Torvik, Laura; Rathmann, Wolfgang; Rich, Stephen S; Robertson, Neil R; Roden, Michael; Roussel, Ronan; Rudan, Igor; Scott, Robert A; Scott, William R; Sennblad, Bengt; Siscovick, David S; Strauch, Konstantin; Sun, Liang; Swertz, Morris; Tajuddin, Salman M; Taylor, Kent D; Teo, Yik-Ying; Tham, Yih Chung; Tönjes, Anke; Wareham, Nicholas J; Willemsen, Gonneke; Wilsgaard, Tom; Hingorani, Aroon D; Egan, Josephine; Ferrucci, Luigi; Hovingh, G Kees; Jula, Antti; Kivimaki, Mika; Kumari, Meena; Njølstad, Inger; Palmer, Colin N A; Serrano Ríos, Manuel; Stumvoll, Michael; Watkins, Hugh; Aung, Tin; Blüher, Matthias; Boehnke, Michael; Boomsma, Dorret I; Bornstein, Stefan R; Chambers, John C; Chasman, Daniel I; Chen, Yii-Der Ida; Chen, Yduan-Tsong; Cheng, Ching-Yu; Cucca, Francesco; de Geus, Eco J C; Deloukas, Panos; Evans, Michele K; Fornage, Myriam; Friedlander, Yechiel; Froguel, Philippe; Groop, Leif; Gross, Myron D; Harris, Tamara B; Hayward, Caroline; Heng, Chew-Kiat; Ingelsson, Erik; Kato, Norihiro; Kim, Bong-Jo; Koh, Woon-Puay; Kooner, Jaspal S; Körner, Antje; Kuh, Diana; Kuusisto, Johanna; Laakso, Markku; Lin, Xu; Liu, Yongmei; Loos, Ruth J F; Magnusson, Patrik K E; März, Winfried; McCarthy, Mark I; Oldehinkel, Albertine J; Ong, Ken K; Pedersen, Nancy L; Pereira, Mark A; Peters, Annette; Ridker, Paul M; Sabanayagam, Charumathi; Sale, Michele; Saleheen, Danish; Saltevo, Juha; Schwarz, Peter Eh; Sheu, Wayne H H; Snieder, Harold; Spector, Timothy D; Tabara, Yasuharu; Tuomilehto, Jaakko; van Dam, Rob M; Wilson, James G; Wilson, James F; Wolffenbuttel, Bruce H R; Wong, Tien Yin; Wu, Jer-Yuarn; Yuan, Jian-Min; Zonderman, Alan B; Soranzo, Nicole; Guo, Xiuqing; Roberts, David J; Florez, Jose C; Sladek, Robert; Dupuis, Josée; Morris, Andrew P; Tai, E-Shyong; Selvin, Elizabeth; Rotter, Jerome I; Langenberg, Claudia; Barroso, Inês; Meigs, James B

    2017-09-01

    Glycated hemoglobin (HbA1c) is used to diagnose type 2 diabetes (T2D) and assess glycemic control in patients with diabetes. Previous genome-wide association studies (GWAS) have identified 18 HbA1c-associated genetic variants. These variants proved to be classifiable by their likely biological action as erythrocytic (also associated with erythrocyte traits) or glycemic (associated with other glucose-related traits). In this study, we tested the hypotheses that, in a very large-scale GWAS, we would identify more genetic variants associated with HbA1c and that HbA1c variants implicated in erythrocytic biology would affect the diagnostic accuracy of HbA1c. We therefore expanded the number of HbA1c-associated loci and tested the effect of genetic risk scores composed of erythrocytic or glycemic variants on incident diabetes prediction and on prevalent diabetes screening performance. Throughout this multiancestry study, we kept a focus on interancestry differences in HbA1c genetics performance that might influence race-ancestry differences in health outcomes. Using genome-wide association meta-analyses in up to 159,940 individuals from 82 cohorts of European, African, East Asian, and South Asian ancestry, we identified 60 common genetic variants associated with HbA1c. We classified variants as implicated in glycemic, erythrocytic, or unclassified biology and tested whether additive genetic scores of erythrocytic variants (GS-E) or glycemic variants (GS-G) were associated with higher T2D incidence in multiethnic longitudinal cohorts (N = 33,241). Nineteen glycemic and 22 erythrocytic variants were associated with HbA1c at genome-wide significance. GS-G was associated with higher T2D risk (incidence OR = 1.05, 95% CI 1.04-1.06, per HbA1c-raising allele, p = 3 × 10^-29), whereas GS-E was not (OR = 1.00, 95% CI 0.99-1.01, p = 0.60). In Europeans and Asians, erythrocytic variants in aggregate had only modest effects on the diagnostic accuracy of HbA1c. Yet, in African

  5. A guided tour of large genome size in animals: what we know and where we are heading.

    PubMed

    Dufresne, France; Jeffery, Nicholas

    2011-10-01

    The study of genome size diversity is an ever-expanding field that is highly relevant in today's world of rapid and efficient DNA sequencing. Animal genome sizes range from 0.02 to 132.83 pg, but most animal genomes are small, with the majority under 5 pg. Animals with large genomes (> 10 pg) are scattered among some invertebrates, including the Platyhelminthes, crustaceans, and orthopterans, and among vertebrates, including the Actinopterygii, Chondrichthyes, and some amphibians. In this paper, we explore the connections between genome size and organismal phenotype, physiology, and ecology. We also discuss some of the molecular mechanisms of genome shrinkage and expansion revealed by comparative studies of species with full genome sequences, and how these may apply to species with large genomes. Because most animal species sequenced to date have small genomes (especially invertebrates), owing to sequencing costs and the difficulty of assembling large genomes, an understanding of the structural composition of large genomes is still lacking. Studies using next-generation sequencing are now being attempted for the first time in animals with larger genomes. Such analyses, using low genome coverage, are providing a glimpse of the composition of repetitive elements in animals with more complex genomes. These future studies will allow a better understanding of the factors leading to genomic obesity in animals.

  6. Independent evolution of genomic characters during major metazoan transitions.

    PubMed

    Simakov, Oleg; Kawashima, Takeshi

    2017-07-15

    Metazoan evolution encompasses a vast evolutionary time scale spanning more than 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters, such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes, such as innovations in gene and linkage content, occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster, which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  7. The draft genome of the large yellow croaker reveals well-developed innate immunity

    PubMed Central

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have suffered severely from overfishing, and aquacultured fish are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the presence of several genes and identify expansions of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  8. The draft genome of the large yellow croaker reveals well-developed innate immunity.

    PubMed

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-11-19

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have suffered severely from overfishing, and aquacultured fish are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the presence of several genes and identify expansions of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies.

  9. Genome evolution in Reptilia, the sister group of mammals.

    PubMed

    Janes, Daniel E; Organ, Christopher L; Fujita, Matthew K; Shedlock, Andrew M; Edwards, Scott V

    2010-01-01

    The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome systems, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome (predicted to be large, to possess a diverse repeat landscape on par with mammals, and to have a birdlike sex chromosome system) to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates across different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.

  10. Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

    PubMed Central

    Biji, Christopher Leela; Madhu, Manu K; Vishnu, Vineetha; K, Satheesh Kumar; Vijayakumar; Nair, Achuthsankar S

    2015-01-01

    Big data storage is a challenge in the post-genome era, so high-performance computing solutions are needed for managing large genomic data. Here, we describe a parallel-computing approach that uses a message-passing library to distribute the different compression stages across clusters. Genomic compression reduces the on-disk “footprint” of large volumes of sequence data, supporting more efficient archiving within the computational infrastructure. In this report, the approach was shown to be useful for 21 eukaryotic genomes, using stratified sampling. The method achieves an average 6-fold reduction in disk space with three times faster compression than COMRAD. Availability: The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/ PMID:26124572

  11. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    USDA-ARS's Scientific Manuscript database

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/sequence scaffolds for whole-genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  12. Systematics and morphological evolution within the moss family Bryaceae: a comparison between parsimony and Bayesian methods for reconstruction of ancestral character states.

    PubMed

    Pedersen, Niklas; Holyoak, David T; Newton, Angela E

    2007-06-01

    The Bryaceae are a large cosmopolitan moss family including genera of significant morphological and taxonomic complexity. Phylogenetic relationships within the Bryaceae were reconstructed based on DNA sequence data from all three genomic compartments. In addition, maximum parsimony and Bayesian inference were employed to reconstruct ancestral character states of 38 morphological plus four habitat characters and eight insertion/deletion events. The recovered phylogenetic patterns are generally in accord with previous phylogenies based on chloroplast DNA sequence data and three major clades are identified. The first clade comprises Bryum bornholmense, B. rubens, B. caespiticium, and Plagiobryum. This corroborates the hypothesis suggested by previous studies that several Bryum species are more closely related to Plagiobryum than to the core Bryum species. The second clade includes Acidodontium, Anomobryum, and Haplodontium, while the third clade contains the core Bryum species plus Imbribryum. Within the latter clade, B. subapiculatum and B. tenuisetum form the sister clade to Imbribryum. Reconstructions of ancestral character states under maximum parsimony and Bayesian inference suggest fourteen morphological synapomorphies for the ingroup and synapomorphies are detected for most clades within the ingroup. Maximum parsimony and Bayesian reconstructions of ancestral character states are mostly congruent although Bayesian inference shows that the posterior probability of ancestral character states may decrease dramatically when node support is taken into account. Bayesian inference also indicates that reconstructions may be ambiguous at internal nodes for highly polymorphic characters.

  13. Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials

    PubMed Central

    Meric-Bernstam, Funda; Brusco, Lauren; Shaw, Kenna; Horombe, Chacha; Kopetz, Scott; Davies, Michael A.; Routbort, Mark; Piha-Paul, Sarina A.; Janku, Filip; Ueno, Naoto; Hong, David; De Groot, John; Ravi, Vinod; Li, Yisheng; Luthra, Raja; Patel, Keyur; Broaddus, Russell; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Purpose: We report the experience with 2,000 consecutive patients with advanced cancer who underwent testing on a genomic testing protocol, including the frequency of actionable alterations across tumor types, subsequent enrollment onto clinical trials, and the challenges for trial enrollment. Patients and Methods: Standardized hotspot mutation analysis was performed in 2,000 patients, using either an 11-gene (251 patients) or a 46- or 50-gene (1,749 patients) multiplex platform. Thirty-five genes were considered potentially actionable based on their potential to be targeted with approved or investigational therapies. Results: Seven hundred eighty-nine patients (39%) had at least one mutation in potentially actionable genes. Eighty-three patients (11%) with potentially actionable mutations enrolled onto genotype-matched trials targeting these alterations. Of 230 patients with PIK3CA/AKT1/PTEN/BRAF mutations who returned for therapy, 116 (50%) received a genotype-matched drug. Forty patients (17%) were treated on a genotype-selected trial requiring a mutation for eligibility, 16 (7%) were treated on a genotype-relevant trial targeting a genomic alteration without biomarker selection, and 40 (17%) received a genotype-relevant drug off trial. Challenges to trial accrual included patient preference for noninvestigational or local treatment, poor performance status or other reasons for trial ineligibility, lack of trials/slots, and insurance denial. Conclusion: Broad implementation of multiplex hotspot testing is feasible; however, only a small proportion of patients with actionable alterations were actually enrolled onto genotype-matched trials. Increased awareness of therapeutic implications and access to novel therapeutics are needed to optimally leverage results from broad-based genomic testing. PMID:26014291

  14. Array-comparative genomic hybridization profiling of immunohistochemical subgroups of diffuse large B-cell lymphoma shows distinct genomic alterations

    PubMed Central

    Guo, Ying; Takeuchi, Ichiro; Karnan, Sivasundaram; Miyata, Tomoko; Ohshima, Koichi; Seto, Masao

    2014-01-01

    Diffuse large B-cell lymphoma (DLBCL) displays striking heterogeneity at the clinical, genetic and molecular levels. Subtypes include germinal center B-cell-like (GCB) DLBCL and activated B-cell-like (ABC) DLBCL, according to microarray analysis, and germinal center type or non-germinal center type by immunohistochemistry. Although some reports have described genomic aberrations based on the microarray classification system, genomic aberrations based on immunohistochemical classifications have rarely been reported. The present study aimed to ascertain the relationship between genomic aberrations and subtypes identified by immunohistochemistry, and to study the pathogenetic character of Chinese DLBCL. We conducted immunohistochemistry using antibodies against CD10, BCL6 and MUM1 in 59 samples of DLBCL from Chinese patients, and then performed microarray-based comparative genomic hybridization for each case. Characteristic genomic differences were found between GCB and non-GCB DLBCL from the array data. The GCB type was characterized by more gains at 7q (7q22.1, P < 0.05) and losses at 16q (P ≤ 0.05), while the non-GCB type was characterized by gains at 11q24.3 and 3q13.2 (P < 0.05). We found completely different mutations in BCL6+ and BCL6− non-GCB type DLBCL, whereby the BCL6− group had a higher number of gains at 1q and a loss at 14q32.13 (P ≤ 0.005), while the BCL6+ group showed a higher number of gains at 14q23.1 (P = 0.15) and losses at 6q (P = 0.07). The BCL6− group had a higher frequency of genomic imbalances compared to the BCL6+ group. In conclusion, the BCL6+ and BCL6− non-GCB types of DLBCL appear to have different mechanisms of pathogenesis. PMID:24843885

  15. Array-comparative genomic hybridization profiling of immunohistochemical subgroups of diffuse large B-cell lymphoma shows distinct genomic alterations.

    PubMed

    Guo, Ying; Takeuchi, Ichiro; Karnan, Sivasundaram; Miyata, Tomoko; Ohshima, Koichi; Seto, Masao

    2014-04-01

    Diffuse large B-cell lymphoma (DLBCL) displays striking heterogeneity at the clinical, genetic and molecular levels. Subtypes include germinal center B-cell-like (GCB) DLBCL and activated B-cell-like (ABC) DLBCL, according to microarray analysis, and germinal center type or non-germinal center type by immunohistochemistry. Although some reports have described genomic aberrations based on the microarray classification system, genomic aberrations based on immunohistochemical classifications have rarely been reported. The present study aimed to ascertain the relationship between genomic aberrations and subtypes identified by immunohistochemistry, and to study the pathogenetic character of Chinese DLBCL. We conducted immunohistochemistry using antibodies against CD10, BCL6 and MUM1 in 59 samples of DLBCL from Chinese patients, and then performed microarray-based comparative genomic hybridization for each case. Characteristic genomic differences were found between GCB and non-GCB DLBCL from the array data. The GCB type was characterized by more gains at 7q (7q22.1, P < 0.05) and losses at 16q (P ≤ 0.05), while the non-GCB type was characterized by gains at 11q24.3 and 3q13.2 (P < 0.05). We found completely different mutations in BCL6+ and BCL6- non-GCB type DLBCL, whereby the BCL6- group had a higher number of gains at 1q and a loss at 14q32.13 (P ≤ 0.005), while the BCL6+ group showed a higher number of gains at 14q23.1 (P = 0.15) and losses at 6q (P = 0.07). The BCL6- group had a higher frequency of genomic imbalances compared to the BCL6+ group. In conclusion, the BCL6+ and BCL6- non-GCB types of DLBCL appear to have different mechanisms of pathogenesis.

  16. Large-Scale Comparative Genomics Meta-Analysis of Campylobacter jejuni Isolates Reveals Low Level of Genome Plasticity

    PubMed Central

    Taboada, Eduardo N.; Acedillo, Rey R.; Carrillo, Catherine D.; Findlay, Wendy A.; Medeiros, Diane T.; Mykytczuk, Oksana L.; Roberts, Michael J.; Valencia, C. Alexander; Farber, Jeffrey M.; Nash, John H. E.

    2004-01-01

    We have used comparative genomic hybridization (CGH) on a full-genome Campylobacter jejuni microarray to examine genome-wide gene conservation patterns among 51 strains isolated from food and clinical sources. These data have been integrated with data from three previous C. jejuni CGH studies to perform a meta-analysis that included 97 strains from the four separate data sets. Although many genes were found to be divergent across multiple strains (n = 350), many genes (n = 249) were uniquely variable in single strains. Thus, each data set comprises strains with a unique genetic diversity not found in the other data sets. Despite the large increase in the collective number of variable C. jejuni genes (n = 599) found in the meta-analysis data set, nearly half of these (n = 276) mapped to previously defined variable loci, and it therefore appears that large regions of the C. jejuni genome are genetically stable. A detailed analysis of the microarray data revealed that divergent genes could be differentiated on the basis of the amplitudes of their differential microarray signals. Of 599 variable genes, 122 could be classified as highly divergent on the basis of CGH data. Nearly all highly divergent genes (117 of 122) had divergent neighbors and showed high levels of intraspecies variability. The approach outlined here has enabled us to distinguish global trends of gene conservation in C. jejuni and to define this group of genes as a robust set of variable markers that can become the cornerstone of a new generation of genotyping methods that use genome-wide C. jejuni gene variability data. PMID:15472310

  17. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except at three amino acid residues, where one reconstruction differed from the other two. To determine whether these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with λmax values of about 508 nm, which are slightly redshifted relative to those of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods.

  18. Widely Divergent Transcriptional Patterns Between SLE Patients of Different Ancestral Backgrounds in Sorted Immune Cell Populations

    PubMed Central

    Rosenzweig, Elizabeth; Rao, Swapna; Ko, Kichul; Niewold, Timothy B.

    2015-01-01

    Systemic lupus erythematosus (SLE) is a complex autoimmune disease of uncertain etiology. Patients from different ancestral backgrounds demonstrate differences in clinical manifestations and autoantibody profiles. We examined genome-wide transcriptional patterns in major immune cell subsets across different ancestral backgrounds. Peripheral blood was collected from African-American (AA) and European-American (EA) SLE patients and controls. CD4 T-cells, CD8 T-cells, monocytes, and B cells were purified by flow sorting, and each cell subset from each subject was run on a genome-wide expression array. Cases were compared to controls of the same ancestral background. The overlap in differentially expressed gene (DEG) lists between different cell types from the same ancestral background was modest (<10%), and only 5-8% overlap in DEG lists was observed when comparing the same cell type between different ancestral backgrounds. IFN-stimulated gene (ISG) expression was not up-regulated synchronously in all cell types from a given patient; for example, a given subject could have high ISG expression in T and B cells but not in monocytes. AA subjects demonstrated more concordance in ISG expression between cell types from the same individual, and AA patients demonstrated significant down-regulation of metabolic gene expression, which was not observed in EA patients. ISG expression was significantly decreased in B cells in patients taking immunosuppressants, while ISGs in other cell types did not differ with medication use. In conclusion, gene expression was strikingly different between immune cell subsets and between ancestral backgrounds in SLE patients. These findings emphasize the critical importance of studying multiple ancestral backgrounds and multiple cell types in gene expression studies. Ancestral backgrounds which are not studied will not benefit from personalized medicine strategies in SLE. PMID:25921064

  19. Assessing the utility of confirmatory studies following identification of large-scale genomic imbalances by microarray.

    PubMed

    Sanmann, Jennifer N; Pickering, Diane L; Golden, Denae M; Stevens, Jadd M; Hempel, Thomas E; Althof, Pamela A; Wiggins, Michele L; Starr, Lois J; Davé, Bhavana J; Sanger, Warren G

    2015-11-01

    The identification of clinically relevant genomic dosage anomalies assists in accurate diagnosis, prognosis, and medical management of affected individuals. Technological advancements within the field, such as the advent of microarray, have markedly increased the resolution of detection; however, clinical laboratories have maintained conventional techniques for confirmation of genomic imbalances identified by microarray to ensure diagnostic accuracy. In recent years the utility of this confirmatory testing of large-scale aberrations has been questioned but has not been scientifically addressed. We retrospectively reviewed 519 laboratory cases with genomic imbalances meeting reportable criteria by microarray and subsequently confirmed with a second technology, primarily fluorescence in situ hybridization. All genomic imbalances meeting reportable criteria detected by microarray were confirmed with a second technology. Microarray analysis generated no false-positive results. Confirmatory testing of large-scale genomic imbalances (deletion of ≥150 kb, duplication of ≥500 kb) solely for the purpose of microarray verification may be unwarranted. In some cases, however, adjunct testing is necessary to overcome limitations inherent to microarray. A recommended clinical strategy for adjunct testing following identified genomic imbalances using microarray is detailed.

  20. Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA

    PubMed Central

    Lau, Yu Heng; Stirling, Finn; Kuo, James; Karrenbelt, Michiel A. P.; Chan, Yujia A.; Riesselman, Adam; Horton, Connor A.; Schäfer, Elena; Lips, David; Weinstock, Matthew T.; Gibson, Daniel G.; Way, Jeffrey C.

    2017-01-01

    The ability to rewrite large stretches of genomic DNA enables the creation of new organisms with customized functions. However, few methods currently exist for accumulating such widespread genomic changes in a single organism. In this study, we demonstrate a rapid approach for rewriting bacterial genomes with modified synthetic DNA. We recode 200 kb of the Salmonella typhimurium LT2 genome through a process we term SIRCAS (stepwise integration of rolling circle amplified segments), towards constructing an attenuated and genetically isolated bacterial chassis. The SIRCAS process involves direct iterative recombineering of 10–25 kb synthetic DNA constructs which are assembled in yeast and amplified by rolling circle amplification. Using SIRCAS, we create a Salmonella with 1557 synonymous leucine codon replacements across 176 genes, the largest number of cumulative recoding changes in a single bacterial strain to date. We demonstrate reproducibility over sixteen two-day cycles of integration and parallelization for hierarchical construction of a synthetic genome by conjugation. The resulting recoded strain grows at a similar rate to the wild-type strain and does not exhibit any major growth defects. This work is the first instance of synthetic bacterial recoding beyond the Escherichia coli genome, and reveals that Salmonella is remarkably amenable to genome-scale modification. PMID:28499033

  1. When directed evolution met ancestral enzyme resurrection.

    PubMed

    Alcalde, Miguel

    2017-01-01

    The directed evolution of ancestral (resurrected) enzymes can give a new twist in protein engineering approaches towards more versatile and robust biocatalysts. © 2016 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  2. Chloroplast genomes of two conifers lack a large inverted repeat and are extensively rearranged.

    PubMed Central

    Strauss, S H; Palmer, J D; Howe, G T; Doerksen, A H

    1988-01-01

    Chloroplast genomes of Douglas-fir [Pseudotsuga menziesii (Mirb.) Franco] and radiata (Monterey) pine [Pinus radiata D. Don], two conifers from the widespread Pinaceae, were mapped and compared with those of other land plants. Douglas-fir and radiata pine lack the large (20-25 kilobases) inverted repeat that characterizes most land plants. To our knowledge, this is only the second recorded loss of this ancient and highly conserved inverted repeat among all lineages of land plants thus far examined. Loss of the repeat largely accounts for the small size of the conifer genome, 120 kilobases, versus 140-160 kilobases in most land plants. Douglas-fir possesses a major inversion of 40-50 kilobases relative to radiata pine and nonconiferous plants. Nucleotide sequence differentiation between Douglas-fir and radiata pine was estimated to be 3.8%. Both conifer genomes possess a number of rearrangements relative to Osmunda, a fern, Ginkgo, a gymnosperm, and Petunia, an angiosperm. Among land plants, structural changes of this degree have occurred primarily within tribes of the legume family (Fabaceae) that have also lost the inverted repeat. These results support the hypothesis that the presence of the large inverted repeat stabilizes the chloroplast genome against major structural rearrangements. PMID:2836862

  3. Discovery of STL polyomavirus, a polyomavirus of ancestral recombinant origin that encodes a unique T antigen by alternative splicing.

    PubMed

    Lim, Efrem S; Reyes, Alejandro; Antonio, Martin; Saha, Debasish; Ikumapayi, Usman N; Adeyemi, Mitchell; Stine, O Colin; Skelton, Rebecca; Brennan, Daniel C; Mkakosya, Rajhab S; Manary, Mark J; Gordon, Jeffrey I; Wang, David

    2013-02-20

    The family Polyomaviridae comprises circular double-stranded DNA viruses, several of which are associated with diseases, including cancer, in immunocompromised patients. Here we describe a novel polyomavirus recovered from the fecal microbiota of a child in Malawi, provisionally named STL polyomavirus (STLPyV). We detected STLPyV in clinical stool specimens from the USA and The Gambia at up to 1% frequency. Complete genome comparisons of two STLPyV strains demonstrated 5.2% nucleotide divergence. Alternative splicing of the STLPyV early region yielded a unique form of T antigen, which we named 229T, in addition to the expected large and small T antigens. STLPyV has a mosaic genome and shares an ancestral recombinant origin with MWPyV. The discovery of STLPyV highlights a novel alternative splicing strategy and advances our understanding of the complex evolutionary history of polyomaviruses.
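
    Percent nucleotide divergence between two complete genomes is often summarized as an uncorrected p-distance over aligned sites. The sketch below is a minimal illustration, assuming the genomes have already been aligned to equal length and that gapped columns are skipped; the function name, the gap rule, and the toy sequences are illustrative assumptions, not the procedure used in this study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip columns where either sequence has a gap (assumed convention).
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real STLPyV sequences.
    a = "ATGCGTACGTTAGC"
    b = "ATGCGAACGTTGGC"
    print(f"{100 * p_distance(a, b):.1f}% divergence")
```

    Skipping gapped columns is only one convention; counting indels, or applying a substitution-model correction such as Jukes-Cantor, would give somewhat larger values as divergence increases.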

  4. The genome of Oryctes rhinoceros nudivirus provides novel insight into the evolution of nuclear arthropod-specific large circular double-stranded DNA viruses.

    PubMed

    Wang, Yongjie; Bininda-Emonds, Olaf R P; van Oers, Monique M; Vlak, Just M; Jehle, Johannes A

    2011-06-01

    The Oryctes rhinoceros nudivirus (OrNV) is a dsDNA virus with enveloped, rod-shaped virions. Its genome is 127,615 bp in size and contains 139 predicted protein-coding open reading frames (ORFs). In-depth genome sequence comparisons revealed a varying number of shared gene homologues, not only with other nudiviruses (NVs) and baculoviruses, but also with other arthropod-specific large dsDNA viruses, including the so-called Monodon baculovirus (MBV), the salivary gland hypertrophy viruses (SGHVs) and white spot syndrome virus (WSSV). Nudivirus genomes contain 20 baculovirus core gene homologues associated with transcription (p47, lef-8, lef-9, lef-4, vlf-1, and lef-5), replication (dnapol and helicase), virus structure (p74, pif-1, pif-2, pif-3, 19kda/pif-4, odv-e56/pif-5, vp91, vp39, and 38K), and unknown functions (ac68, ac81, and p33). Most strikingly, a set of homologous genes involved in peroral infection (p74, pif-1, pif-2, and pif-3) is common to baculoviruses, nudiviruses, SGHVs, and WSSV, indicating an ancestral mode of infection in these highly diverged viruses. A gene similar to polyhedrin/granulin encoding the baculovirus occlusion body protein was identified in non-occluded NVs and in Musca domestica SGHV, evoking the question of the evolutionary origin of the baculovirus polyhedrin/granulin gene. Based on gene homologies, we further propose that the shrimp MBV is an occluded member of the nudiviruses. We conclude that baculoviruses, NVs and the shrimp MBV, the SGHVs and WSSV share a significant number of conserved genetic functions, which may point to a common ancestry of these viruses.

  5. Whole genome molecular phylogeny of large dsDNA viruses using composition vector method

    PubMed Central

    Gao, Lei; Qi, Ji

    2007-01-01

    Background One important mechanism by which large DNA viruses increase their genome size is the addition of modules acquired from other viruses, host genomes or gene duplications. Phylogenetic analysis of large DNA viruses, especially using methods based on alignment, is often difficult due to the presence of horizontal gene transfer events. The recent composition vector approach, which is not sensitive to such events, is applied here to reconstruct the phylogeny of 124 large DNA viruses. Results The results are mostly consistent with biologists' systematics, with only a few outliers, and also provide some information on unclassified viruses and on the cladistic relationships of several families. Conclusion With the composition vector approach, we obtained a phylogenetic tree of large DNA viruses that is comparable to biologists' systematics, and the approach provides a new way of recovering the phylogeny of viruses. PMID:17359548
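
    The composition vector idea can be illustrated with a short sketch. This is a minimal version of the commonly described CVTree-style formulation (observed k-mer frequencies compared against a (k-2)-order Markov prediction, then a cosine-based distance); the function names, the choice of k, and the toy protein strings are assumptions, and details such as concatenating the many proteins of a real viral proteome are omitted.

```python
import math
from collections import Counter, defaultdict


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer in seq."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq: str, k: int) -> dict[str, float]:
    """Component (f - f0) / f0 for every k-mer whose (k-2)-order Markov
    prediction f0 = f(w[:-1]) * f(w[1:]) / f(w[1:-1]) is non-zero."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk, fk1, fk2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    # Index (k-1)-mers by their first k-2 letters to enumerate candidate k-mers.
    by_prefix = defaultdict(list)
    for v in fk1:
        by_prefix[v[:-1]].append(v)
    vec = {}
    for u, fu in fk1.items():
        core = u[1:]
        for v in by_prefix.get(core, []):
            w = u + v[-1]
            f0 = fu * fk1[v] / fk2[core]
            # Unobserved k-mers with a non-zero prediction contribute -1.
            vec[w] = (fk.get(w, 0.0) - f0) / f0
    return vec


def cv_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """D = (1 - C) / 2, where C is the cosine of the angle between a and b."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0 or nb == 0:
        raise ValueError("composition vector is all zeros")
    return (1 - dot / (na * nb)) / 2


if __name__ == "__main__":
    # Toy protein strings, not real viral proteomes.
    g1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
    g2 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQKPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
    g3 = "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLK"
    v1, v2, v3 = (composition_vector(g, 4) for g in (g1, g2, g3))
    print(f"d(g1, g2) = {cv_distance(v1, v2):.3f}")
    print(f"d(g1, g3) = {cv_distance(v1, v3):.3f}")
```

    Because no alignment is needed, horizontally transferred modules simply shift k-mer statistics rather than breaking an alignment; the resulting pairwise distance matrix would then be passed to a distance-based tree method such as neighbor joining.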

  6. Software engineering the mixed model for genome-wide association studies on large samples

    USDA-ARS's Scientific Manuscript database

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  7. Molecular cytogenetic and genomic insights into chromosomal evolution

    PubMed Central

    Ruiz-Herrera, A; Farré, M; Robinson, T J

    2012-01-01

    This review summarizes aspects of the extensive literature on the patterns and processes underpinning chromosomal evolution in vertebrates and especially placental mammals. It highlights the growing synergy between molecular cytogenetics and comparative genomics, particularly with respect to fully or partially sequenced genomes, and provides novel insights into changes in chromosome number and structure across deep divisions of the vertebrate tree of life. The examination of basal numbers in the deeper branches of the vertebrate tree suggests a haploid (n) chromosome number of 10–13 in an ancestral vertebrate, with modest increases in tetrapods and amniotes most probably by chromosomal fissioning. Information drawn largely from cross-species chromosome painting in the data-dense Placentalia permits the confident reconstruction of an ancestral karyotype comprising n=23 chromosomes that is similarly retained in Boreoeutheria. Using in silico genome-wide scans that include the newly released frog genome, we show that of the nine ancient syntenies detected in conserved karyotypes of extant placentals (thought likely to reflect the structure of ancestral chromosomes), the human syntenic segmental associations 3p/21, 4pq/8p, 7a/16p, 14/15, 12qt/22q and 12pq/22qt predate the divergence of tetrapods. These findings underscore the enhanced quality of ancestral reconstructions based on the integrative molecular cytogenetic and comparative genomic approaches that collectively highlight a pattern of conserved syntenic associations that extends back ∼360 million years. PMID:22108627

  8. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  9. HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples.

    PubMed

    Lekschas, Fritz; Bach, Benjamin; Kerpedjiev, Peter; Gehlenborg, Nils; Pfister, Hanspeter

    2017-08-29

    This paper presents an interactive visualization interface, HiPiler, for the exploration and visualization of regions-of-interest in large genome interaction matrices. Genome interaction matrices approximate the physical distance of pairs of regions on the genome to each other and can contain up to 3 million rows and columns with many sparse regions. Regions of interest (ROIs) can be defined, e.g., by sets of adjacent rows and columns, or by specific visual patterns in the matrix. However, traditional matrix aggregation or pan-and-zoom interfaces fail in supporting search, inspection, and comparison of ROIs in such large matrices. In HiPiler, ROIs are first-class objects, represented as thumbnail-like "snippets". Snippets can be interactively explored and grouped or laid out automatically in scatterplots, or through dimension reduction methods. Snippets are linked to the entire navigable genome interaction matrix through brushing and linking. The design of HiPiler is based on a series of semi-structured interviews with 10 domain experts involved in the analysis and interpretation of genome interaction matrices. We describe six exploration tasks that are crucial for analysis of interaction matrices and demonstrate how HiPiler supports these tasks. We report on a user study with a series of data exploration sessions with domain experts to assess the usability of HiPiler as well as to demonstrate respective findings in the data.

  10. Encroaching genomics: adapting large-scale science to small academic laboratories.

    PubMed

    Einarson, M B; Golemis, E A

    2000-04-27

    The process of conducting biological research is undergoing a profound metamorphosis due to the technological innovations and torrent of information resulting from the execution of multiple species genome projects. The further tasks of mapping polymorphisms and characterizing genome-wide protein-protein interactions (the characterization of the proteome) will continue to garner resources, talent, and public attention. Although some elements of these whole genome size projects can only be addressed by large research groups, consortia, or industry, the impact of these projects has already begun to transform the process of research in many small laboratories. Although the impact of this transformation is generally positive, laboratories engaged in types of research destined to be dominated by the efforts of a genomic consortium may be negatively impacted if they cannot rapidly adjust strategies in the face of new large-scale competition. The focus of this report is to outline a series of strategies that have been productively utilized by a number of small academic laboratories that have attempted to integrate such genomic resources into research plans with the goal of developing novel physiological insights.

  11. Physical mapping resources for large plant genomes: radiation hybrids for wheat D-genome progenitor Aegilops tauschii

    PubMed Central

    2012-01-01

    Background Development of a high quality reference sequence is a daunting task in crops like wheat, with its large (~17Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of an RH panel for the mapping of the D-genome of wheat progenitor Aegilops tauschii is reported. Results Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB), to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogeneous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of 399 most informative RH

  12. Physical mapping resources for large plant genomes: radiation hybrids for wheat D-genome progenitor Aegilops tauschii.

    PubMed

    Kumar, Ajay; Simons, Kristin; Iqbal, Muhammad J; de Jiménez, Monika Michalak; Bassi, Filippo M; Ghavami, Farhad; Al-Azzam, Omar; Drader, Thomas; Wang, Yi; Luo, Ming-Cheng; Gu, Yong Q; Denton, Anne; Lazo, Gerard R; Xu, Steven S; Dvorak, Jan; Kianian, Penny M A; Kianian, Shahryar F

    2012-11-05

    Development of a high quality reference sequence is a daunting task in crops like wheat, with its large (~17Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of an RH panel for the mapping of the D-genome of wheat progenitor Aegilops tauschii is reported. Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB), to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogeneous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of 399 most informative RH lines with an average

  13. Identification and analysis of genomic regions with large between-population differentiation in humans.

    PubMed

    Myles, S; Tang, K; Somel, M; Green, R E; Kelso, J; Stoneking, M

    2008-01-01

    The primary aim of genetic association and linkage studies is to identify genetic variants that contribute to phenotypic variation within human populations. Since the overwhelming majority of human genetic variation is found within populations, these methods are expected to be effective and can likely be extrapolated from one human population to another. However, they may lack power in detecting the genetic variants that contribute to phenotypes that differ greatly between human populations. Phenotypes that show large differences between populations are expected to be associated with genomic regions exhibiting large allele frequency differences between populations. Thus, genomic regions with large allele frequency differences between populations can be identified from genome-wide polymorphism data and evaluated as candidates for large between-population phenotypic differences. Here we use allele frequency data from approximately 1.5 million SNPs from three human populations, and present an algorithm that identifies genomic regions containing SNPs with extreme Fst. We demonstrate that our candidate regions have reduced heterozygosity in Europeans and Chinese relative to African-Americans, and are likely enriched with genes that have experienced positive natural selection. We identify genes that are likely responsible for phenotypes known to differ dramatically between human populations and present several candidates worthy of future investigation. Our list of high Fst genomic regions is a first step in identifying the genetic variants that contribute to large phenotypic differences between populations, many of which have likely experienced positive natural selection. Our approach based on between-population differences can complement traditional within-population linkage and association studies to uncover novel genotype-phenotype relationships.
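
    The per-SNP differentiation statistic underlying such a scan can be sketched as follows. This minimal sketch uses Nei's heterozygosity-based form, Fst = (H_T - H_S) / H_T, with populations weighted equally, and flags the top fraction of SNPs; the estimator, the equal weighting, the cutoff, and the function names are assumptions for illustration, not the authors' algorithm, which additionally groups extreme SNPs into genomic regions.

```python
from statistics import mean


def fst_nei(freqs: list[float]) -> float:
    """Fst = (H_T - H_S) / H_T for one biallelic SNP.

    freqs holds the frequency of one allele in each population;
    populations are weighted equally (an assumption)."""
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)               # heterozygosity of the pooled population
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def extreme_snps(snps: dict[str, list[float]], top_fraction: float = 0.01) -> list[str]:
    """Return SNP IDs in the top `top_fraction` of the Fst distribution."""
    scores = {snp_id: fst_nei(f) for snp_id, f in snps.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_top]


if __name__ == "__main__":
    # Hypothetical allele frequencies in three populations.
    snps = {
        "snp1": [0.50, 0.48, 0.52],
        "snp2": [0.10, 0.85, 0.90],
        "snp3": [0.30, 0.35, 0.25],
    }
    for snp_id, f in snps.items():
        print(snp_id, round(fst_nei(f), 3))
    print("extreme:", extreme_snps(snps, top_fraction=0.34))
```

    Fst is 0 when all populations share the same allele frequency and 1 when they are fixed for different alleles, which is why SNPs in the extreme upper tail are natural candidates for population-specific selection.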

  14. Single-crossover recombination and ancestral recombination trees.

    PubMed

    Baake, Ellen; von Wangenheim, Ute

    2014-05-01

    We consider the Wright-Fisher model for a population of N individuals, each identified with a sequence of a finite number of sites, and single-crossover recombination between them. We trace back the ancestry of single individuals from the present population. In the N → ∞ limit without rescaling of parameters or time, this ancestral process is described by a random tree, whose branching events correspond to the splitting of the sequence due to recombination. With the help of a decomposition of the trees into subtrees, we calculate the probabilities of the topologies of the ancestral trees. At the same time, these probabilities lead to a semi-explicit solution of the deterministic single-crossover equation. The latter is a discrete-time dynamical system that emerges from the Wright-Fisher model via a law of large numbers and has been waiting for a solution for many decades.
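
    The deterministic single-crossover recursion can be iterated numerically. The sketch below assumes a binary alphabet at each site and a crossover probability rho[a] for the link between sites a and a+1, with at most one crossover per generation: a sequence either passes on unchanged with probability 1 - sum(rho), or is assembled from the head of one parent and the tail of another, each drawn independently from the current distribution. It only iterates the recursion; it does not implement the paper's ancestral-tree decomposition or its semi-explicit solution, and all names are illustrative.

```python
from itertools import product

Dist = dict[tuple[int, ...], float]


def marginal(w: Dist, start: int, stop: int) -> Dist:
    """Marginal distribution of sites start..stop-1."""
    out: Dist = {}
    for seq, p in w.items():
        key = seq[start:stop]
        out[key] = out.get(key, 0.0) + p
    return out


def single_crossover_step(w: Dist, rho: list[float]) -> Dist:
    """One generation of the deterministic single-crossover recursion."""
    n_sites = len(next(iter(w)))
    if len(rho) != n_sites - 1 or sum(rho) > 1:
        raise ValueError("need one probability per link, summing to at most 1")
    heads = [marginal(w, 0, a + 1) for a in range(n_sites - 1)]
    tails = [marginal(w, a + 1, n_sites) for a in range(n_sites - 1)]
    new: Dist = {}
    for seq in product((0, 1), repeat=n_sites):
        value = (1 - sum(rho)) * w.get(seq, 0.0)  # no crossover
        for a, r in enumerate(rho):               # crossover after site a
            value += r * heads[a].get(seq[: a + 1], 0.0) * tails[a].get(seq[a + 1:], 0.0)
        new[seq] = value
    return new


def linkage_disequilibrium(w: Dist) -> float:
    """D = p(1,1) - p1 * q1 for a two-site distribution."""
    p1 = marginal(w, 0, 1).get((1,), 0.0)
    q1 = marginal(w, 1, 2).get((1,), 0.0)
    return w.get((1, 1), 0.0) - p1 * q1


if __name__ == "__main__":
    # Two sites in complete linkage disequilibrium, crossover probability 0.2.
    w = {(0, 0): 0.5, (1, 1): 0.5}
    for t in range(4):
        print(t, round(linkage_disequilibrium(w), 4))
        w = single_crossover_step(w, [0.2])
```

    For two sites this reproduces the classical decay of linkage disequilibrium, D_t = (1 - r)^t D_0, while the single-site marginals stay fixed.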

  15. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis were performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  16. FVGWAS: Fast Voxelwise Genome Wide Association Analysis of Large-scale Imaging Genetic Data

    PubMed Central

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yang, Yu; Lu, Zhaohua; Feng, Qianjing; Knickmeyer, Rebecca C; Zhu, Hongtu

    2015-01-01

    Large-scale imaging genetic studies are increasingly being conducted to collect rich sets of imaging, genetic, and clinical data in order to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components: a heteroscedastic linear model, a global sure independence screening (G-SIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method, compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching for sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 seconds for a single CPU. FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
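
    The complexity comparison can be made concrete with the sizes quoted for the ADNI application. The sketch below plugs n = 708, N_V = 193,275 and N_C = 501,584 into the two leading-order expressions; it ignores constant factors, lower-order terms, and the cost of the screening and wild-bootstrap steps, so the ratio is only a rough indication of the speed-up, not a measured one. The function name is illustrative.

```python
def leading_order_costs(n: int, n_voxels: int, n_snps: int) -> tuple[int, int]:
    """Leading-order operation counts, ignoring constants:
    VGWAS ~ n * N_V * N_C, FVGWAS ~ (N_C + N_V) * n**2."""
    vgwas = n * n_voxels * n_snps
    fvgwas = (n_snps + n_voxels) * n ** 2
    return vgwas, fvgwas


if __name__ == "__main__":
    # Sizes quoted for the ADNI application.
    vgwas, fvgwas = leading_order_costs(n=708, n_voxels=193_275, n_snps=501_584)
    print(f"VGWAS ~ {vgwas:.2e}, FVGWAS ~ {fvgwas:.2e}, ratio ~ {vgwas / fvgwas:.0f}x")
```

    With these sizes the leading-order count for VGWAS is roughly 197 times that of FVGWAS. Because the ratio simplifies to N_V N_C / ((N_C + N_V) n), the advantage grows with the numbers of voxels and SNPs and shrinks as the sample size increases.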

  17. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli.

    PubMed

    Lee, Heewook; Doak, Thomas G; Popodi, Ellen; Foster, Patricia L; Tang, Haixu

    2016-09-06

    A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10⁻⁴ insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10⁻⁵ recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For any of the ISs studied there is no region of the circular genome that is favored or disfavored for new insertions but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target-sites, 5'-GGGG(N6/N7)CCCC-3'. We also detected 48 long deletions not involving IS elements. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli

    PubMed Central

    Lee, Heewook; Doak, Thomas G.; Popodi, Ellen; Foster, Patricia L.; Tang, Haixu

    2016-01-01

    A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10−4 insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10−5 recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For none of the ISs studied is any region of the circular genome favored or disfavored for new insertions, but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target-sites, 5′-GGGG(N6/N7)CCCC-3′. We also detected 48 long deletions not involving IS elements. PMID:27431326
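    The IS186 target-site pattern reported above, 5′-GGGG(N6/N7)CCCC-3′, can be expressed as a regular expression. The sketch below is a hypothetical helper, not the authors' pipeline; the name `find_is186_sites` is invented for illustration. Because the pattern is its own reverse complement, scanning one strand also finds sites on the opposite strand. A lookahead lets overlapping candidates be reported; at each start position the greedy spacer tries 7 nt before 6 nt, so at most one match per start position is returned.

```python
import re

# Hypothetical helper (not from the paper): candidate IS186 target sites
# matching 5'-GGGG(N6/N7)CCCC-3'. The zero-width lookahead lets matches
# that overlap each other all be reported.
IS186_SITE = re.compile(r"(?=(GGGG[ACGT]{6,7}CCCC))")


def find_is186_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for each candidate site.

    The pattern is its own reverse complement, so one strand suffices.
    At a given start the greedy spacer prefers 7 nt over 6 nt.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in IS186_SITE.finditer(seq)]


if __name__ == "__main__":
    print(find_is186_sites("ttGGGGAAAAAACCCCtt"))  # 6-nt spacer site at position 2
```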

  19. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    PubMed

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. The time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available

  20. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

    PubMed Central

    2013-01-01

    Background: Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results: Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. The time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions: Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box.
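    The file-splitting step listed above as improvement (2) can be illustrated with a minimal Python sketch. This is not Rainbow's code; the function name `split_fastq` and the `reads_per_chunk` parameter are hypothetical, and the sketch assumes the common four-line FASTQ layout (header, sequence, `+` separator, qualities) with no wrapped sequence lines. Cutting only on record boundaries keeps every read intact, so each chunk can be handed to a separate worker.

```python
from itertools import islice
from typing import Iterable, Iterator


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[list[str]]:
    """Yield chunks of whole FASTQ records (hypothetical helper).

    Assumes 4 lines per record; each chunk holds up to reads_per_chunk reads.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    it = iter(lines)
    while True:
        # Take at most 4 * reads_per_chunk lines, i.e. whole records only.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record at end of input")
        # Sanity check: every record should start with an '@' header line.
        for i in range(0, len(chunk), 4):
            if not chunk[i].startswith("@"):
                raise ValueError(f"expected '@' header, got {chunk[i]!r}")
        yield chunk


if __name__ == "__main__":
    reads = [line for i in range(5) for line in (f"@r{i}\n", "ACGT\n", "+\n", "IIII\n")]
    for n, chunk in enumerate(split_fastq(reads, 2)):
        print(n, len(chunk) // 4, "reads")
```

    In practice each chunk would be written to its own file; reading lazily with `islice` means the whole input never has to be held in memory at once.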

  1. Convergent mechanisms of genome evolution of large and giant DNA viruses.

    PubMed

    Filée, Jonathan; Chandler, Michael

    2008-06-01

    We have taken advantage of the availability of the genome sequences of a collection of large and giant viruses infecting bacteria (T4 family) and eukaryotes (NCLDV group) to assess some of the evolutionary forces which might have shaped their genomes. Despite having apparently different ancestors, these two groups of viruses are affected by convergent evolutionary forces. Both types of virus probably originated from a simple and ancient viral ancestor with a small subset of 30-35 genes encoding replication and structural proteins. The genome size and diversity of the descendants most likely grew progressively by (i) lineage-specific gene duplications, (ii) lateral gene transfers of cellular genes and (iii) accretion of diverse families of mobile genetic elements. These results argue against the hypothesis that giant viruses derive from a regressive cell.

  2. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. Their team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosome (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  3. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    SciTech Connect

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions, a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  4. Biological Consequences of Ancient Gene Acquisition and Duplication in the Large Genome of Candidatus Solibacter usitatus Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Hauser, Loren John; Land, Miriam L; Xie, Gary; Kuske, Cheryl R

    2011-01-01

    Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2.5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the large Ellin6076 genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and by widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.

  5. Large-scale analysis of the yeast genome by transposon tagging and gene disruption.

    PubMed

    Ross-Macdonald, P; Coelho, P S; Roemer, T; Agarwal, S; Kumar, A; Jansen, R; Cheung, K H; Sheehan, A; Symoniatis, D; Umansky, L; Heidtman, M; Nelson, F K; Iwasaki, H; Hager, K; Gerstein, M; Miller, P; Roeder, G S; Snyder, M

    1999-11-25

    Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background: a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.

  6. CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick [Kerfeld Lab, UC Berkeley and JGI]

    2016-07-12

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  7. CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Shih, Patrick

    2012-03-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  8. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

    PubMed Central

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852

  9. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    PubMed

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited.

  10. A rare example of germ-line chromothripsis resulting in large genomic imbalance.

    PubMed

    Anderson, Sarah E; Kamath, Arveen; Pilz, Daniela T; Morgan, Sian M

    2016-04-01

    Chromothripsis is a recently described 'chromosome catastrophe' phenomenon in which multiple genomic rearrangements are generated in a single catastrophic event. Chromothripsis has most frequently been associated with cancer, but there have also been rare reports of chromothripsis in patients with developmental disorders and congenital anomalies. In contrast to the massive DNA loss that often accompanies chromothripsis in cancer, only minimal DNA loss has been reported in the majority of cases of chromothripsis that have occurred in the germ line. Presumably, this is because in most instances, large genomic losses would be lethal in utero. We report on a female patient with developmental delay and dysmorphism. G-banded chromosome analysis detected a subtle, interstitial deletion of chromosome 13 and a complex rearrangement of one X chromosome. Subsequent array comparative genomic hybridisation studies indicated nine deletions on the X chromosome ranging from 327 kb to 8 Mb in size. A 4.4 Mb deletion on chromosome 13 was also confirmed, compatible with the patient's clinical phenotype. We propose that this is a rare example of constitutional chromothripsis in association with relatively large genomic imbalances and that these have been tolerated in this case as they have occurred in a female on the X chromosome, which has undergone preferential X inactivation.

  11. Large genomic deletions inactivate the BRCA2 gene in breast cancer families

    PubMed Central

    Agata, S; Dalla, P; Callegaro, M; Scaini, M; Menin, C; Ghiotto, C; Nicoletto, O; Zavagno, G; Chieco-Bianchi, L; D'Andrea, E; Montagna, M

    2005-01-01

    Background: BRCA1 and BRCA2 are the two major genes responsible for the breast and ovarian cancers that cluster in families with a genetically determined predisposition. However, regardless of the mutation detection method employed, the percentage of families without identifiable alterations of these genes exceeds 50%, even when applying stringent criteria for family selection. A small but significant increase in mutation detection rate has resulted from the discovery of large genomic alterations in BRCA1. A few studies have addressed the question of whether BRCA2 might be inactivated by the same kinds of alteration, but most were either done on a relatively small number of samples or employed cumbersome mutation detection methods of variable sensitivity. Objective: To analyse 121 highly selected families using the recently available BRCA2 multiplex ligation dependent probe amplification (MLPA) technique. Results: Three different large genomic deletions were identified and confirmed by analysis of the mutant transcript and genomic characterisation of the breakpoints. Conclusions: Contrary to initial suggestions, the presence of BRCA2 genomic rearrangements is worth investigating in high risk breast or ovarian cancer families. PMID:16199546

  12. The Use of Weighted Graphs for Large-Scale Genome Analysis

    PubMed Central

    Zhou, Fang; Toivonen, Hannu; King, Ross D.

    2014-01-01

    There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061

  13. Fusion of Large-Scale Genomic Knowledge and Frequency Data Computationally Prioritizes Variants in Epilepsy

    PubMed Central

    Campbell, Ian M.; Rao, Mitchell; Arredondo, Sean D.; Lalani, Seema R.; Xia, Zhilian; Kang, Sung-Hae L.; Bi, Weimin; Breman, Amy M.; Smith, Janice L.; Bacino, Carlos A.; Beaudet, Arthur L.; Patel, Ankita; Cheung, Sau Wai; Lupski, James R.; Stankiewicz, Paweł; Ramocki, Melissa B.; Shaw, Chad A.

    2013-01-01

    Curation and interpretation of copy number variants identified by genome-wide testing are challenged by the large number of events harbored in each personal genome. Conventional determination of phenotypic relevance relies on patterns of higher frequency in affected individuals versus controls; however, an increasing amount of ascertained variation is rare or private to clans. Consequently, frequency data are less useful for distinguishing pathogenic from benign variants. One solution is disease-specific algorithms that leverage gene knowledge together with variant frequency to aid prioritization. We used large-scale resources including Gene Ontology, protein-protein interactions and other annotation systems together with a broad set of 83 genes with known associations to epilepsy to construct a pathogenicity score for the phenotype. We evaluated the score for all annotated human genes and applied Bayesian methods to combine the derived pathogenicity score with frequency information from our diagnostic laboratory. Analysis determined Bayes factors and posterior distributions for each gene. We applied our method to subjects with abnormal chromosomal microarray results and confirmed epilepsy diagnoses gathered by electronic medical record review. Genes deleted in our subjects with epilepsy had significantly higher pathogenicity scores and Bayes factors than genes deleted in subjects referred for non-neurologic indications. We also applied our scores to identify a recently validated epilepsy gene in a complex genomic region and to reveal candidate genes for epilepsy. We propose that our results could be used for clinical decision support in the context of genome-wide screening. Our approach demonstrates the utility of integrative data in medical genomics. PMID:24086149

  14. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains.

    PubMed

    Ferreira, Ana Cristina; Dias, Ricardo; de Sá, Maria Inácia Corrêa; Tenreiro, Rogério

    2016-08-30

    Optical mapping is a technology that can quickly generate high-resolution ordered whole-genome restriction maps of bacteria and is a proven approach for detecting diversity among bacterial isolates. In this work, optical whole-genome maps were used to compare closely related Brucella suis biovar 2 strains. This biovar is the only one isolated from domestic pigs and wild boars in Portugal and Spain, and most of its strains share specific molecular characteristics establishing an Iberian clonal lineage that can be differentiated from another lineage isolated mainly in several Central European countries. We generated BamHI whole-genome optical maps of five B. suis biovar 2 field strains isolated from wild boars in Portugal and Spain (three from the Iberian lineage and two from the Central European one), as well as of the reference strain B. suis biovar 2 ATCC 23445 (Central European lineage, Denmark). Each strain showed a distinct, highly individual configuration of 228-231 BamHI fragments. Nevertheless, divergence was low overall, and lower in chromosome II (1.6%) than in chromosome I (2.4%). Optical mapping also disclosed genomic events associated with B. suis strains in chromosome I, namely one indel (3.5 kb) and one large inversion (944 kb). By using targeted PCR in a set of 176 B. suis strains, including all biovars and haplotypes, the indel was found to be specific to the reference strain ATCC 23445, and the large inversion was shown to be an exclusive genomic marker of the Iberian clonal lineage of biovar 2. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Patterns and mechanisms of ancestral histone protein inheritance in budding yeast.

    PubMed

    Radman-Livaja, Marta; Verzijlbergen, Kitty F; Weiner, Assaf; van Welsem, Tibor; Friedman, Nir; Rando, Oliver J; van Leeuwen, Fred

    2011-06-01

    Replicating chromatin involves disruption of histone-DNA contacts and subsequent reassembly of maternal histones on the new daughter genomes. In bulk, maternal histones are randomly segregated to the two daughters, but little is known about the fine details of this process: do maternal histones re-assemble at preferred locations or close to their original loci? Here, we use a recently developed method for swapping epitope tags to measure the disposition of ancestral histone H3 across the yeast genome over six generations. We find that ancestral H3 is preferentially retained at the 5' ends of most genes, with strongest retention at long, poorly transcribed genes. We recapitulate these observations with a quantitative model in which the majority of maternal histones are reincorporated within 400 bp of their pre-replication locus during replication, with replication-independent replacement and transcription-related retrograde nucleosome movement shaping the resulting distributions of ancestral histones. We find a key role for Topoisomerase I in retrograde histone movement during transcription, and we find that loss of Chromatin Assembly Factor-1 affects replication-independent turnover. Together, these results show that specific loci are enriched for histone proteins first synthesized several generations beforehand, and that maternal histones re-associate close to their original locations on daughter genomes after replication. Our findings further suggest that accumulation of ancestral histones could play a role in shaping histone modification patterns.

  16. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences.

    PubMed

    Lam, Su Datt; Das, Sayoni; Sillitoe, Ian; Orengo, Christine

    2017-08-01

    Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.

  17. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences

    PubMed Central

    Lam, Su Datt; Das, Sayoni; Sillitoe, Ian; Orengo, Christine

    2017-01-01

    Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited. PMID:28777078

  18. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.

    PubMed

    Warren, René L; Keeling, Christopher I; Yuen, Macaire Man Saint; Raymond, Anthony; Taylor, Greg A; Vandervalk, Benjamin P; Mohamadi, Hamid; Paulino, Daniel; Chiu, Readman; Jackman, Shaun D; Robertson, Gordon; Yang, Chen; Boyle, Brian; Hoffmann, Margarete; Weigel, Detlef; Nelson, David R; Ritland, Carol; Isabel, Nathalie; Jaquish, Barry; Yanchuk, Alvin; Bousquet, Jean; Jones, Steven J M; MacKay, John; Birol, Inanc; Bohlmann, Joerg

    2015-07-01

    White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.

  19. DNA from Dust: Comparative Genomics of Large DNA Viruses in Field Surveillance Samples

    PubMed Central

    Pandey, Utsav; Bell, Andrew S.; Renner, Daniel W.; Kennedy, David A.; Shreve, Jacob T.; Cairns, Chris L.; Jones, Matthew J.; Dunn, Patricia A.; Read, Andrew F.

    2016-01-01

    The intensification of the poultry industry over the last 60 years facilitated the evolution of increased virulence and vaccine breaks in Marek’s disease virus (MDV-1). Full-genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We applied this to the first characterization of MDV-1 genomes from the field, without prior culture. These viruses were collected from vaccinated hosts that acquired naturally circulating field strains of MDV-1, in the absence of a disease outbreak. This reflects the current issue afflicting the poultry industry, where virulent field strains continue to circulate despite vaccination and can remain undetected due to the lack of overt disease symptoms. We found that viral genomes from adjacent field sites had high levels of overall DNA identity, and despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown. IMPORTANCE Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately

  20. DNA from Dust: Comparative Genomics of Large DNA Viruses in Field Surveillance Samples.

    PubMed

    Pandey, Utsav; Bell, Andrew S; Renner, Daniel W; Kennedy, David A; Shreve, Jacob T; Cairns, Chris L; Jones, Matthew J; Dunn, Patricia A; Read, Andrew F; Szpara, Moriah L

    2016-01-01

    The intensification of the poultry industry over the last 60 years facilitated the evolution of increased virulence and vaccine breaks in Marek's disease virus (MDV-1). Full-genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We applied this to the first characterization of MDV-1 genomes from the field, without prior culture. These viruses were collected from vaccinated hosts that acquired naturally circulating field strains of MDV-1, in the absence of a disease outbreak. This reflects the current issue afflicting the poultry industry, where virulent field strains continue to circulate despite vaccination and can remain undetected due to the lack of overt disease symptoms. We found that viral genomes from adjacent field sites had high levels of overall DNA identity, and despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown. IMPORTANCE Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately reflect virus in

  1. A Glimpse of Nucleo-Cytoplasmic Large DNA Virus Biodiversity through the Eukaryotic Genomics Window

    PubMed Central

    Gallot-Lavallée, Lucie; Blanc, Guillaume

    2017-01-01

    The nucleocytoplasmic large DNA viruses (NCLDV) are a group of extremely complex double-stranded DNA viruses, which are major parasites of a variety of eukaryotes. Recent studies showed that certain eukaryotes contain fragments of NCLDV DNA integrated in their genome, although, surprisingly, many of these organisms had not previously been shown to be infected by NCLDVs. We performed an updated survey of NCLDV genes hidden in eukaryotic sequences to measure the incidence of this phenomenon in common public sequence databases. A total of 66 eukaryotic genomic or transcriptomic datasets (many of which are from algae and aquatic protists) contained at least one of the five most consistently conserved NCLDV core genes. Phylogenetic study of the eukaryotic NCLDV-like sequences identified putative new members of already recognized viral families, as well as members of as yet unknown viral clades. Genomic evidence suggested that most of these sequences resulted from viral DNA integrations rather than contaminating viruses. Furthermore, the nature of the inserted viral genes helped predict the original functional capacities of the donor viruses. These insights confirm that genomic insertions of NCLDV DNA are common in eukaryotes and can be exploited to delineate the contours of NCLDV biodiversity. PMID:28117696

  2. Comparative Genomics of Amphibian-like Ranaviruses, Nucleocytoplasmic Large DNA Viruses of Poikilotherms

    PubMed Central

    Price, Stephen J.

    2015-01-01

    Recent research on genome evolution of large DNA viruses has highlighted a number of incredibly dynamic processes that can facilitate rapid adaptation. The genomes of amphibian-like ranaviruses – double-stranded DNA viruses infecting amphibians, reptiles, and fish (family Iridoviridae) – were examined to assess variation in genome content and evolutionary processes. The viruses studied were closely related, but their genome content varied considerably, with 29 genes identified that were not present in all of the major clades. Twenty-one genes had evidence of recombination, while a virus isolated from a captive reptile appeared to be a mosaic of two divergent parents. Positive selection was also found to be acting on more than a quarter of Ranavirus genes and was found most frequently in the Spanish common midwife toad virus, which has had a severe impact on amphibian host communities. Efforts to resolve the root of this group by inclusion of an outgroup were inconclusive, but a set of core genes was identified, which recovered a well-supported species tree. PMID:27812275

  3. A Glimpse of Nucleo-Cytoplasmic Large DNA Virus Biodiversity through the Eukaryotic Genomics Window.

    PubMed

    Gallot-Lavallée, Lucie; Blanc, Guillaume

    2017-01-20

    The nucleocytoplasmic large DNA viruses (NCLDV) are a group of extremely complex double-stranded DNA viruses, which are major parasites of a variety of eukaryotes. Recent studies showed that certain eukaryotes contain fragments of NCLDV DNA integrated in their genome, although, surprisingly, many of these organisms had not previously been shown to be infected by NCLDVs. We performed an updated survey of NCLDV genes hidden in eukaryotic sequences to measure the incidence of this phenomenon in common public sequence databases. A total of 66 eukaryotic genomic or transcriptomic datasets (many of which are from algae and aquatic protists) contained at least one of the five most consistently conserved NCLDV core genes. Phylogenetic study of the eukaryotic NCLDV-like sequences identified putative new members of already recognized viral families, as well as members of as yet unknown viral clades. Genomic evidence suggested that most of these sequences resulted from viral DNA integrations rather than contaminating viruses. Furthermore, the nature of the inserted viral genes helped predict the original functional capacities of the donor viruses. These insights confirm that genomic insertions of NCLDV DNA are common in eukaryotes and can be exploited to delineate the contours of NCLDV biodiversity.

  4. Ancestral origins of the prion protein gene D178N mutation in the Basque Country.

    PubMed

    Rodríguez-Martínez, Ana B; Barreau, Christian; Coupry, Isabelle; Yagüe, Jordi; Sánchez-Valle, Raquel; Galdós-Alcelay, Luis; Ibáñez, Agustín; Digón, Antón; Fernández-Manchola, Ignacio; Goizet, Cyril; Castro, Azucena; Cuevas, Nerea; Alvarez-Alvarez, Maite; de Pancorbo, Marian M; Arveiler, Benoît; Zarranz, Juan J

    2005-06-01

    Fatal familial insomnia (FFI) and familial Creutzfeldt-Jakob disease (fCJD) are familial prion diseases with autosomal dominant inheritance of the D178N mutation. FFI has been reported in at least 27 pedigrees around the world. Twelve apparently unrelated FFI and fCJD pedigrees with the characteristic D178N mutation have been reported in the Prion Diseases Registry of the Basque Country since 1993. The high incidence of familial prion diseases in this region may reflect a unique ancestral origin of the chromosome carrying this mutation. In order to investigate this putative founder effect, we developed "happy typing", a new approach to the happy mapping method, which consists of the physical isolation of large haploid genomic DNA fragments and their analysis by the polymerase chain reaction (PCR) in order to perform haplotypic analysis instead of pedigree analysis. Six novel microsatellite markers, located in a 150-kb genomic segment flanking the PRNP gene, were characterized for typing haploid DNA fragments 285 kb in size. A common haplotype was found in patients from the Basque region, strongly suggesting a founder effect. We propose that "happy typing" constitutes an efficient method for determining disease-associated haplotypes, since the analysis of a single affected individual per pedigree should provide sufficient evidence.

  5. Molecular analysis of small grain cereal genomes: Current status and prospects

    SciTech Connect

    Moore, G.; Gale, M.D.; Flavell, R.B.; Kurata, N.

    1993-05-01

    Recent developments in cereal genome analysis include generation of RFLP maps, flow sorting of chromosomes, identification of landmarks for genes and a more advanced model for cereal genome organization. These developments are reviewed together with new prospects for the isolation of defined genes from large cereal genomes and for the production of a composite map of the ancestral grass genome to aid in the genetic analysis of all the Gramineae. The advances that can now come from comparative genome mapping are likely to promote further the new era of plant genetics. 65 refs., 2 figs.

  6. Efficient and rapid generation of large genomic variants in rats and mice using CRISMERE

    PubMed Central

    Birling, Marie-Christine; Schaeffer, Laurence; André, Philippe; Lindner, Loic; Maréchal, Damien; Ayadi, Abdel; Sorg, Tania; Pavlovic, Guillaume; Hérault, Yann

    2017-01-01

    Modelling Down syndrome (DS) in mouse has been crucial for the understanding of the disease and the evaluation of therapeutic targets. Nevertheless, the modelling so far has been limited to the mouse and, even in this model, generating duplication of genomic regions has been labour intensive and time consuming. We developed the CRISpr MEdiated REarrangement (CRISMERE) strategy, which takes advantage of the CRISPR/Cas9 system, to generate most of the desired rearrangements from a single experiment at much lower expense and in less than 9 months. Deletions, duplications, and inversions of genomic regions as large as 24.4 Mb in rat and mouse founders were observed, and germ line transmission was confirmed for fragments as large as 3.6 Mb. Interestingly, we have been able to recover duplicated regions from founders in which we only detected deletions. CRISMERE is even more powerful than anticipated; it allows the scientific community to manipulate the rodent and probably other genomes in a fast and efficient manner which was not possible before. PMID:28266534

  7. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  8. Distilling Artificial Recombinants from Large Sets of Complete mtDNA Genomes

    PubMed Central

    Kong, Qing-Peng; Salas, Antonio; Sun, Chang; Fuku, Noriyuki; Tanaka, Masashi; Zhong, Li; Wang, Cheng-Ye; Yao, Yong-Gang; Bandelt, Hans-Jürgen

    2008-01-01

    Background: Large-scale genome sequencing poses enormous problems to the logistics of laboratory work and data handling. When numerous fragments of different genomes are PCR amplified and sequenced in a laboratory, there is a high inherent risk of sample confusion. For genetic markers, such as mitochondrial DNA (mtDNA), which are free of natural recombination, single instances of sample mix-up involving different branches of the mtDNA phylogeny would give rise to reticulate patterns and should therefore be detectable. Methodology/Principal Findings: We have developed a strategy for comparing new complete mtDNA genomes, one by one, to a current skeleton of the worldwide mtDNA phylogeny. The mutations distinguishing the reference sequence from a putative recombinant sequence can then be allocated to two or more different branches of this phylogenetic skeleton. Thus, one would search for two (or three) near-matches in the total mtDNA database that together best explain the variation seen in the recombinants. The evolutionary pathway from the mtDNA tree connecting this pair, together with the recombinant, then generates a grid-like median network, from which one can read off the exchanged segments. Conclusions: We have applied this procedure to a large collection of complete human mtDNA sequences, where several recombinants could be distilled by our method. All these recombinant sequences were subsequently corrected by de novo experiments – fully concordant with the predictions from our data-analytical approach. PMID:18714389

  9. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

    PubMed

    Gao, Song; Bertrand, Denis; Chia, Burton K H; Nagarajan, Niranjan

    2016-05-11

    The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.

  10. The New Red Algal Subphylum Proteorhodophytina Comprises the Largest and Most Divergent Plastid Genomes Known.

    PubMed

    Muñoz-Gómez, Sergio A; Mejía-Franco, Fabián G; Durnin, Keira; Colp, Morgan; Grisdale, Cameron J; Archibald, John M; Slamovits, Claudio H

    2017-06-05

    Red algal plastid genomes are often considered ancestral and evolutionarily stable, and thus more closely resembling the last common ancestral plastid genome of all photosynthetic eukaryotes [1, 2]. However, sampling of red algal diversity is still quite limited (e.g., [2-5]). We aimed to remedy this problem. To this end, we sequenced six new plastid genomes from four undersampled and phylogenetically disparate red algal classes (Porphyridiophyceae, Stylonematophyceae, Compsopogonophyceae, and Rhodellophyceae) and discovered an unprecedented degree of genomic diversity among them. These genomes are rich in introns, enlarged intergenic regions, and transposable elements (in the rhodellophycean Bulboplastis apyrenoidosa), and include the largest and most intron-rich plastid genomes ever sequenced (that of the rhodellophycean Corynoplastis japonica; 1.13 Mbp). Sophisticated phylogenetic analyses accounting for compositional heterogeneity show that these four "basal" red algal classes form a larger monophyletic group, Proteorhodophytina subphylum nov., and confidently resolve the large-scale relationships in the Rhodophyta. Our analyses also suggest that secondary red plastids originated before the diversification of all mesophilic red algae. Our genomic survey has challenged the current paradigmatic view of red algal plastid genomes as "living fossils" [1, 2, 6] by revealing an astonishing degree of divergence in size, organization, and non-coding DNA content. A closer look at red algae shows that they comprise the most ancestral (e.g., [2, 7, 8]) as well as some of the most divergent plastid genomes known. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Genome-scale phylogenetic function annotation of large and diverse protein families.

    PubMed

    Engelhardt, Barbara E; Jordan, Michael I; Srouji, John R; Brenner, Steven E

    2011-11-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu.

  12. Genome-scale phylogenetic function annotation of large and diverse protein families

    PubMed Central

    Engelhardt, Barbara E.; Jordan, Michael I.; Srouji, John R.; Brenner, Steven E.

    2011-01-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu. PMID:21784873

  13. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    PubMed Central

    Schiffer, Philipp H.; Gravemeyer, Jan; Rauscher, Martina; Wiehe, Thomas

    2016-01-01

    Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR)-genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic for ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term “run-away evolution”. This process might ultimately lead to the failure of genomic integrity and drive species to extinction. PMID:27509525

  14. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.

    PubMed

    Ghedin, Elodie; Sengamalay, Naomi A; Shumway, Martin; Zaborsky, Jennifer; Feldblyum, Tamara; Subbu, Vik; Spiro, David J; Sitz, Jeff; Koo, Hean; Bolotov, Pavel; Dernovoy, Dmitry; Tatusova, Tatiana; Bao, Yiming; St George, Kirsten; Taylor, Jill; Lipman, David J; Fraser, Claire M; Taubenberger, Jeffery K; Salzberg, Steven L

    2005-10-20

    Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics. The most recent pandemics include the 1918 'Spanish' flu, one of the most deadly outbreaks in recorded history, which killed 30-50 million people worldwide, the 1957 'Asian' flu, and the 1968 'Hong Kong' flu. Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides. In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.

  15. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Xie, Gary; Kuske, Cheryl R; Hauser, Loren; Land, Miriam

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  16. The ancestral gene repertoire of animal stem cells.

    PubMed

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-12-22

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the "germ-line multipotency program" and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells.

  17. The ancestral gene repertoire of animal stem cells

    PubMed Central

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-01-01

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the “germ-line multipotency program” and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  18. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing

    PubMed Central

    Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J.

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolved key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646
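
    The size figures quoted above can be cross-checked with simple arithmetic. The haploid human genome size of roughly 3.1 Gb is an outside assumption, not a value from the abstract. The coverage figure further assumes that all 600 Gb of shotgun data derive from A. mexicanum nuclear DNA.

```latex
\frac{32\ \text{Gb}}{3.1\ \text{Gb}} \approx 10.3, \qquad
\frac{19\ \text{Gb}}{32\ \text{Gb}} \approx 0.59, \qquad
\frac{600\ \text{Gb}}{32\ \text{Gb}} \approx 18.8\times
```

    On these assumptions, "as much as 19 Gb" is roughly 59% of the estimated genome. The shotgun data amount to about 19-fold raw coverage.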

  19. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolved key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.

  20. Reverse engineering and analysis of large genome-scale gene networks.

    PubMed

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-07

    Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, which limits their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) a B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we analyzed 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
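
    The core idea behind MI-based network inference is to score each gene pair by mutual information and keep edges whose MI is unlikely under random pairing. The sketch below illustrates that idea only. It deliberately swaps TINGe's B-spline estimator for a simpler equal-width histogram (plug-in) estimator. It also swaps TINGe's direct permutation-testing algorithm for an ordinary shuffle test. The bin count, the helper names, and the toy expression profiles are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def discretize(values: Sequence[float], bins: int) -> List[int]:
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in nats from paired discrete labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def permutation_p_value(x: Sequence[int], y: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Share of label shuffles whose MI reaches the observed MI (add-one corrected)."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any x-y dependence but keeps the marginals
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression profiles for three genes across 200 arrays.
rng = random.Random(1)
gene_a = [rng.gauss(0, 1) for _ in range(200)]
gene_b = [a + rng.gauss(0, 0.3) for a in gene_a]  # co-regulated with gene_a
gene_c = [rng.gauss(0, 1) for _ in range(200)]    # unrelated
xa, xb, xc = (discretize(g, 4) for g in (gene_a, gene_b, gene_c))
print(permutation_p_value(xa, xb, n_perm=200))  # expected to be small
print(permutation_p_value(xa, xc, n_perm=200))  # usually much larger
```

    A whole-genome run would repeat this for every gene pair, which is why fast MI estimation and parallelism matter at the scale described above.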

  1. Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

    PubMed

    Geeleher, Paul; Zhang, Zhenyu; Wang, Fan; Gruener, Robert F; Nath, Aritro; Morrison, Gladys; Bhutra, Steven; Grossman, Robert L; Huang, R Stephanie

    2017-08-28

    Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs. © 2017 Geeleher et al.; Published by Cold Spring Harbor Laboratory Press.
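
    The imputation strategy described above has two stages. A model relating gene expression to drug response is fitted on cell lines, and that model is then applied to tumor expression profiles. The abstract does not name the model. The sketch below uses ridge regression as one plausible choice, solved in pure Python through the penalized normal equations. The data, the penalty value, and the helper names are hypothetical. A real pipeline would also need to make cell-line and tumor expression comparable, which this sketch ignores.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: Matrix, y: Sequence[float], lam: float) -> Tuple[float, List[float]]:
    """Fit y ~ intercept + X @ beta, penalizing beta (not the intercept) by lam * ||beta||^2."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # center each gene
    yc = [v - y_mean for v in y]
    # Normal equations on centered data: (Xc^T Xc + lam * I) beta = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(gram, rhs)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, means))
    return intercept, beta


def predict(model: Tuple[float, List[float]], X: Matrix) -> List[float]:
    """Apply a fitted (intercept, beta) model to new expression profiles."""
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


# Hypothetical data: expression of 2 genes in 5 cell lines and their drug response.
cell_line_expr = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
cell_line_response = [1.0, 3.0, 0.0, 2.0, 4.0]
model = fit_ridge(cell_line_expr, cell_line_response, lam=0.1)
tumor_expr = [[0.5, 0.2], [1.5, 0.9]]  # hypothetical tumor profiles
print(predict(model, tumor_expr))  # one imputed response per tumor
```

    The imputed responses would then be tested for association with somatic variants in the clinical cohort, as the abstract describes.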

  2. Mutational and structural analysis of diffuse large B-cell lymphoma using whole-genome sequencing

    PubMed Central

    Morin, Ryan D.; Mungall, Karen; Pleasance, Erin; Mungall, Andrew J.; Goya, Rodrigo; Huff, Ryan D.; Scott, David W.; Ding, Jiarui; Roth, Andrew; Chiu, Readman; Corbett, Richard D.; Chan, Fong Chun; Mendez-Lago, Maria; Trinh, Diane L.; Bolger-Munro, Madison; Taylor, Greg; Hadj Khodabakhshi, Alireza; Ben-Neriah, Susana; Pon, Julia; Meissner, Barbara; Woolcock, Bruce; Farnoud, Noushin; Rogic, Sanja; Lim, Emilia L.; Johnson, Nathalie A.; Shah, Sohrab; Jones, Steven; Steidl, Christian; Holt, Robert; Birol, Inanc; Moore, Richard; Connors, Joseph M.; Gascoyne, Randy D.

    2013-01-01

    Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer composed of at least 2 molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease. Here we provide a whole-genome-sequencing-based perspective of DLBCL mutational complexity by characterizing 40 de novo DLBCL cases and 13 DLBCL cell lines and combining these data with DNA copy number analysis and RNA-seq from an extended cohort of 96 cases. Our analysis identified widespread genomic rearrangements including evidence for chromothripsis as well as the presence of known and novel fusion transcripts. We uncovered new gene targets of recurrent somatic point mutations and genes that are targeted by focal somatic deletions in this disease. We highlight the recurrence of germinal center B-cell-restricted mutations affecting genes that encode the S1P receptor and 2 small GTPases (GNA13 and GNAI2) that together converge on regulation of B-cell homing. We further analyzed our data to approximate the relative temporal order in which some recurrent mutations were acquired and demonstrate that ongoing acquisition of mutations and intratumoral clonal heterogeneity are common features of DLBCL. This study further improves our understanding of the processes and pathways involved in lymphomagenesis, and some of the pathways mutated here may indicate new avenues for therapeutic intervention. PMID:23699601

  3. Large genomic fibrillin-1 (FBN1) gene deletions provide evidence for true haploinsufficiency in Marfan syndrome.

    PubMed

    Mátyás, Gábor; Alonso, Sira; Patrignani, Andrea; Marti, Myriam; Arnold, Eliane; Magyar, István; Henggeler, Caroline; Carrel, Thierry; Steinmann, Beat; Berger, Wolfgang

    2007-08-01

    Mutations in the FBN1 gene are the major cause of Marfan syndrome (MFS), an autosomal dominant connective tissue disorder, which displays variable manifestations in the cardiovascular, ocular, and skeletal systems. Current molecular genetic testing of FBN1 may miss mutations in the promoter region or in other noncoding sequences as well as partial or complete gene deletions and duplications. In this study, we tested for copy number variations by successively applying multiplex ligation-dependent probe amplification (MLPA) and the Affymetrix Human Mapping 500 K Array Set, which contains probes for approximately 500,000 single-nucleotide polymorphisms (SNPs) across the genome. By analyzing genomic DNA of 101 unrelated individuals with MFS or related phenotypes in whom standard genetic testing detected no mutation, we identified FBN1 deletions in two patients with MFS. Our high-resolution approach narrowed down the deletion breakpoints. Subsequent sequencing of the junctional fragments revealed the deletion sizes of 26,887 and 302,580 bp, respectively. Surprisingly, both deletions affect the putative regulatory and promoter region of the FBN1 gene, strongly indicating that they abolish transcription of the deleted allele. This expectation of complete loss of function of one allele, i.e. true haploinsufficiency, was confirmed by transcript analyses. Our findings not only emphasize the importance of screening for large genomic rearrangements in comprehensive genetic testing of FBN1 but, importantly, also extend the molecular etiology of MFS by providing hitherto unreported evidence that true haploinsufficiency is sufficient to cause MFS.
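
    MLPA copy-number calls of the kind used above are commonly based on a dosage quotient. Each target probe's signal is normalized to reference probes in the same run and then divided by the same ratio in a normal control. A heterozygous deletion is therefore expected near 0.5 and a duplication near 1.5. The sketch below assumes the median of reference probes for normalization. It also assumes 0.7/1.3 call thresholds, a commonly used convention rather than a value from this study. The probe names and peak heights are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def dosage_quotients(sample: Dict[str, float], control: Dict[str, float],
                     reference_probes: Sequence[str]) -> Dict[str, float]:
    """Per-probe dosage quotient: (sample / sample reference) / (control / control reference).

    Each run is normalized by the median peak signal of its reference probes,
    which corrects for overall signal differences between runs.
    """
    sample_ref = median(sample[p] for p in reference_probes)
    control_ref = median(control[p] for p in reference_probes)
    return {
        probe: (sample[probe] / sample_ref) / (control[probe] / control_ref)
        for probe in sample
        if probe not in reference_probes
    }


def call_copy_number(dq: float, low: float = 0.7, high: float = 1.3) -> str:
    """Classify a dosage quotient (thresholds are an assumed convention)."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak heights: the exon-1 probe signal is halved in the patient.
refs = ["ref1", "ref2", "ref3"]
control = {"ref1": 1000, "ref2": 1200, "ref3": 1100, "FBN1_ex1": 800, "FBN1_ex2": 900}
patient = {"ref1": 2000, "ref2": 2400, "ref3": 2200, "FBN1_ex1": 800, "FBN1_ex2": 1800}
for probe, dq in dosage_quotients(patient, control, refs).items():
    print(probe, round(dq, 2), call_copy_number(dq))
```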

  4. Sequence capture and next-generation sequencing of ultraconserved elements in a large-genome salamander.

    PubMed

    Newman, Catherine E; Austin, Christopher C

    2016-12-01

    Amidst the rapid advancement in next-generation sequencing (NGS) technology over the last few years, salamanders have been left behind. Salamanders have enormous genomes (up to 40 times the size of the human genome), and this poses challenges to generating NGS data sets of quality and quantity similar to those of other vertebrates. However, optimization of laboratory protocols is time-consuming and often cost-prohibitive, and continued omission of salamanders from novel phylogeographic research is detrimental to species facing decline. Here, we use a salamander endemic to the southeastern United States, Plethodon serratus, to test the utility of an established protocol for sequence capture of ultraconserved elements (UCEs) in resolving intraspecific phylogeographic relationships and delimiting cryptic species. Without modifying the standard laboratory protocol, we generated a data set consisting of over 600 million reads for 85 P. serratus samples. Species delimitation analyses support recognition of seven species within P. serratus sensu lato, and all phylogenetic relationships among the seven species are fully resolved under a coalescent model. Results also corroborate previous data suggesting nonmonophyly of the Ouachita and Louisiana regions. Our results demonstrate that established UCE protocols can successfully be used in phylogeographic studies of salamander species, providing a powerful tool for future research on evolutionary history of amphibians and other organisms with large genomes.

  5. Mitochondrial introgression suggests extensive ancestral hybridization events among Saccharomyces species.

    PubMed

    Peris, David; Arias, Armando; Orlić, Sandi; Belloch, Carmela; Pérez-Través, Laura; Querol, Amparo; Barrio, Eladio

    2017-03-01

    Horizontal gene transfer (HGT) in eukaryotic plastids and mitochondrial genomes is common and plays an important role in organism evolution. In yeasts, recent mitochondrial HGT has been suggested between S. cerevisiae and S. paradoxus. However, few strains have been explored given the lack of accurate mitochondrial genome annotations. Mitochondrial genome sequences are important for understanding how frequently these introgressions occur and their role in cytonuclear incompatibilities and fitness. Indeed, most of the Bateson-Dobzhansky-Muller genetic incompatibilities described in yeasts are driven by cytonuclear incompatibilities. We herein explored the mitochondrial inheritance of several wild Saccharomyces species distributed worldwide, and of their hybrids, isolated from different sources and geographic origins. We demonstrated the existence of several recombination points in the mitochondrial region COX2-ORF1, likely mediated either by the activity of the protein encoded by the ORF1 (F-SceIII) gene, a free-standing homing endonuclease, or, mostly, by A+T tandem repeats and regions of integration of GC clusters. These introgressions were shown to occur among strains of the same species and among strains of different species, which suggests a complex model of Saccharomyces evolution that involves several ancestral hybridization events in wild environments.

  6. Mosaic Uniparental Disomies and Aneuploidies as Large Structural Variants of the Human Genome

    PubMed Central

    Rodríguez-Santiago, Benjamín; Malats, Núria; Rothman, Nathaniel; Armengol, Lluís; Garcia-Closas, Montse; Kogevinas, Manolis; Villa, Olaya; Hutchinson, Amy; Earl, Julie; Marenne, Gaëlle; Jacobs, Kevin; Rico, Daniel; Tardón, Adonina; Carrato, Alfredo; Thomas, Gilles; Valencia, Alfonso; Silverman, Debra; Real, Francisco X.; Chanock, Stephen J.; Pérez-Jurado, Luis A.

    2010-01-01

    Mosaicism is defined as the coexistence of cells with different genetic composition within an individual, caused by postzygotic somatic mutation. Although somatic mosaicism for chromosomal abnormalities is a well-established cause of developmental and somatic disorders and has also been detected in different tissues, its frequency and extent in the adult normal population are still unknown. We provide here a genome-wide survey of mosaic genomic variation obtained by analyzing Illumina 1M SNP array data from blood or buccal DNA samples of 1991 adult individuals from the Spanish Bladder Cancer/EPICURO genome-wide association study. We found mosaic abnormalities in autosomes in 1.7% of samples, including 23 segmental uniparental disomies, 8 complete trisomies, and 11 large (1.5–37 Mb) copy-number variants. Alterations were observed across the different autosomes with recurrent events in chromosomes 9 and 20. No case-control differences were found in the frequency of events or the percentage of cells affected, thus indicating that most rearrangements found are not central to the development of bladder cancer. However, five out of six events tested were detected in both blood and bladder tissue from the same individual, indicating an early developmental origin. The high cellular frequency of the anomalies detected and their presence in normal adult individuals suggest that this type of mosaicism is a widespread phenomenon in the human genome. Somatic mosaicism should be considered in the expanding repertoire of inter- and intraindividual genetic variation, some of which may cause somatic human diseases but also contribute to modifying inherited disorders and/or late-onset multifactorial traits. PMID:20598279
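
    The abstract does not say how the fraction of affected cells was estimated from the SNP-array data. A widely used approximation for mosaic copy-neutral uniparental disomy works from the B-allele frequency (BAF) at heterozygous SNPs. In normal cells, BAF sits near 0.5. If a fraction f of cells is homozygous across the segment, the BAF splits into bands at 0.5 ± f/2, so f ≈ 2|BAF − 0.5|. The sketch below illustrates that approximation only. It assumes clean signal, no copy-number change, and hypothetical BAF values.

```python
from statistics import median
from typing import Sequence


def upd_cell_fraction(baf_het: Sequence[float]) -> float:
    """Estimate the fraction of cells carrying a mosaic copy-neutral UPD segment.

    At a SNP heterozygous in normal cells (BAF 0.5), cells with UPD are
    homozygous (BAF 0 or 1), so a mixture with a fraction f of UPD cells has
    BAF = 0.5 - f/2 or 0.5 + f/2. Hence f = 2 * |BAF - 0.5|; taking the median
    over SNPs in the segment damps probe noise.
    """
    deviations = [abs(b - 0.5) for b in baf_het]
    return min(1.0, 2 * median(deviations))


# Hypothetical BAF values at heterozygous SNPs in one segment (bands near 0.35 and 0.65).
segment_baf = [0.65, 0.35, 0.66, 0.34, 0.65]
print(round(upd_cell_fraction(segment_baf), 2))  # 0.3
```

    Trisomies and copy-number changes shift the BAF bands differently, so they would need their own mixture formulas.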

  7. Large-insert genome analysis technology detects structural variation in Pseudomonas aeruginosa clinical strains from cystic fibrosis patients.

    PubMed

    Hayden, Hillary S; Gillett, Will; Saenphimmachak, Channakhone; Lim, Regina; Zhou, Yang; Jacobs, Michael A; Chang, Jean; Rohmer, Laurence; D'Argenio, David A; Palmieri, Anthony; Levy, Ruth; Haugen, Eric; Wong, Gane K S; Brittnacher, Mitch J; Burns, Jane L; Miller, Samuel I; Olson, Maynard V; Kaul, Rajinder

    2008-06-01

    Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.
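
    The GenVal comparison described above is not detailed in the abstract. A common way to read structural variation from fosmid end sequences is to place both ends on the reference and compare their separation and orientation with the tightly constrained clone size. The sketch below makes several assumptions that are not values from the study: a 40 kb expected clone size, a ±5 kb tolerance, and an inward-facing ("+" then "-") concordant orientation. The class and function names are hypothetical. Clone length is approximated by the distance between the mapped ends.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Reference placement of the two end sequences of one fosmid clone."""
    contig1: str
    pos1: int
    strand1: str  # '+' or '-'
    contig2: str
    pos2: int
    strand2: str


def classify_end_pair(pair: EndPair, expected_span: int = 40_000, tolerance: int = 5_000) -> str:
    """Label one fosmid end pair by comparing its reference placement to the expected clone size."""
    if pair.contig1 != pair.contig2:
        return "rearrangement"  # ends land on different reference sequences
    # Order ends by position; a concordant clone has the left end on '+' and the right end on '-'.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return "rearrangement"  # e.g. an inversion breakpoint inside the clone
    span = right_pos - left_pos
    if span > expected_span + tolerance:
        return "deletion"  # reference carries sequence that the query genome lacks
    if span < expected_span - tolerance:
        return "insertion"  # query carries sequence absent from the reference
    return "concordant"


pairs = [
    EndPair("ref", 1_000, "+", "ref", 41_000, "-"),
    EndPair("ref", 1_000, "+", "ref", 61_000, "-"),
    EndPair("ref", 1_000, "+", "ref", 21_000, "-"),
    EndPair("ref", 1_000, "+", "ref", 41_000, "+"),
]
for p in pairs:
    print(classify_end_pair(p))
```

    Some insertions are longer than a fosmid clone. These show up instead as clones with one unmapped end, which this sketch does not handle.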

  8. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways.
    IMPORTANCE: Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  9. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation.

    PubMed

    Li, Sen; Jakobsson, Mattias

    2012-03-27

    The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years. We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consists of several hundred thousand SNPs typed for multiple individuals by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC. We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a
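
    The basic ABC loop described above has three steps. A parameter is drawn from the prior, data are simulated and reduced to summary statistics, and the draw is accepted when the simulated summaries fall close to the observed ones. The sketch below is a deliberately simplified stand-in for the study's setting. It uses one panmictic population under the standard neutral coalescent and a single summary statistic (the number of segregating sites S) to estimate the scaled mutation rate θ. The uniform prior, the tolerance, and all function names are assumptions. The study's multi-population divergence models and combined summary statistics are not reproduced.

```python
import random
from statistics import mean
from typing import List, Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw: count unit-rate exponential arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(theta: float, n: int, rng: random.Random) -> int:
    """Segregating sites in a sample of n sequences under the standard neutral coalescent.

    With k lineages the waiting time is Exp(k(k-1)/2) in coalescent units, so the
    total tree length is sum_k k * T_k; mutations fall on it at rate theta / 2.
    """
    tree_length = sum(k * rng.expovariate(k * (k - 1) / 2) for k in range(2, n + 1))
    return poisson(rng, theta / 2 * tree_length)


def abc_rejection(observed_s: int, n: int, n_draws: int, tolerance: int,
                  prior: Tuple[float, float] = (0.0, 20.0), seed: int = 0) -> List[float]:
    """Keep prior draws of theta whose simulated S lies within tolerance of observed_s."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)                    # 1. draw a parameter from the prior
        s = simulate_segregating_sites(theta, n, rng)  # 2. simulate data, reduce to a summary
        if abs(s - observed_s) <= tolerance:           # 3. accept if close to the observed summary
            accepted.append(theta)
    return accepted


posterior = abc_rejection(observed_s=30, n=20, n_draws=10_000, tolerance=2)
print(len(posterior), round(mean(posterior), 2))
```

    The accepted draws approximate the posterior of θ. With several summary statistics, one would typically scale them and use a distance such as the Euclidean norm instead of a single absolute difference.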

  10. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation

    PubMed Central

    2012-01-01

    Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years. Results: We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consists of several hundred thousand SNPs typed for multiple individuals by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.
    Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that

  11. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects, covering more than 500 soybean germplasm lines in total, have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers.
PGen workflow has been optimized for the most
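
    PGen's internal steps are not described beyond the list above. Downstream of variant calling, though, SNPs and indels are typically separated by comparing the REF and ALT alleles of each VCF record. The sketch below shows that split under stated assumptions. It assumes tab-separated VCF data lines with REF and ALT in columns 4 and 5, and comma-separated multiallelic ALT alleles. Symbolic alleles such as `<DEL>` are left unclassified. The function names and records are hypothetical, and the chromosome labels merely mimic soybean naming.

```python
from collections import Counter
from typing import Iterable


def variant_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair from a VCF record."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion alleles are not classified here
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "mnp"  # same-length multi-base substitution


def count_variant_types(vcf_lines: Iterable[str]) -> Counter:
    """Tally variant classes over VCF data lines, counting each ALT allele separately."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            counts[variant_type(ref, alt)] += 1
    return counts


# Hypothetical VCF excerpt.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Gm01\t100\t.\tA\tG\t50\tPASS\t.",
    "Gm01\t200\t.\tAT\tA\t50\tPASS\t.",
    "Gm02\t300\t.\tC\tCTT,T\t50\tPASS\t.",
    "Gm02\t400\t.\tAC\tGT\t50\tPASS\t.",
]
print(count_variant_types(vcf_lines))
```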

  12. Cross-Platform Assessment of Genomic Imbalance Confirms the Clinical Relevance of Genomic Complexity and Reveals Loci with Potential Pathogenic Roles in Diffuse Large B-Cell Lymphoma

    PubMed Central

    Dias, Lizalynn M.; Thodima, Venkata; Friedman, Julia; Ma, Charles; Guttapalli, Asha; Mendiratta, Geetu; Siddiqi, Imran N.; Syrbu, Sergei; Chaganti, R. S. K.; Houldsworth, Jane

    2016-01-01

    Genomic copy number alterations (CNAs) in diffuse large B-cell lymphoma (DLBCL) have roles in disease pathogenesis, but their overall clinical relevance remains unclear. Herein, an unbiased algorithm was uniformly applied across three genome profiling datasets comprising 392 newly diagnosed DLBCL specimens, defining 32 overlapping CNAs that involve 36 minimal common regions (MCRs). Scoring criteria were established for 50 aberrations within the MCRs while considering peak gains/losses. Application of these criteria to independent datasets revealed novel candidate genes with coordinated expression, such as CNOT2, potentially with pathogenic roles. No single aberration was significantly associated with patient outcome across datasets, but genomic complexity, defined by imbalance in more than one MCR, significantly portended adverse outcome in two of three independent datasets. Thus, the standardized CNA scoring developed here can be uniformly applied across platforms, affording robust validation of genomic imbalance and complexity in DLBCL and overall clinical utility as biomarkers of patient outcome. PMID:26294112
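
    The complexity criterion stated above is simple: imbalance in more than one MCR. The sketch below applies it to per-sample copy-number profiles. The abstract does not give the scoring thresholds, so the ±0.2 log2-ratio cutoffs are hypothetical placeholders for the study's peak gain/loss criteria. The MCR labels and sample values are also invented for illustration.

```python
from typing import Dict, List, Set


def aberrant_mcrs(log2_by_mcr: Dict[str, float], gain: float = 0.2, loss: float = -0.2) -> Set[str]:
    """MCRs whose log2 copy-number ratio passes a (hypothetical) gain or loss cutoff."""
    return {mcr for mcr, value in log2_by_mcr.items() if value >= gain or value <= loss}


def is_genomically_complex(mcrs: Set[str]) -> bool:
    """Complexity as defined in the abstract: imbalance in more than one MCR."""
    return len(mcrs) > 1


def stratify(samples: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Split sample IDs into complex and non-complex groups for outcome comparison."""
    groups: Dict[str, List[str]] = {"complex": [], "non_complex": []}
    for sample_id, profile in sorted(samples.items()):
        key = "complex" if is_genomically_complex(aberrant_mcrs(profile)) else "non_complex"
        groups[key].append(sample_id)
    return groups


cohort = {
    "dlbcl_01": {"MCR_A": 0.35, "MCR_B": -0.40, "MCR_C": 0.05},
    "dlbcl_02": {"MCR_A": 0.10, "MCR_B": -0.30, "MCR_C": 0.00},
    "dlbcl_03": {"MCR_A": 0.02, "MCR_B": 0.01, "MCR_C": -0.05},
}
print(stratify(cohort))
```

    The resulting groups would then be compared for patient outcome, for example with survival analysis, which is outside this sketch.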

  13. Cross-platform assessment of genomic imbalance confirms the clinical relevance of genomic complexity and reveals loci with potential pathogenic roles in diffuse large B-cell lymphoma.

    PubMed

    Dias, Lizalynn M; Thodima, Venkata; Friedman, Julia; Ma, Charles; Guttapalli, Asha; Mendiratta, Geetu; Siddiqi, Imran N; Syrbu, Sergei; Chaganti, R S K; Houldsworth, Jane

    2016-01-01

    Genomic copy number alterations (CNAs) in diffuse large B-cell lymphoma (DLBCL) have roles in disease pathogenesis, but their overall clinical relevance remains unclear. Herein, an unbiased algorithm was uniformly applied across three genome profiling datasets comprising 392 newly diagnosed DLBCL specimens, defining 32 overlapping CNAs that involve 36 minimal common regions (MCRs). Scoring criteria were established for 50 aberrations within the MCRs while considering peak gains/losses. Application of these criteria to independent datasets revealed novel candidate genes with coordinated expression, such as CNOT2, potentially with pathogenic roles. No single aberration was significantly associated with patient outcome across datasets, but genomic complexity, defined by imbalance in more than one MCR, significantly portended adverse outcome in two of three independent datasets. Thus, the standardized CNA scoring developed here can be uniformly applied across platforms, affording robust validation of genomic imbalance and complexity in DLBCL and overall clinical utility as biomarkers of patient outcome.

  14. Common 5' beta-globin RFLP haplotypes harbour a surprising level of ancestral sequence mosaicism.

    PubMed

    Webster, Matthew T; Clegg, John B; Harding, Rosalind M

    2003-07-01

    Blocks of linkage disequilibrium (LD) in the human genome represent segments of ancestral chromosomes. To investigate the relationship between LD and genealogy, we analysed diversity associated with restriction fragment length polymorphism (RFLP) haplotypes of the 5' beta-globin gene complex. Genealogical analyses were based on sequence alleles that spanned a 12.2-kb interval, covering 3.1 kb around the ψβ gene and 6.2 kb of the delta-globin gene and its 5' flanking sequence known as the R/T region. Diversity was sampled from a Kenyan Luo population where recent malarial selection has contributed to substantial LD. A single common sequence allele spanning the 12.2-kb interval exclusively identified the ancestral chromosome bearing the "Bantu" βS (sickle-cell) RFLP haplotype. Other common 5' RFLP haplotypes comprised interspersed segments from multiple ancestral chromosomes. Nucleotide diversity was similar between ψβ and R/T-delta-globin but was non-uniformly distributed within the R/T-delta-globin region. High diversity associated with the 5' R/T identified two ancestral lineages that probably date back more than 2 million years. Within this genealogy, variation has been introduced into the 3' R/T by gene conversion from other ancestral chromosomes. Diversity in delta-globin was found to lead through parts of the main genealogy but to coalesce in a more recent ancestor. The well-known recombination hotspot is clearly restricted to the region 3' of delta-globin. Our analyses show that, whereas one common haplotype in a block of high LD represents a long segment from a single ancestral chromosome, others are mosaics of short segments from multiple ancestors related in genealogies of unsuspected complexity.
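
    LD between two biallelic sites is usually summarized by D = p_AB − p_A·p_B and by r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)), computed from phased haplotypes. The sketch below implements these standard formulas for 0/1-coded haplotype strings. The haplotypes and the helper name `ld_d_r2` are hypothetical and not data from the study.

```python
from typing import Sequence, Tuple


def ld_d_r2(haplotypes: Sequence[str], i: int, j: int) -> Tuple[float, float]:
    """Return (D, r^2) between biallelic sites i and j of phased 0/1 haplotypes.

    D = p_AB - p_A * p_B, with A and B the '1' alleles at sites i and j, and
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d, d * d / denom


# Hypothetical haplotypes over three SNPs.
haps = ["110", "110", "001", "000", "111", "000"]
print(ld_d_r2(haps, 0, 1))  # sites 0 and 1 always co-occur: (0.25, 1.0)
print(ld_d_r2(haps, 0, 2))
```

    A long block of high pairwise r² is what a single ancestral segment would produce. Mosaic haplotypes of the kind reported above show LD that breaks down over shorter distances.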

  15. Large genomic rearrangement of BRCA1 and BRCA2 genes in familial breast cancer patients in Korea.

    PubMed

    Cho, Ja Young; Cho, Dae-Yeon; Ahn, Sei Hyun; Choi, Su-Youn; Shin, Inkyung; Park, Hyun Gyu; Lee, Jong Won; Kim, Hee Jeong; Yu, Jong Han; Ko, Beom Seok; Ku, Bo Kyung; Son, Byung Ho

    2014-06-01

    We screened for large genomic rearrangements of the BRCA1 and BRCA2 genes in Korean familial breast cancer patients. A multiplex ligation-dependent probe amplification assay was used to identify BRCA1 and BRCA2 genomic rearrangements in 226 Korean familial breast cancer patients with risk factors for BRCA1 and BRCA2 mutations who had previously tested negative for point mutations in the two genes. We identified only one large deletion (c.4186-1593_4676-1465del) in BRCA1. No large rearrangements were found in BRCA2. Our results indicate that large genomic rearrangements in the BRCA1 and BRCA2 genes do not appear to be a major determinant of breast cancer susceptibility in the Korean population. A large-scale study is needed to validate these results in Korea.

  16. First Insights into the Large Genome of Epimedium sagittatum (Sieb. et Zucc) Maxim, a Chinese Traditional Medicinal Plant

    PubMed Central

    Liu, Di; Zeng, Shao-Hua; Chen, Jian-Jun; Zhang, Yan-Jun; Xiao, Gong; Zhu, Lin-Yao; Wang, Ying

    2013-01-01

    Epimedium sagittatum (Sieb. et Zucc) Maxim is a member of the Berberidaceae family of basal eudicot plants, widely distributed in China and used there as a traditional medicinal plant, with a long history of therapeutic use for many diseases. Recent data show that E. sagittatum has a relatively large genome, with a haploid genome size of ~4496 Mbp, divided into only 12 chromosomes (2n = 2x = 12). However, little is known about Epimedium genome structure and composition. Here we present the analysis of 691 kb of high-quality genomic sequence derived from 672 randomly selected plasmid clones of E. sagittatum genomic DNA, representing ~0.0154% of the genome. The sampled sequences comprised at least 78.41% repetitive DNA elements and 2.51% confirmed annotated gene sequences, with an overall GC content of 39%. Retrotransposons represented the major class of transposable element (TE) repeats identified (65.37% of all TE repeats), particularly LTR (Long Terminal Repeat) retrotransposons (52.27% of all TE repeats). Chromosome analysis and fluorescence in situ hybridization of Gypsy-Ty3 retrotransposons were performed to survey the E. sagittatum genome at the cytological level. Our data provide the first insights into the composition and structure of the E. sagittatum genome and will facilitate functional genomic analysis of this valuable medicinal plant. PMID:23807511
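As a quick consistency check on the sampling fraction quoted above (this arithmetic is ours, not the authors'), dividing the 691 kb of sampled sequence by the ~4496 Mbp haploid genome size gives:

```latex
\frac{691\ \text{kb}}{4496\ \text{Mbp}}
  = \frac{6.91 \times 10^{5}}{4.496 \times 10^{9}}
  \approx 1.54 \times 10^{-4}
  \approx 0.0154\%
```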

  17. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications

    PubMed Central

    Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remain labor-intensive. Given this complexity, computer-assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software, implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult-to-synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translating DNA designs into ready-to-order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner. PMID:28531174
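The abstract does not spell out the partitioning algorithm. As a rough illustration of what "multi-level partitioning" means, the sketch below splits a design into the fewest evenly sized blocks that share a fixed overlap with their neighbours, then splits each block again into subblocks. This is an assumption for illustration only, not the Genome Partitioner's method (which also optimizes for synthesizability and the assembly route); the function names, the 40-bp overlap, and the block sizes are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bases


def partition(length: int, block_size: int, overlap: int) -> List[Interval]:
    """Split [0, length) into the fewest blocks of at most `block_size` bases,
    sized as evenly as possible, with each neighbouring pair sharing exactly
    `overlap` bases (hypothetical homology arms for overlap-based assembly)."""
    if block_size <= overlap:
        raise ValueError("block_size must be larger than overlap")
    if length <= block_size:
        return [(0, length)]
    # Fewest n such that n * (block_size - overlap) >= length - overlap.
    n = -(-(length - overlap) // (block_size - overlap))
    starts = [i * (length - overlap) // n for i in range(n)] + [length - overlap]
    # Block i ends `overlap` bases past the start of block i + 1.
    return [(starts[i], starts[i + 1] + overlap) for i in range(n)]


def multi_level(start: int, end: int, sizes: List[int], overlap: int):
    """Recursively partition [start, end): level 0 uses sizes[0], each of its
    blocks is split with sizes[1], and so on. Returns ((s, e), children) nodes."""
    nodes = []
    for s, e in partition(end - start, sizes[0], overlap):
        s, e = s + start, e + start  # shift to absolute coordinates
        children = multi_level(s, e, sizes[1:], overlap) if len(sizes) > 1 else []
        nodes.append(((s, e), children))
    return nodes
```

For example, `multi_level(0, 60_000, [20_000, 1_000], 40)` returns top-level blocks of at most 20 kb, each subdivided into overlapping subblocks of at most 1 kb. Balancing block sizes (rather than cutting fixed-size blocks and leaving a short remainder) is a design choice of this sketch.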

  18. Inferring Ancestral Recombination Graphs from Bacterial Genomic Data.

    PubMed

    Vaughan, Timothy G; Welch, David; Drummond, Alexei J; Biggs, Patrick J; George, Tessy; French, Nigel P

    2017-02-01

    Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter. Copyright © 2017 Vaughan et al.

  19. Inferring Ancestral Recombination Graphs from Bacterial Genomic Data

    PubMed Central

    Vaughan, Timothy G.; Welch, David; Drummond, Alexei J.; Biggs, Patrick J.; George, Tessy; French, Nigel P.

    2017-01-01

    Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter. PMID:28007885

  20. Comparative Genomics of 12 Strains of Erwinia amylovora Identifies a Pan-Genome with a Large Conserved Core

    PubMed Central

    Mann, Rachel A.; Smits, Theo H. M.; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E.; Plummer, Kim M.; Beer, Steven V.; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings: strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria, with, on average, 89% of genes belonging to the conserved core. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants, including variation in the effector protein HopX1Ea and a putative secondary metabolite pathway present only in Rubus-infecting strains. PMID:23409014
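The core/accessory split described above can be illustrated with plain set operations. The sketch assumes genes have already been clustered into orthologous families (the abstract does not describe the clustering pipeline); the family IDs and strain names are hypothetical, and "core fraction" here means the share of each strain's families that occur in every strain.

```python
from typing import Dict, Set, Tuple


def pan_genome(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene-family sets.

    pan:       families found in at least one strain (union)
    core:      families found in every strain (intersection)
    accessory: families found in some but not all strains
    """
    families = list(strains.values())
    if not families:
        return set(), set(), set()
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core, pan - core


def mean_core_fraction(strains: Dict[str, Set[str]]) -> float:
    """Average, over strains, of the share of each strain's families that are core.
    Assumes at least one strain and no strain with an empty family set."""
    _, core, _ = pan_genome(strains)
    return sum(len(fams & core) / len(fams) for fams in strains.values()) / len(strains)
```

With a toy input such as `{"strainA": {"f1", "f2", "f3", "f4"}, "strainB": {"f1", "f2", "f3", "f5"}, "strainC": {"f1", "f2", "f3"}}`, the core is `{"f1", "f2", "f3"}` and the accessory genome is `{"f4", "f5"}`.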

  1. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    PubMed

    Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings: strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria, with, on average, 89% of genes belonging to the conserved core. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants, including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway present only in Rubus-infecting strains.

  2. Bacterial delivery of large intact genomic-DNA-containing BACs into mammalian cells

    PubMed Central

    Cheung, Wing; Kotzamanis, George; Abdulrazzak, Hassan; Goussard, Sylvie; Kaname, Tadashi; Kotsinas, Athanassios; Gorgoulis, Vassilis G.; Grillot-Courvalin, Catherine; Huxley, Clare

    2012-01-01

    Efficient delivery of large intact vectors into mammalian cells remains problematic. Here we evaluate delivery by bacterial invasion of two large BACs, each more than 150 kb in size, into various cells. First, we determined the effect of several drugs on bacterial delivery of a small plasmid into different cell lines. Most drugs tested only marginally increased the overall efficiency of delivery, and only in some cell lines; the exception was the lysosomotropic drug chloroquine, which increased the efficiency of delivery 6-fold in B16F10 cells. Bacterial invasion was significantly more effective than lipofection at delivering large intact BACs into mouse cells, with 100% of clones containing intact DNA. Furthermore, evaluation of expression of the human hypoxanthine phosphoribosyltransferase (HPRT) gene from its genomic locus, which was present in one of the BACs, showed that single-copy integrations of the HPRT-containing BAC had occurred in mouse B16F10 cells and that expression of HPRT from each human copy was 0.33 times that from each endogenous mouse copy. These data provide new evidence that bacterial delivery is a convenient and efficient method to transfer large intact therapeutic genes into mammalian cells. PMID:22095052

  3. Bacterial delivery of large intact genomic-DNA-containing BACs into mammalian cells.

    PubMed

    Cheung, Wing; Kotzamanis, George; Abdulrazzak, Hassan; Goussard, Sylvie; Kaname, Tadashi; Kotsinas, Athanassios; Gorgoulis, Vassilis G; Grillot-Courvalin, Catherine; Huxley, Clare

    2012-01-01

    Efficient delivery of large intact vectors into mammalian cells remains problematic. Here we evaluate delivery by bacterial invasion of two large BACs, each more than 150 kb in size, into various cells. First, we determined the effect of several drugs on bacterial delivery of a small plasmid into different cell lines. Most drugs tested only marginally increased the overall efficiency of delivery, and only in some cell lines; the exception was the lysosomotropic drug chloroquine, which increased the efficiency of delivery 6-fold in B16F10 cells. Bacterial invasion was significantly more effective than lipofection at delivering large intact BACs into mouse cells, with 100% of clones containing intact DNA. Furthermore, evaluation of expression of the human hypoxanthine phosphoribosyltransferase (HPRT) gene from its genomic locus, which was present in one of the BACs, showed that single-copy integrations of the HPRT-containing BAC had occurred in mouse B16F10 cells and that expression of HPRT from each human copy was 0.33 times that from each endogenous mouse copy. These data provide new evidence that bacterial delivery is a convenient and efficient method to transfer large intact therapeutic genes into mammalian cells.

  4. Direct selection: a method for the isolation of cDNAs encoded by large genomic regions.

    PubMed Central

    Lovett, M; Kere, J; Hinton, L M

    1991-01-01

    We have developed a strategy for the rapid enrichment and identification of cDNAs encoded by large genomic regions. The basis of this "direct selection" scheme is the hybridization of an entire library of cDNAs to an immobilized genomic clone. Nonspecific hybrids are eliminated and selected cDNAs are eluted. These molecules are then amplified and are either cloned or subjected to further selection/amplification cycles. This scheme was tested using a 550-kilobase yeast artificial chromosome clone that contains the EPO gene. Using this clone and a fetal kidney cDNA library, we achieved a 1000-fold enrichment of EPO cDNAs in one cycle of enrichment. More significantly, we further investigated one of the "anonymous" cDNAs that was selectively enriched. We confirmed that this cDNA was encoded by the yeast artificial chromosome. Its frequency in the starting library was 1 in 1 × 10^5 cDNAs, and after selection it comprised 2% of the selected library. DNA sequence analysis of this cDNA and of the yeast artificial chromosome clone revealed that this gene encodes the beta 2 subunit of the human guanine nucleotide-binding regulatory proteins. Restriction mapping and hybridization data position this gene (GNB2) within 30-70 kilobases of the EPO gene. The selective isolation and mapping of GNB2 confirms the feasibility of this direct selection strategy and suggests that it will be useful for the rapid isolation of cDNAs, including disease-related genes, across extensive portions of the human genome. PMID:1946378
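The frequencies reported for the anonymous cDNA imply a fold enrichment larger than the 1000-fold quoted for EPO in one cycle. The arithmetic below is our own derivation from the two stated frequencies (the abstract does not say how many selection cycles separate them):

```latex
\text{fold enrichment}
  = \frac{\text{frequency after selection}}{\text{frequency before selection}}
  = \frac{2 \times 10^{-2}}{1 \times 10^{-5}}
  = 2 \times 10^{3}
```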

  5. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    SciTech Connect

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-01-01

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes, with sequencing data sizes ranging from terabytes to petabytes. According to the performance analysis, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data are divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically adjusted to fall at the beginning of reads. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides processed in each round of the input parallelization step in proportion to the number of processes. Memory usage also decreases because only a small part of the input data is processed in each round. For graph simplification, the communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes Project dataset of 4 terabytes (the largest dataset ever used for assembly) on the supercomputer Mira, SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
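The abstract only summarizes how SWAP2 divides its input. A minimal single-machine sketch of the idea, assuming FASTA-formatted reads held in memory (SWAP2's actual input format and I/O layer are not described here, and `virtual_fragments` is a hypothetical name): place evenly spaced byte offsets, then push each interior offset forward to the next record header so that every fragment starts at the beginning of a read.

```python
from typing import List, Tuple


def virtual_fragments(data: bytes, nprocs: int, marker: bytes = b">") -> List[Tuple[int, int]]:
    """Split `data` into `nprocs` contiguous byte ranges of roughly equal size.

    Each interior cut is first placed at an even offset, then pushed forward to
    the start of the next record (a line beginning with `marker`), so that no
    read is split across two fragments. A fragment may be empty if the input
    has fewer records than processes.
    """
    size = len(data)
    cuts = [0]
    for i in range(1, nprocs):
        pos = max(i * size // nprocs, cuts[-1])
        # Search for "\n>" starting one byte early, so that a record beginning
        # exactly at `pos` is kept whole rather than skipped.
        idx = data.find(b"\n" + marker, max(pos - 1, 0))
        cuts.append(idx + 1 if idx != -1 else size)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))
```

The FASTA assumption matters: in FASTQ, `@` can also begin a quality line, so a real splitter needs a stricter record test.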

  6. Repair of base damage and genome maintenance in the nucleo-cytoplasmic large DNA viruses.

    PubMed

    Redrejo-Rodríguez, Modesto; Salas, María L

    2014-01-22

    Among the DNA viruses, the so-called nucleo-cytoplasmic large DNA viruses (NCLDV) constitute a monophyletic group that currently consists of seven families of viruses infecting a very broad variety of eukaryotes, from unicellular marine protists to humans. Many recent papers have analyzed the sequence and structure of NCLDV genomes and their phylogeny, providing detailed analyses of their genomic structure and evolutionary history and proposing their inclusion in a new viral order named Megavirales, which, according to some authors, should be considered a fourth domain of life alongside Bacteria, Archaea and Eukarya. Keeping genetic information protected from environmental attack and mutation is essential for the survival not only of cellular organisms but also of viruses. In cellular organisms, damaged DNA bases are removed by two major repair pathways, base excision repair (BER) and nucleotide incision repair (NIR), which are responsible for precisely repairing most endogenous base lesions and abnormal bases in the genome. Like cells, many NCLDV encode proteins that might constitute viral DNA repair pathways that remove damage through BER/NIR. However, the molecular mechanisms and, especially, the biological roles of these viral repair pathways have not been addressed in depth in the literature so far. In this paper, we review viral-encoded BER proteins and the genetic and biochemical data available about them. We propose and discuss probable viral-encoded DNA repair mechanisms and pathways, in comparison with the functional and molecular features of known homologous proteins. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    SciTech Connect

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-08-16

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes, with sequencing data sizes ranging from terabytes to petabytes. According to the performance analysis, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data are divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically adjusted to fall at the beginning of reads. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides processed in each round of the input parallelization step in proportion to the number of processes. Memory usage also decreases because only a small part of the input data is processed in each round. For graph simplification, the communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes Project dataset of 4 terabytes (the largest dataset ever used for assembly) on the supercomputer Mira, SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
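The two graph steps named in this abstract, k-mer graph construction and edge merging, can be illustrated with a toy single-process sketch. It is not SWAP2's distributed implementation. The conventions are simplifying assumptions: nodes are (k-1)-mers, each k-mer is an edge, reverse complements and sequencing errors are ignored, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_kmer_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base
    suffix; repeated k-mers collapse into a single edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def merge_unbranched(graph: Dict[str, Set[str]]) -> List[str]:
    """Edge merging: collapse every chain of 1-in/1-out nodes into one sequence.

    Isolated cycles made only of 1-in/1-out nodes are not reported (kept
    simple for illustration).
    """
    indeg: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indeg[v] += 1

    def one_in_one_out(node: str) -> bool:
        return indeg.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for u in sorted(set(graph) | set(indeg)):
        if one_in_one_out(u):
            continue  # interior of a chain; it is reached from the chain's start
        for v in sorted(graph.get(u, ())):
            path = u + v[-1]
            while one_in_one_out(v):  # extend through the unbranched chain
                v = next(iter(graph[v]))
                path += v[-1]
            contigs.append(path)
    return contigs
```

For instance, the overlapping reads `"ACGTT"` and `"CGTTA"` with `k = 3` form one unbranched chain that merges back into `"ACGTTA"`.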

  8. Plasticity of Animal Genome Architecture Unmasked by Rapid Evolution of a Pelagic Tunicate

    PubMed Central

    Denoeud, France; Henriet, Simon; Mungpakdee, Sutada; Aury, Jean-Marc; Da Silva, Corinne; Brinkmann, Henner; Mikhaleva, Jana; Olsen, Lisbeth Charlotte; Jubin, Claire; Cañestro, Cristian; Bouquet, Jean-Marie; Danks, Gemma; Poulain, Julie; Campsteijn, Coen; Adamski, Marcin; Cross, Ismael; Yadetie, Fekadu; Muffato, Matthieu; Louis, Alexandra; Butcher, Stephen; Tsagkogeorga, Georgia; Konrad, Anke; Singh, Sarabdeep; Jensen, Marit Flo; Huynh Cong, Evelyne; Eikeseth-Otteraa, Helen; Noel, Benjamin; Anthouard, Véronique; Porcel, Betina M.; Kachouri-Lafond, Rym; Nishino, Atsuo; Ugolini, Matteo; Chourrout, Pascal; Nishida, Hiroki; Aasland, Rein; Huzurbazar, Snehalata; Westhof, Eric; Delsuc, Frédéric; Lehrach, Hans; Reinhardt, Richard; Weissenbach, Jean; Roy, Scott W.; Artiguenave, François; Postlethwait, John H.; Manak, J. Robert; Thompson, Eric M.; Jaillon, Olivier; Du Pasquier, Louis; Boudinot, Pierre; Liberles, David A.; Volff, Jean-Nicolas; Philippe, Hervé; Lenhard, Boris; Roest Crollius, Hugues; Wincker, Patrick; Chourrout, Daniel

    2012-01-01

    Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain. PMID:21097902

  9. Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate.

    PubMed

    Denoeud, France; Henriet, Simon; Mungpakdee, Sutada; Aury, Jean-Marc; Da Silva, Corinne; Brinkmann, Henner; Mikhaleva, Jana; Olsen, Lisbeth Charlotte; Jubin, Claire; Cañestro, Cristian; Bouquet, Jean-Marie; Danks, Gemma; Poulain, Julie; Campsteijn, Coen; Adamski, Marcin; Cross, Ismael; Yadetie, Fekadu; Muffato, Matthieu; Louis, Alexandra; Butcher, Stephen; Tsagkogeorga, Georgia; Konrad, Anke; Singh, Sarabdeep; Jensen, Marit Flo; Huynh Cong, Evelyne; Eikeseth-Otteraa, Helen; Noel, Benjamin; Anthouard, Véronique; Porcel, Betina M; Kachouri-Lafond, Rym; Nishino, Atsuo; Ugolini, Matteo; Chourrout, Pascal; Nishida, Hiroki; Aasland, Rein; Huzurbazar, Snehalata; Westhof, Eric; Delsuc, Frédéric; Lehrach, Hans; Reinhardt, Richard; Weissenbach, Jean; Roy, Scott W; Artiguenave, François; Postlethwait, John H; Manak, J Robert; Thompson, Eric M; Jaillon, Olivier; Du Pasquier, Louis; Boudinot, Pierre; Liberles, David A; Volff, Jean-Nicolas; Philippe, Hervé; Lenhard, Boris; Roest Crollius, Hugues; Wincker, Patrick; Chourrout, Daniel

    2010-12-03

    Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.

  10. microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the nature of the ancestral vertebrate.

    PubMed

    Heimberg, Alysha M; Cowper-Sal·lari, Richard; Sémon, Marie; Donoghue, Philip C J; Peterson, Kevin J

    2010-11-09

    Hagfish and lampreys are the only living representatives of the jawless vertebrates (agnathans), and compared with jawed vertebrates (gnathostomes), they provide insight into the embryology, genomics, and body plan of the ancestral vertebrate. However, this insight has been obscured by controversy over their interrelationships. Morphological cladistic analyses have identified lampreys and gnathostomes as closest relatives, whereas molecular phylogenetic studies recover a monophyletic Cyclostomata (hagfish and lampreys as closest relatives). Here, we show through deep sequencing of small RNA libraries, coupled with genomic surveys, that Cyclostomata is monophyletic: hagfish and lampreys share 4 unique microRNA families, 15 unique paralogues of more primitive microRNA families, and 22 unique substitutions to the mature gene products. Reanalysis of morphological data reveals that support for cyclostome paraphyly was based largely on incorrect character coding, and a revised dataset is not decisive on the mono- vs. paraphyly of cyclostomes. Furthermore, we show fundamental conservation of microRNA expression patterns among lamprey, hagfish, and gnathostome organs, implying that the role of microRNAs within specific organs is coincident with their appearance within the genome and is conserved through time. Together, these data support the monophyly of cyclostomes and suggest that the last common ancestor of all living vertebrates was a more complex organism than conventionally accepted by comparative morphologists and developmental biologists.

  11. microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the nature of the ancestral vertebrate

    PubMed Central

    Heimberg, Alysha M.; Cowper-Sal·lari, Richard; Sémon, Marie; Donoghue, Philip C. J.; Peterson, Kevin J.

    2010-01-01

    Hagfish and lampreys are the only living representatives of the jawless vertebrates (agnathans), and compared with jawed vertebrates (gnathostomes), they provide insight into the embryology, genomics, and body plan of the ancestral vertebrate. However, this insight has been obscured by controversy over their interrelationships. Morphological cladistic analyses have identified lampreys and gnathostomes as closest relatives, whereas molecular phylogenetic studies recover a monophyletic Cyclostomata (hagfish and lampreys as closest relatives). Here, we show through deep sequencing of small RNA libraries, coupled with genomic surveys, that Cyclostomata is monophyletic: hagfish and lampreys share 4 unique microRNA families, 15 unique paralogues of more primitive microRNA families, and 22 unique substitutions to the mature gene products. Reanalysis of morphological data reveals that support for cyclostome paraphyly was based largely on incorrect character coding, and a revised dataset is not decisive on the mono- vs. paraphyly of cyclostomes. Furthermore, we show fundamental conservation of microRNA expression patterns among lamprey, hagfish, and gnathostome organs, implying that the role of microRNAs within specific organs is coincident with their appearance within the genome and is conserved through time. Together, these data support the monophyly of cyclostomes and suggest that the last common ancestor of all living vertebrates was a more complex organism than conventionally accepted by comparative morphologists and developmental biologists. PMID:20959416

  12. In search of ancestral Kilauea volcano

    USGS Publications Warehouse

    Lipman, P.W.; Sisson, T.W.; Ui, T.; Naka, J.

    2000-01-01

    Submersible observations and samples show that the lower south flank of Hawaii, offshore from Kilauea volcano and the active Hilina slump system, consists entirely of compositionally diverse volcaniclastic rocks; pillow lavas are confined to shallow slopes. Submarine-erupted basalt clasts have strongly variable alkalic and transitional basalt compositions (to 41% SiO2, 10.8% alkalies), contrasting with present-day Kilauea tholeiites. The volcaniclastic rocks provide a unique record of ancestral alkalic growth of an archetypal hotspot volcano, including transition to its tholeiitic shield stage, and associated slope-failure events.

  13. Ancestral European roots of Helicobacter pylori in India

    PubMed Central

    Devi, S Manjulata; Ahmed, Irshad; Francalacci, Paolo; Hussain, M Abid; Akhter, Yusuf; Alvi, Ayesha; Sechi, Leonardo A; Mégraud, Francis; Ahmed, Niyaz

    2007-01-01

    Background: The human gastric pathogen Helicobacter pylori has co-evolved with its host; therefore, the origins and expansion of multiple populations and subpopulations of H. pylori mirror ancient human migrations. The ancestral origins of H. pylori in the vast Indian subcontinent are debated. It is not clear how different waves of human migration in South Asia shaped the population structure of H. pylori. We tried to address these issues by mapping the genetic origins of present-day H. pylori in India and comparing their genomes with those of hundreds of isolates from different geographic regions. Results: We attempted to dissect the genetic identity of strains by multilocus sequence typing (MLST) of 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and phylogeographic analysis of haplotypes using MEGA and NETWORK software, while incorporating DNA sequences and genotyping data of whole cag pathogenicity islands (cagPAI). The distribution of cagPAI genes within these strains was analyzed by PCR, and the geographic type of the cagA phosphorylation motif EPIYA was determined by gene sequencing. All the isolates analyzed revealed European ancestry and belonged to the H. pylori subpopulation hpEurope. The cagPAI harbored by Indian strains revealed European features upon PCR-based analysis and whole-PAI sequencing. Conclusion: These observations suggest that H. pylori strains in India share ancestral origins with their European counterparts. Further, the absence of other subpopulations such as hpAfrica and hpEastAsia, at least in our collection of isolates, suggests that the hpEurope strains enjoyed a special fitness advantage in Indian stomachs that allowed them to out-compete any endogenous strains. These results might also support hypotheses related to gene flow into India through Indo-Aryans and the arrival of Neolithic practices and languages from the Fertile Crescent. PMID:17584914

  14. Ancestral European roots of Helicobacter pylori in India.

    PubMed

    Devi, S Manjulata; Ahmed, Irshad; Francalacci, Paolo; Hussain, M Abid; Akhter, Yusuf; Alvi, Ayesha; Sechi, Leonardo A; Mégraud, Francis; Ahmed, Niyaz

    2007-06-20

    The human gastric pathogen Helicobacter pylori has co-evolved with its host; therefore, the origins and expansion of multiple populations and subpopulations of H. pylori mirror ancient human migrations. The ancestral origins of H. pylori in the vast Indian subcontinent are debated. It is not clear how different waves of human migration in South Asia shaped the population structure of H. pylori. We tried to address these issues by mapping the genetic origins of present-day H. pylori in India and comparing their genomes with those of hundreds of isolates from different geographic regions. We attempted to dissect the genetic identity of strains by multilocus sequence typing (MLST) of 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and phylogeographic analysis of haplotypes using MEGA and NETWORK software, while incorporating DNA sequences and genotyping data of whole cag pathogenicity islands (cagPAI). The distribution of cagPAI genes within these strains was analyzed by PCR, and the geographic type of the cagA phosphorylation motif EPIYA was determined by gene sequencing. All the isolates analyzed revealed European ancestry and belonged to the H. pylori subpopulation hpEurope. The cagPAI harbored by Indian strains revealed European features upon PCR-based analysis and whole-PAI sequencing. These observations suggest that H. pylori strains in India share ancestral origins with their European counterparts. Further, the absence of other subpopulations such as hpAfrica and hpEastAsia, at least in our collection of isolates, suggests that the hpEurope strains enjoyed a special fitness advantage in Indian stomachs that allowed them to out-compete any endogenous strains. These results might also support hypotheses related to gene flow into India through Indo-Aryans and the arrival of Neolithic practices and languages from the Fertile Crescent.

  15. The Centers for Mendelian Genomics: a new large-scale initiative to identify the genes underlying rare Mendelian conditions.

    PubMed

    Bamshad, Michael J; Shendure, Jay A; Valle, David; Hamosh, Ada; Lupski, James R; Gibbs, Richard A; Boerwinkle, Eric; Lifton, Richard P; Gerstein, Mark; Gunel, Murat; Mane, Shrikant; Nickerson, Deborah A

    2012-07-01

    Next-generation exome sequencing (ES) and whole genome sequencing (WGS) are powerful new tools for discovering the gene(s) that underlie Mendelian disorders. To accelerate these discoveries, the National Institutes of Health has established three Centers for Mendelian Genomics (CMGs): the Center for Mendelian Genomics at the University of Washington; the Center for Mendelian Genomics at Yale University; and the Baylor-Johns Hopkins Center for Mendelian Genomics at Baylor College of Medicine and Johns Hopkins University. The CMGs will provide ES/WGS and extensive analysis expertise at no cost to collaborating investigators where the causal gene(s) for a Mendelian phenotype have yet to be uncovered. Over the next few years and in collaboration with the global human genetics community, the CMGs hope to facilitate the identification of the genes underlying a very large fraction of all Mendelian disorders; see http://mendelian.org. Copyright © 2012 Wiley Periodicals, Inc.

  16. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    PubMed Central

    2010-01-01

    Background: The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data without requiring complex programming or software compilation. Findings: GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative genomics of large bacterial genomes (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes, and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions: The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind. PMID:20178631

  17. Merkel Cell Polyomavirus Large T Antigen Disrupts Host Genomic Integrity and Inhibits Cellular Proliferation

    PubMed Central

    Li, Jing; Wang, Xin; Diaz, Jason; Tsang, Sabrina H.; Buck, Christopher B.

    2013-01-01

    Clonal integration of Merkel cell polyomavirus (MCV) DNA into the host genome has been observed in at least 80% of Merkel cell carcinomas (MCCs). The integrated viral genome typically carries mutations that truncate the C-terminal DNA binding and helicase domains of the MCV large T antigen (LT), suggesting a selective pressure to remove this MCV LT region during tumor development. In this study, we show that MCV infection leads to the activation of host DNA damage responses (DDR). This activity was mapped to the C-terminal helicase-containing region of the MCV LT. The MCV LT-activated DNA damage kinases, in turn, led to enhanced p53 phosphorylation, upregulation of p53 downstream target genes, and cell cycle arrest. Compared to the N-terminal MCV LT fragment that is usually preserved in mutants isolated from MCC tumors, full-length MCV LT shows a decreased potential to support cellular proliferation, focus formation, and anchorage-independent cell growth. These apparently antitumorigenic effects can be reversed by a dominant-negative p53 inhibitor. Our results demonstrate that MCV LT-induced DDR activates the p53 pathway, leading to the inhibition of cellular proliferation. This study reveals a key difference between MCV LT and simian vacuolating virus 40 (SV40) LT, which activates a DDR but inhibits p53 function. This study also explains, in part, why truncation mutations that remove the MCV LT C-terminal region are necessary for the oncogenic progression of MCV-associated cancers. PMID:23760247

  18. Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies

    PubMed Central

    Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David

    2011-01-01

    Understanding the role of genetic variation in human diseases remains an important problem in genomics. An important component of such variation consists of variation at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), have been proposed to account for such confounding factors, but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically with the number of individuals studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when tested on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. PMID:21765897

  19. Searching for large genomic rearrangements of the BRCA1 gene in a Nigerian population.

    PubMed

    Zhang, Jing; Fackenthal, James D; Huo, Dezheng; Zheng, Yonglan; Olopade, Olufunmilayo I

    2010-11-01

    BRCA1/2 germline mutations predispose to breast and ovarian cancer. Large genomic rearrangements (LGRs) have widened the mutational spectrum of the BRCA1 gene, but their frequencies vary among populations. In this study, we sought to determine the spectrum of LGRs in the BRCA1 gene in Nigerian breast cancer patients. The multiplex ligation-dependent probe amplification (MLPA) assay was used to screen for BRCA1 rearrangements in 352 patients who had previously tested negative for BRCA1 and BRCA2 point mutations and small insertions/deletions. A positive MLPA result was confirmed and located by long-range PCR. The breakpoints of the candidate rearrangement were characterized by sequencing. A novel deletion of BRCA1 exon 21 (c.5277+480_5332+672del) was detected in 1 out of 352 Nigerian breast cancer patients (0.3% occurrence frequency). Further analysis of the breakpoints revealed that the deletion involves two Alu elements: an AluSg in intron 20 and an AluY in intron 21. These data suggest that while BRCA1 genomic rearrangements exist, they do not contribute significantly to BRCA1-associated risk in the Nigerian population.
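The reported occurrence frequency follows directly from the screening counts given in the abstract (1 deletion carrier among 352 patients). A minimal Python check, included for illustration only:

```python
# Counts taken from the abstract: one exon 21 deletion carrier among 352 screened patients.
carriers = 1
screened = 352

# Occurrence frequency expressed as a percentage.
frequency_pct = 100 * carriers / screened

print(f"{frequency_pct:.2f}%")  # 0.28%, which rounds to the reported 0.3%
```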

  20. Large-scale analysis of tandem repeat variability in the human genome

    PubMed Central

    Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R.; Verstrepen, Kevin J.; Froyen, Guy

    2014-01-01

    Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes, allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three-generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats. PMID:24682812

  1. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma.

    PubMed

    Cerhan, James R; Berndt, Sonja I; Vijai, Joseph; Ghesquières, Hervé; McKay, James; Wang, Sophia S; Wang, Zhaoming; Yeager, Meredith; Conde, Lucia; de Bakker, Paul I W; Nieters, Alexandra; Cox, David; Burdett, Laurie; Monnereau, Alain; Flowers, Christopher R; De Roos, Anneclaire J; Brooks-Wilson, Angela R; Lan, Qing; Severi, Gianluca; Melbye, Mads; Gu, Jian; Jackson, Rebecca D; Kane, Eleanor; Teras, Lauren R; Purdue, Mark P; Vajdic, Claire M; Spinelli, John J; Giles, Graham G; Albanes, Demetrius; Kelly, Rachel S; Zucca, Mariagrazia; Bertrand, Kimberly A; Zeleniuch-Jacquotte, Anne; Lawrence, Charles; Hutchinson, Amy; Zhi, Degui; Habermann, Thomas M; Link, Brian K; Novak, Anne J; Dogan, Ahmet; Asmann, Yan W; Liebow, Mark; Thompson, Carrie A; Ansell, Stephen M; Witzig, Thomas E; Weiner, George J; Veron, Amelie S; Zelenika, Diana; Tilly, Hervé; Haioun, Corinne; Molina, Thierry Jo; Hjalgrim, Henrik; Glimelius, Bengt; Adami, Hans-Olov; Bracci, Paige M; Riby, Jacques; Smith, Martyn T; Holly, Elizabeth A; Cozen, Wendy; Hartge, Patricia; Morton, Lindsay M; Severson, Richard K; Tinker, Lesley F; North, Kari E; Becker, Nikolaus; Benavente, Yolanda; Boffetta, Paolo; Brennan, Paul; Foretova, Lenka; Maynadie, Marc; Staines, Anthony; Lightfoot, Tracy; Crouch, Simon; Smith, Alex; Roman, Eve; Diver, W Ryan; Offit, Kenneth; Zelenetz, Andrew; Klein, Robert J; Villano, Danylo J; Zheng, Tongzhang; Zhang, Yawei; Holford, Theodore R; Kricker, Anne; Turner, Jenny; Southey, Melissa C; Clavel, Jacqueline; Virtamo, Jarmo; Weinstein, Stephanie; Riboli, Elio; Vineis, Paolo; Kaaks, Rudolph; Trichopoulos, Dimitrios; Vermeulen, Roel C H; Boeing, Heiner; Tjonneland, Anne; Angelucci, Emanuele; Di Lollo, Simonetta; Rais, Marco; Birmann, Brenda M; Laden, Francine; Giovannucci, Edward; Kraft, Peter; Huang, Jinyan; Ma, Baoshan; Ye, Yuanqing; Chiu, Brian C H; Sampson, Joshua; Liang, Liming; Park, Ju-Hyun; Chung, Charles C; Weisenburger, Dennis D; Chatterjee, Nilanjan; Fraumeni, Joseph F; Slager, Susan L; Wu, Xifeng; de Sanjose, Silvia; Smedby, Karin E; Salles, Gilles; Skibola, Christine F; Rothman, Nathaniel; Chanock, Stephen J

    2014-11-01

    Diffuse large B cell lymphoma (DLBCL) is the most common lymphoma subtype and is clinically aggressive. To identify genetic susceptibility loci for DLBCL, we conducted a meta-analysis of 3 new genome-wide association studies (GWAS) and 1 previous scan, totaling 3,857 cases and 7,666 controls of European ancestry, with additional genotyping of 9 promising SNPs in 1,359 cases and 4,557 controls. In our multi-stage analysis, five independent SNPs in four loci achieved genome-wide significance, marked by rs116446171 at 6p25.3 (EXOC2; P = 2.33 × 10^-21), rs2523607 at 6p21.33 (HLA-B; P = 2.40 × 10^-10), rs79480871 at 2p23.3 (NCOA1; P = 4.23 × 10^-8) and two independent SNPs, rs13255292 and rs4733601, at 8q24.21 (PVT1; P = 9.98 × 10^-13 and 3.63 × 10^-11, respectively). These data provide substantial new evidence for genetic susceptibility to this B cell malignancy and point to pathways involved in immune recognition and immune function in the pathogenesis of DLBCL.

  2. PALMA: mRNA to genome alignments using large margin algorithms.

    PubMed

    Schulze, Uta; Hepp, Bettina; Ong, Cheng Soon; Rätsch, Gunnar

    2007-08-01

    Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as the boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from Caenorhabditis elegans and human, it correctly identifies the intron boundaries in all except two cases. The next-best method, a recently proposed method called exalin, misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and Python are available at http://www.fml.mpg.de/raetsch/projects/palma
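To put the two error counts on a common scale, the sketch below converts them to per-sequence accuracy. It assumes, as the abstract appears to imply but does not state outright, that exalin was scored on the same 5702 sequences as PALMA.

```python
# Error counts reported in the abstract for the 5702 shortened EST sequences.
# Assumption: both methods were scored on the same set of 5702 sequences.
total = 5702
errors = {"PALMA": 2, "exalin": 37}

# Percentage of sequences whose intron boundaries were identified correctly.
accuracy = {method: 100 * (total - n_err) / total for method, n_err in errors.items()}

for method, pct in accuracy.items():
    print(f"{method}: {total - errors[method]}/{total} correct ({pct:.2f}%)")
```

Both methods exceed 99% accuracy on this set; the difference lies in the absolute number of misaligned sequences (2 versus 37).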

  3. BAL31-NGS approach for identification of telomeres de novo in large genomes.

    PubMed

    Peška, Vratislav; Sitová, Zdeňka; Fajkus, Petr; Fajkus, Jiří

    2017-02-01

    This article describes a novel method to identify as yet undiscovered telomere sequences, which combines next generation sequencing (NGS) with BAL31 digestion of high molecular weight DNA. The method was applied to two groups of plants: i) dicots, genus Cestrum, and ii) monocots, Allium species (e.g. A. ursinum and A. cepa). Both groups consist of species with large genomes (tens of Gb) and a low number of chromosomes (2n=14-16), full of repeat elements. Both genera lack typical telomeric repeats, and multiple studies have attempted to characterize alternative telomeric sequences. However, despite interesting hypotheses and suggestions of alternative candidate telomeres (retrotransposons, rDNA, satellite repeats), these studies have not resolved the question. In a novel approach based on the two most general features of eukaryotic telomeres, their repetitive character and their sensitivity to BAL31 nuclease digestion, we have taken advantage of the capacity and current affordability of NGS in combination with the robustness of classical BAL31 nuclease digestion of chromosomal termini. Low-coverage (less than 5%) genomic shotgun NGS ensured representative sampling of most repeat elements, and candidate telomeres were identified as sequences under-represented in BAL31-treated samples.

  4. Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires.

    PubMed

    Burstein, David; Amaro, Francisco; Zusman, Tal; Lifshitz, Ziv; Cohen, Ofir; Gilbert, Jack A; Pupko, Tal; Shuman, Howard A; Segal, Gil

    2016-02-01

    Infection by the human pathogen Legionella pneumophila relies on the translocation of ∼ 300 virulence proteins, termed effectors, which manipulate host cell processes. However, almost no information exists regarding effectors in other Legionella pathogens. Here we sequenced, assembled and characterized the genomes of 38 Legionella species and predicted their effector repertoires using a previously validated machine learning approach. This analysis identified 5,885 predicted effectors. The effector repertoires of different Legionella species were found to be largely non-overlapping, and only seven core effectors were shared by all species studied. Species-specific effectors had atypically low GC content, suggesting exogenous acquisition, possibly from the natural protozoan hosts of these species. Furthermore, we detected numerous new conserved effector domains and discovered new domain combinations, which allowed the inference of as yet undescribed effector functions. The effector collection and network of domain architectures described here can serve as a roadmap for future studies of effector function and evolution.

  5. Genome-wide mapping of large deletions and their population-genetic properties in dairy cattle.

    PubMed

    Mesbah-Uddin, Md; Guldbrandtsen, Bernt; Iso-Touru, Terhi; Vilkki, Johanna; De Koning, Dirk-Jan; Boichard, Didier; Lund, Mogens Sandø; Sahana, Goutam

    2017-09-08

    Large genomic deletions are potential candidates for loss-of-function variants, which could be lethal in the homozygous state. Analysing whole genome data of 175 cattle, we report 8,480 large deletions (199 bp to 773 kb) with an overall false discovery rate of 8.8%; 82% of these are novel compared with deletions in the dbVar database. Breakpoint sequence analyses revealed that the majority (24 of 29 tested) of the deletions contain microhomology/homology at the breakpoint and were therefore most likely generated by microhomology-mediated end joining. We observed higher differentiation among breeds for deletions in some genic regions, such as ABCA12, TTC1, VWA3B, TSHR, DST/BPAG1, and CD1D. The genes overlapping deletions are on average evolutionarily less conserved than known mouse lethal genes (P-value = 2.3 × 10^-6). We report 167 natural gene knockouts in cattle that are apparently nonessential, as live homozygous individuals are observed. These genes are functionally enriched for immunoglobulin domains, olfactory receptors, and MHC classes (FDR = 2.06 × 10^-22, 2.06 × 10^-22, 7.01 × 10^-6, respectively). We also demonstrate that deletions are enriched for health- and fertility-related quantitative trait loci (2- and 1.5-fold enrichment, Fisher's P-value = 8.91 × 10^-10 and 7.4 × 10^-11, respectively). Finally, we identified and confirmed the breakpoint of a ∼525 kb deletion on Chr23:12,291,761-12,817,087 (overlapping BTBD9, GLO1 and DNAH8) that causes stillbirth in Nordic Red Cattle. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
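The ∼525 kb size of the stillbirth-associated deletion can be recovered from the reported Chr23 coordinates. A minimal Python sketch, assuming 1-based inclusive coordinates (the abstract does not state the convention; under a half-open convention the length would be 1 bp shorter, which does not change the rounded figure):

```python
# Breakpoint coordinates of the stillbirth-associated deletion on Chr23 (from the abstract).
start = 12_291_761
end = 12_817_087

# Assumption: 1-based, inclusive coordinates, so both endpoints count toward the length.
length_bp = end - start + 1

print(f"{length_bp:,} bp (~{length_bp / 1000:.0f} kb)")  # 525,327 bp (~525 kb)
```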

  6. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS are increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions are made for future software development.

  7. The Exceptionally Large Chloroplast Genome of the Green Alga Floydiella terrestris Illuminates the Evolutionary History of the Chlorophyceae

    PubMed Central

    Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2010-01-01

    The Chlorophyceae, an advanced class of chlorophyte green algae, comprises five lineages that form two major clades (Chlamydomonadales + Sphaeropleales and Oedogoniales + Chaetopeltidales + Chaetophorales). The four complete chloroplast DNA (cpDNA) sequences currently available for chlorophyceans uncovered an extraordinarily fluid genome architecture as well as many structural features distinguishing this group from other green algae. We report here the 521,168-bp cpDNA sequence from a member of the Chaetopeltidales (Floydiella terrestris), the sole chlorophycean lineage not previously sampled for chloroplast genome analysis. This genome, which contains 97 conserved genes and 26 introns (19 group I and 7 group II introns), is the largest chloroplast genome ever sequenced. Intergenic regions account for 77.8% of the genome size and are populated by short repeats. Numerous genomic features are shared with the cpDNA of the chaetophoralean Stigeoclonium helveticum, notably the absence of a large inverted repeat and the presence of unique gene clusters and trans-spliced group II introns. Although only one of the Floydiella group I introns encodes a homing endonuclease gene, our finding of five free-standing reading frames having similarity with such genes suggests that chloroplast group I introns endowed with mobility were once more abundant in the Floydiella lineage. Parsimony analysis of structural genomic features and phylogenetic analysis of chloroplast sequence data unambiguously resolved the Oedogoniales as sister to the Chaetopeltidales and Chaetophorales. An evolutionary scenario of the molecular events that shaped the chloroplast genome in the Chlorophyceae is presented. PMID:20624729

  8. Histology of “placoderm” dermal skeletons: Implications for the nature of the ancestral gnathostome

    PubMed Central

    Giles, Sam; Rücklin, Martin

    2013-01-01

    The vertebrate dermal skeleton has long been interpreted to have evolved from a primitive condition exemplified by chondrichthyans. However, chondrichthyans and osteichthyans evolved from an ancestral gnathostome stem‐lineage in which the dermal skeleton was more extensively developed. To elucidate the histology and skeletal structure of the gnathostome crown‐ancestor we conducted a histological survey of the diversity of the dermal skeleton among the placoderms, a diverse clade or grade of early jawed vertebrates. The dermal skeleton of all placoderms is composed largely of a cancellar architecture of cellular dermal bone, surmounted by dermal tubercles in the most ancestral clades, including antiarchs. Acanthothoracids retain an ancestral condition for the dermal skeleton, and we record its secondary reduction in antiarchs. We also find that mechanisms for remodeling bone and facilitating different growth rates between adjoining plates are widespread throughout the placoderms. J. Morphol., 2013. © 2013 Wiley Periodicals, Inc. PMID:23378262

  9. The search for ancestral nervous systems: an integrative and comparative approach.

    PubMed

    Satterlie, Richard A

    2015-02-15

    Even the most basal multicellular nervous systems are capable of producing complex behavioral acts that involve the integration and combination of simple responses, and decision-making when presented with conflicting stimuli. This requires an understanding beyond that available from genomic investigations, and calls for an integrative and comparative approach, where the power of genomic/transcriptomic techniques is coupled with morphological, physiological and developmental experimentation to identify common and species-specific nervous system properties for the development and elaboration of phylogenomic reconstructions. With careful selection of genes and gene products, we can continue to make significant progress in our search for ancestral nervous system organizations.

  10. The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

    Treesearch

    A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

    2000-01-01

    The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...

  11. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  12. An Ancestral Recombination Graph for Diploid Populations with Skewed Offspring Distribution

    PubMed Central

    Birkner, Matthias; Blath, Jochen; Eldon, Bjarki

    2013-01-01

    A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between

  13. Assessing the functional consequence of loss of function variants using electronic medical record and large-scale genomics consortium efforts

    PubMed Central

    Sleiman, Patrick; Bradfield, Jonathan; Mentch, Frank; Almoguera, Berta; Connolly, John; Hakonarson, Hakon

    2014-01-01

    Estimates from large scale genome sequencing studies indicate that each human carries up to 20 genetic variants that are predicted to result in loss of function (LOF) of protein-coding genes. While some are known disease-causing variants or common, tolerated LOFs in non-essential genes, the majority remain of unknown consequence. We explore the possibility of using imputed GWAS data from large biorepositories such as the electronic medical record and genomics (eMERGE) consortium to determine the effects of rare LOFs. Here, we show that two hypocholesterolemia-associated LOF mutations in the PCSK9 gene can be accurately imputed into large-scale GWAS datasets, which raises the possibility of assessing LOFs through genomics-linked medical records. PMID:24808909

  14. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction.

    PubMed

    De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan

    2016-05-01

    Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.

  15. Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

    PubMed

    Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

    2015-10-16

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps because it offers much higher and more uniform marker resolution across the length of the chromosome than genetic mapping and does not require marker polymorphism per se, as it is based on a presence (retention) vs. absence (deletion) marker assay. In this study, a 178-line RH panel was genotyped with SSRs and DArT markers to develop the first high-resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) an RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig-based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR, which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the positions of markers on the AL8/78 maps and the four published maps ranged from 0.75 to 0.92, suggesting good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average
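    The summary statistics above are linked by simple arithmetic. The sketch below assumes (this is not stated in the abstract) that the 29.2 cR spacing is the total map length divided by the number of unique positions, then converts it with the rounded per-cR resolutions. It gives about 2.0 cM and 9.9 Mb, close to the reported 2.1 cM and 9.8 Mb; the small gaps are consistent with the 0.07 cM/cR and 0.34 Mb/cR ratios having been rounded.

```python
# Figures reported in the abstract for the Ae. tauschii AL8/78 RH maps.
TOTAL_LENGTH_CR = 14_706.7   # total map length, centiRays
UNIQUE_POSITIONS = 503       # unique marker positions across seven chromosomes
MAPPED_MARKERS = 609
CM_PER_CR = 0.07             # reported average resolution (rounded)
MB_PER_CR = 0.34             # reported average resolution (rounded)


def mean_spacing_cr(total_cr: float, n_positions: int) -> float:
    """Average spacing, assuming spacing = total length / number of positions."""
    return total_cr / n_positions


spacing = mean_spacing_cr(TOTAL_LENGTH_CR, UNIQUE_POSITIONS)
markers_per_position = MAPPED_MARKERS / UNIQUE_POSITIONS

print(f"mean spacing: {spacing:.1f} cR")
print(f"  ~ {spacing * CM_PER_CR:.1f} cM, ~ {spacing * MB_PER_CR:.1f} Mb")
print(f"markers per unique position: {markers_per_position:.2f}")
```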

  16. Genomic imbalances during transformation from follicular lymphoma to diffuse large B-cell lymphoma.

    PubMed

    Berglund, Mattias; Enblad, Gunilla; Thunberg, Ulf; Amini, Rose-Marie; Sundström, Christer; Roos, Göran; Erlanson, Martin; Rosenquist, Richard; Larsson, Catharina; Lagercrantz, Svetlana

    2007-01-01

    Follicular lymphoma is commonly transformed to a more aggressive diffuse large B-cell lymphoma (DLBCL). To provide a molecular characterization of this histological and clinical transformation, comparative genomic hybridization was applied to 23 follicular lymphoma and 35 transformed DLBCL tumors from a total of 30 patients. The results were also compared with our published findings in de novo DLBCL. Copy number changes were detected in 70% of follicular lymphoma and in 97% of transformed DLBCL. In follicular lymphoma, the most common alterations were +18q21 (33%), +Xq25-26 (28%), +1q31-32 (23%), and -17p (23%), whereas transformed DLBCL most frequently exhibited +Xq25-26 (36%), +12q15 (29%), +7pter-q22 (25%), +8q21 (21%), and -6q16-21 (25%). Transformed DLBCL showed significantly more alterations than follicular lymphoma (P=0.0001), and the alterations -6q16-21 and +7pter-q22 were found only in transformed DLBCL, not in follicular lymphoma (P=0.02). Alterations involving +13q22 were significantly less frequent, whereas -4q13-21 was more common, in transformed compared with de novo DLBCL (P=0.01 and P=0.02, respectively). At the genetic level, clinical progression from follicular lymphoma to transformed DLBCL is associated with the acquisition of an increasing number of genomic copy number changes, with non-random involvement of specific target regions. The findings support differing genetic backgrounds for transformed and de novo DLBCL.

  17. A Draft Sequence of the Neandertal Genome

    PubMed Central

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  18. The genome of woodland strawberry (Fragaria vesca).

    PubMed

    Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Lopez Girona, Elena; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M

    2011-02-01

    The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

  19. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny, and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (an order similar to the one that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  20. Large Genomic Fragment Deletions and Insertions in Mouse Using CRISPR/Cas9

    PubMed Central

    Satheka, Achim Cchitvsanzwhoh; Togo, Jacques; An, Yao; Humphrey, Mabwi; Ban, Luying; Ji, Yan; Jin, Honghong; Feng, Xuechao; Zheng, Yaowu

    2015-01-01

    ZFNs, TALENs, and the CRISPR/Cas9 system have been used to generate genomic modifications ranging from point mutations to large fragment deletions and insertions. The CRISPR/Cas9 system is the most flexible and fastest-developing of these technologies and has been used extensively to make mutations in all kinds of organisms. However, most mutations reported to date are small insertions and deletions. In this report, the CRISPR/Cas9 system was used to make large DNA fragment deletions and insertions in mouse, including deletion of the entire Dip2a gene, about 65 kb in size, and insertion of a β-galactosidase (lacZ) reporter gene larger than 5 kb. About 11.8% (11/93) of transfected and diluted ES clones were positive for the 65 kb deletion. High targeting efficiencies in ES cells were also achieved with G418 selection: 46.2% (12/26) and 73.1% (19/26) for the left and right arms, respectively. Targeted large fragment deletion efficiency was about 21.4% of live pups or 6.0% of injected embryos. Targeted insertion of the lacZ reporter with a NEO cassette showed a targeting rate of 27.1% (13/48) by ES cell transfection and 11.1% (2/18) by direct zygote injection. The procedures bypass in vitro transcription through direct co-injection of zygotes or co-transfection of embryonic stem cells with circular plasmid DNA. The methods are technically easy, time saving, and cost effective in generating mouse models and will certainly facilitate gene function studies. PMID:25803037
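    The efficiencies above are simple proportions with small denominators. This short sketch recomputes them from the reported counts and adds a 95% Wilson score interval (an addition for illustration, not part of the study) to show how wide the uncertainty is for counts such as 2/18. The 21.4% and 6.0% figures are left out because the abstract gives no counts for them.

```python
import math

# (positives, total, reported percentage) as given in the abstract.
REPORTED = {
    "65 kb deletion, ES clones": (11, 93, 11.8),
    "left arm, G418 selection": (12, 26, 46.2),
    "right arm, G418 selection": (19, 26, 73.1),
    "lacZ insertion, ES transfection": (13, 48, 27.1),
    "lacZ insertion, zygote injection": (2, 18, 11.1),
}


def percent(positives: int, total: int) -> float:
    """Efficiency as a percentage, rounded to one decimal like the abstract."""
    return round(100.0 * positives / total, 1)


def wilson_interval(positives: int, total: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = positives / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


for label, (k, n, reported) in REPORTED.items():
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {percent(k, n)}% (reported {reported}%), "
          f"95% CI {100 * lo:.1f}-{100 * hi:.1f}%")
```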

  1. Large-scale contamination of microbial isolate genomes by Illumina PhiX control.

    PubMed

    Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos C; Pati, Amrita

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges, such as searching for alternative energy sources and exploring genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature, and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources, which is usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or the effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.
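    One simple way to screen an assembly for a known control sequence is k-mer matching. The sketch below is a hypothetical illustration, not the procedure used in this study: it flags contigs whose canonical k-mers largely occur in a control sequence. The short `control` string in the usage example is a placeholder, not the real PhiX genome, and the default values of `k` and `threshold` are arbitrary assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> set:
    """Canonical k-mers: lexicographic min of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):   # skip k-mers containing N or other symbols
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def shared_fraction(contig: str, control_kmers: set, k: int) -> float:
    """Fraction of a contig's distinct canonical k-mers also found in the control."""
    kmers = canonical_kmers(contig, k)
    if not kmers:
        return 0.0
    return len(kmers & control_kmers) / len(kmers)


def flag_contaminated(contigs: dict, control_seq: str, k: int = 21,
                      threshold: float = 0.5) -> list:
    """Names of contigs whose shared k-mer fraction reaches the threshold."""
    control_kmers = canonical_kmers(control_seq, k)
    return [name for name, seq in contigs.items()
            if shared_fraction(seq, control_kmers, k) >= threshold]


if __name__ == "__main__":
    control = "ACGTTGCAAGGCTTACCGATGCAATCGGAT"   # placeholder, not PhiX
    contigs = {
        "clean": "TTTTTTTTTTGGGGGGGGGGCCCCCAAAAA",
        "contaminated": reverse_complement(control),
    }
    print(flag_contaminated(contigs, control, k=8))
```

    Canonical k-mers make the screen strand-independent, so a contig assembled from the opposite strand of the control is still flagged.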

  2. Large-scale contamination of microbial isolate genomes by Illumina PhiX control

    PubMed Central

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world’s biggest challenges, such as searching for alternative energy sources and exploring genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature, and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources, which is usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or the effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community. PMID:26203331

  3. Reconstruction of oomycete genome evolution identifies differences in evolutionary trajectories leading to present-day large gene families.

    PubMed

    Seidl, Michael F; Van den Ackerveken, Guido; Govers, Francine; Snel, Berend

    2012-01-01

    The taxonomic class of oomycetes contains numerous pathogens of plants and animals but is related to nonpathogenic diatoms and brown algae. Oomycetes have flexible genomes comprising large gene families that play roles in pathogenicity. The evolutionary processes that shaped the gene content have not yet been studied by applying systematic tree reconciliation of the phylome of these species. We analyzed the evolutionary dynamics of ten Stramenopiles. Gene gains, duplications, and losses were inferred by tree reconciliation of 18,459 gene trees constituting the phylome with a highly supported species phylogeny. We reconstructed a strikingly large last common ancestor of the Stramenopiles that contained ~10,000 genes. Throughout evolution, the genomes of pathogenic oomycetes have constantly gained and lost genes, though gene gains through duplications outnumber the losses. The branch leading to the plant pathogenic Phytophthora genus was identified as a major transition point characterized by an increased frequency of duplication events that has likely driven speciation within this genus. Large gene families encoding different classes of enzymes associated with pathogenicity, such as glycoside hydrolases, are formed by complex and distinct patterns of duplications and losses leading to their expansion in extant oomycetes. This study unveils the large-scale evolutionary dynamics that shaped the genomes of pathogenic oomycetes. By applying phylogeny-based analysis methods, it provides additional insights that shed light on the complex history of oomycete genome evolution and the emergence of large gene families characteristic of this important class of pathogens.
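    The abstract infers gene duplications and losses by reconciling gene trees with a species phylogeny. As a minimal sketch of the classic parsimony-style LCA reconciliation (assumed here for illustration; the study's exact reconciliation procedure and gain model are not given in the abstract), each gene-tree node is mapped to the lowest species-tree node containing all its genes. A node counts as a duplication when it maps to the same species node as one of its children, and losses are counted from the species-tree levels skipped between a node and its children. Gene gains by transfer or de novo origin are not modeled. Trees are nested 2-tuples, and the `species_of` rule that strips a `_n` suffix is a hypothetical naming convention.

```python
from typing import Callable, Dict, Tuple, Union

Tree = Union[str, tuple]   # a leaf is a string, an internal node a 2-tuple of subtrees


def index_species_tree(tree: Tree) -> Tuple[Dict, Dict]:
    """Record each species-tree node's parent and depth (root has depth 0)."""
    parent, depth = {}, {}

    def visit(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                visit(child, node, d + 1)

    visit(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree: Tree, species_tree: Tree,
              species_of: Callable[[str], str]) -> Dict[str, int]:
    """LCA reconciliation of a rooted binary gene tree with a rooted binary species tree."""
    parent, depth = index_species_tree(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def visit(node):
        if isinstance(node, str):
            return species_of(node)           # gene leaf -> its species leaf
        if len(node) != 2:
            raise ValueError("gene tree must be binary")
        children = [visit(child) for child in node]
        mapped = lca(children[0], children[1], parent, depth)
        is_dup = mapped in children           # a child maps to the same species node
        if is_dup:
            counts["duplications"] += 1
        for c in children:
            # Species-tree levels skipped on the way down imply losses;
            # a speciation node itself accounts for one level.
            counts["losses"] += depth[c] - depth[mapped] - (0 if is_dup else 1)
        return mapped

    visit(gene_tree)
    return counts


if __name__ == "__main__":
    species = (("A", "B"), "C")
    genes = (("A_1", "B_1"), ("A_2", "C_1"))
    print(reconcile(genes, species, lambda leaf: leaf.split("_")[0]))
```

    In the usage example, the root of the gene tree is a duplication, one copy lost its C ortholog and the other lost its B ortholog, giving one duplication and two losses.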

  4. Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea

    PubMed Central

    Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2017-01-01

    Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economically important caridean shrimp, is a potentially ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large, complex genome (5.73 Gb), about twice the size of those of many decapod shrimps. Comparative genomic analyses were therefore applied to investigate factors affecting genome size evolution in decapods. However, no clues associated with genome duplication were identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined to be the major factor influencing genome expansion. A total of 2 Gb of repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that significantly expanded. Both recently originated (Jockey and Gypsy) and ancestral (DIRS) retrotransposons were responsible for the genome evolution. The E. carinicauda genome also shows potential as a resource for genomic and experimental research on shrimps. PMID:28678163

  5. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest in understanding somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer.

  6. Estimating Divergence Time and Ancestral Effective Population Size of Bornean and Sumatran Orangutan Subspecies Using a Coalescent Hidden Markov Model

    PubMed Central

    Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Lunter, Gerton; Schierup, Mikkel H.

    2011-01-01

    Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within genomic segments all bases will share the same divergence—because they share a most recent common ancestor—when no recombination event has occurred to split them apart. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection. PMID:21408205
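    The core computation in a coalescent hidden Markov model is the likelihood of an alignment summed over all possible sequences of hidden divergence times; maximizing it is what allows the parameters to be estimated. The sketch below shows only that generic step (the scaled forward algorithm) on a deliberately simplified, hypothetical model: a few discretized time states, a uniform recombination switch probability, and a per-site difference probability that grows with time. These transition and emission choices are assumptions for illustration, not the coalescent-derived probabilities of the published model. A brute-force sum over all paths is included only as a check for tiny inputs.

```python
import itertools

import numpy as np


def toy_coalescent_hmm(times, switch_prob, mut_scale):
    """Hypothetical HMM over discretized divergence times (not the published model).

    times: representative coalescence time of each hidden state (at least two).
    switch_prob: probability that recombination moves the next site to another
        time state, spread evenly over the other states.
    mut_scale: scales time into the probability that the two sequences differ.
    Observations: 0 = identical base, 1 = differing base.
    """
    times = np.asarray(times, dtype=float)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time states")
    init = np.full(n, 1.0 / n)
    trans = np.full((n, n), switch_prob / (n - 1))
    np.fill_diagonal(trans, 1.0 - switch_prob)
    p_diff = 1.0 - np.exp(-2.0 * mut_scale * times)   # older split -> more differences
    emit = np.column_stack([1.0 - p_diff, p_diff])
    return init, trans, emit


def forward_loglik(obs, init, trans, emit):
    """Scaled forward algorithm: log P(obs) summed over all hidden state paths."""
    alpha = init * emit[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale
    loglik = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]   # propagate, then weight by emission
        scale = alpha.sum()                    # rescale to avoid underflow
        alpha = alpha / scale
        loglik += np.log(scale)
    return loglik


def brute_force_loglik(obs, init, trans, emit):
    """Exhaustive sum over all state paths; feasible only for tiny inputs."""
    n = len(init)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1], path[t]] * emit[path[t], obs[t]]
        total += p
    return np.log(total)


if __name__ == "__main__":
    init, trans, emit = toy_coalescent_hmm([0.5, 1.0, 2.0], switch_prob=0.1, mut_scale=0.01)
    obs = [0, 1, 0, 0, 0]
    print(forward_loglik(obs, init, trans, emit), brute_force_loglik(obs, init, trans, emit))
```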

  7. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    PubMed Central

    de Vries, Paul S.; Sabater-Lleal, Maria; Chasman, Daniel I.; Trompet, Stella; Kleber, Marcus E.; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R.; Marioni, Riccardo E.; Weng, Lu-Chen; Grossmann, Vera; Brody, Jennifer A.; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M.; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J.; Rumley, Ann; Mulas, Antonella; de Craen, Anton J. M.; Grotevendt, Anne; Taylor, Kent D.; Delgado, Graciela E.; Kifley, Annette; Lopez, Lorna M.; Berentzen, Tina L.; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C.; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P. M.; Draisma, Harmen H. M.; Lowe, Gordon D.; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J.; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G.; McEvoy, Mark A.; Starr, John M.; Hysi, Pirro G.; Hernandez, Dena G.; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L.; Slagboom, P. Eline; Zeller, Tanja; Psaty, Bruce M.; Uitterlinden, André G.; de Geus, Eco J. C.; Stott, David J.; Binder, Harald; Hofman, Albert; Franco, Oscar H.; Rotter, Jerome I.; Ferrucci, Luigi; Spector, Tim D.; Deary, Ian J.; März, Winfried; Greinacher, Andreas; Wild, Philipp S.; Cucca, Francesco; Boomsma, Dorret I.; Watkins, Hugh; Tang, Weihong; Ridker, Paul M.; Jukema, Jan W.; Scott, Rodney J.; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J.; Smith, Nicholas L.; Strachan, David P.

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10⁻⁸ is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10⁻⁸), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development. PMID:28107422
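    Both thresholds follow from a Bonferroni correction of a family-wise error rate of 0.05. The conventional 5×10⁻⁸ corresponds to roughly one million independent tests, and the stricter 2.5×10⁻⁸ to roughly two million, which amounts to assuming that 1000G imputation about doubles the number of independent tests. A minimal sketch; the locus names and P-values in the usage example are hypothetical and only show how a locus can drop out under the stricter threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold controlling family-wise error at alpha."""
    return alpha / n_tests


def implied_tests(alpha: float, threshold: float) -> float:
    """Number of independent tests that a given threshold corresponds to."""
    return alpha / threshold


def significant(pvalues: dict, threshold: float) -> list:
    """Loci whose P-value falls below the threshold, sorted by name."""
    return sorted(locus for locus, p in pvalues.items() if p < threshold)


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))   # conventional GWA threshold
    print(bonferroni_threshold(0.05, 2_000_000))   # stricter threshold used for 1000G
    # Hypothetical P-values, only to show how a locus can drop out.
    pvals = {"locus_a": 1e-9, "locus_b": 3e-8, "locus_c": 6e-8}
    print(significant(pvals, 5e-8), significant(pvals, 2.5e-8))
```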

  8. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study.

    PubMed

    de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I; Trompet, Stella; Ahluwalia, Tarunveer S; Teumer, Alexander; Kleber, Marcus E; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R; Marioni, Riccardo E; Steri, Maristella; Weng, Lu-Chen; Pool, Rene; Grossmann, Vera; Brody, Jennifer A; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Frånberg, Mattias; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J; Rumley, Ann; Mulas, Antonella; de Craen, Anton J M; Grotevendt, Anne; Taylor, Kent D; Delgado, Graciela E; Kifley, Annette; Lopez, Lorna M; Berentzen, Tina L; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P M; Draisma, Harmen H M; Lowe, Gordon D; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G; McEvoy, Mark A; Starr, John M; Hysi, Pirro G; Hernandez, Dena G; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L; Slagboom, P Eline; Zeller, Tanja; Psaty, Bruce M; Uitterlinden, André G; de Geus, Eco J C; Stott, David J; Binder, Harald; Hofman, Albert; Franco, Oscar H; Rotter, Jerome I; Ferrucci, Luigi; Spector, Tim D; Deary, Ian J; März, Winfried; Greinacher, Andreas; Wild, Philipp S; Cucca, Francesco; Boomsma, Dorret I; Watkins, Hugh; Tang, Weihong; Ridker, Paul M; Jukema, Jan W; Scott, Rodney J; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J; Smith, Nicholas L; Strachan, David P; Dehghan, Abbas

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10⁻⁸ is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10⁻⁸), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.

  9. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes

    PubMed Central

    Voss, Stephen R.; Kump, D. Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J.; Basa, Saritha; Walker, John A.; Smith, Jeramiah J.

    2011-01-01

    Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species that presents relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarking the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12–17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse amphibian genomes in studies of vertebrate genome evolution. PMID:21482624

  10. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes.

    PubMed

    Voss, Stephen R; Kump, D Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J; Basa, Saritha; Walker, John A; Smith, Jeramiah J

    2011-08-01

    Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species that presents relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarking the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12-17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse amphibian genomes in studies of vertebrate genome evolution.

  11. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset

    PubMed Central

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R. Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-01-01

    Human populations are often dichotomized into “isolated” and “open” categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra- and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features. PMID:28145502

  12. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset.

    PubMed

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-02-01

    Human populations are often dichotomized into "isolated" and "open" categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra- and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features.

  13. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    USDA-ARS's Scientific Manuscript database

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  14. Assessing the Accuracy of Ancestral Protein Reconstruction Methods

    PubMed Central

    Williams, Paul D; Pollock, David D; Blackburne, Benjamin P; Goldstein, Richard A

    2006-01-01

    The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated. PMID:16789817

  15. Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS.

    USDA-ARS's Scientific Manuscript database

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of the wheat D genome, Aegilops tauschii, is used as a resource for wheat...

  16. Physical mapping of a large plant genome using global high-information content fingerprinting: a distal region of wheat chromosome 3DS

    USDA-ARS's Scientific Manuscript database

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of wheat. We report the use of Ae. tauschii, the diploid ancestor of the wheat D genome, for the construction of t...

  17. The Genome Sequence of Avibacterium paragallinarum Strain CL Has a Large Repertoire of Insertion Sequence Elements.

    PubMed

    Horta-Valerdi, Guillermo; Sanchez-Alonso, Maria Patricia; Perez-Marquez, Victor M; Negrete-Abascal, Erasmo; Vaca-Pacheco, Sergio; Hernandez-Gonzalez, Ismael; Gomez-Lunar, Zulema; Olmedo-Álvarez, Gabriela; Vázquez-Cruz, Candelario

    2017-04-13

    The draft genome sequence of Avibacterium paragallinarum strain CL serovar C is reported here. The genome comprises 154 contigs corresponding to 2.4 Mb with 41% G+C content and many insertion sequence (IS) elements, a characteristic not previously reported in A. paragallinarum. Copyright © 2017 Horta-Valerdi et al.

  18. The Genome Sequence of Avibacterium paragallinarum Strain CL Has a Large Repertoire of Insertion Sequence Elements

    PubMed Central

    Horta-Valerdi, Guillermo; Sanchez-Alonso, Maria Patricia; Perez-Marquez, Victor M.; Negrete-Abascal, Erasmo; Vaca-Pacheco, Sergio; Hernandez-Gonzalez, Ismael; Gomez-Lunar, Zulema; Olmedo-Álvarez, Gabriela

    2017-01-01

    The draft genome sequence of Avibacterium paragallinarum strain CL serovar C is reported here. The genome comprises 154 contigs corresponding to 2.4 Mb with 41% G+C content and many insertion sequence (IS) elements, a characteristic not previously reported in A. paragallinarum. PMID:28408672

  19. Testing the large genome constraint hypothesis: plant traits, habitat and climate seasonality in Liliaceae.

    PubMed

    Carta, Angelino; Peruzzi, Lorenzo

    2016-04-01

    The factors driving genome size evolution in Liliaceae were examined. In particular, we investigated whether species with larger genomes are confined to less stressful environments with a longer vegetative season. We tested our hypotheses by correlating the genome size with other plant traits and environmental variables. To determine the adaptive nature of the genome size, we also compared the performances of Brownian motion (BM) processes with those inferred by Ornstein-Uhlenbeck (OU) models of trait evolution. A positive correlation of genome size with plant size, mean temperature and habitat moisture and a negative correlation with altitude and precipitation seasonality were found. Models of trait evolution revealed a deviation from a drift process or BM. Instead, changes in genome size were significantly associated with precipitation regimes according to an OU process. Specifically, the evolutionary optima towards which the genome size evolves were higher for humid climates and lower for drier ones. Taken together, our results indicate that the genome size increase in Liliaceae is constrained by climate seasonality.

  20. Geography disentangles introgression from ancestral polymorphism in Lake Malawi cichlids.

    PubMed

    Mims, Meryl C; Hulsey, C Darrin; Fitzpatrick, Benjamin M; Streelman, J Todd

    2010-03-01

    Phenotypically diverse Lake Malawi cichlids exhibit similar genomes. The extensive sharing of genetic polymorphism among forms has both intrigued and frustrated biologists trying to understand the nature of diversity in this and other rapidly evolving systems. Shared polymorphism might result from hybridization and/or the retention of ancestrally polymorphic alleles. To examine these alternatives, we used new genomic tools to characterize genetic differentiation in widespread, geographically structured populations of Labeotropheus fuelleborni and Metriaclima zebra. These phenotypically distinct species share mitochondrial DNA (mtDNA) haplotypes and show greater mtDNA differentiation among localities than between species. However, Bayesian analysis of nuclear single nucleotide polymorphism (SNP) data revealed two distinct genetic clusters corresponding perfectly to morphologically diagnosed L. fuelleborni and M. zebra. This result is a function of the resolving power of the multi-locus dataset, not a conflict between nuclear and mitochondrial partitions. Locus-by-locus analysis showed that mtDNA differentiation between species (F(CT)) was nearly identical to the median single-locus SNP F(CT). Finally, we asked whether there is evidence for gene flow at sites of co-occurrence. We used simulations to generate a null distribution for the level of differentiation between co-occurring populations of L. fuelleborni and M. zebra expected if there was no hybridization. The null hypothesis was rejected for the SNP data; populations that co-occur at rock reef sites were slightly more similar than expected by chance, suggesting recent gene flow. The coupling of numerous independent markers with extensive geographic sampling and simulations utilized here provides a framework for assessing the prevalence of gene flow in recently diverged species.

  1. Multiple occurrences of giant virus core genes acquired by eukaryotic genomes: the visible part of the iceberg?

    PubMed

    Filée, Jonathan

    2014-10-01

    Giant Viruses are a widespread group of viruses, characterized by huge genomes composed of a small subset of ancestral, vertically inherited core genes along with a large body of highly variable genes. In this study, I report the acquisition of 23 core ancestral Giant Virus genes by diverse eukaryotic species including various protists, a moss and a cnidarian. The viral genes are inserted in large scaffolds or chromosomes with intron-rich, eukaryotic-like genomic contexts, refuting the possibility of DNA contaminations. Some of these genes are expressed and in the cryptophyte alga Guillardia theta, a possible non-homologous displacement of the eukaryotic DNA primase by a viral D5 helicase/primase is documented. As core Giant Virus genes represent only a tiny fraction of the total genomic repertoire of these viruses, these results suggest that Giant Viruses represent an underestimated source of new genes and functions for their hosts. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines.

    PubMed

    Eppinger, Mark; Baar, Claudia; Linz, Bodo; Raddatz, Günter; Lanz, Christa; Keller, Heike; Morelli, Giovanna; Gressmann, Helga; Achtman, Mark; Schuster, Stephan C

    2006-07-01

    Helicobacter pylori infection of humans is so old that its population genetic structure reflects that of ancient human migrations. A closely related species, Helicobacter acinonychis, is specific for large felines, including cheetahs, lions, and tigers, whereas hosts more closely related to humans harbor more distantly related Helicobacter species. This observation suggests a jump between host species. But who ate whom and when did it happen? In order to resolve this question, we determined the genomic sequence of H. acinonychis strain Sheeba and compared it to genomes from H. pylori. The conserved core genes between the genomes are so similar that the host jump probably occurred within the last 200,000 (range 50,000-400,000) years. However, the Sheeba genome also possesses unique features that indicate the direction of the host jump, namely from early humans to cats. Sheeba possesses an unusually large number of highly fragmented genes, many encoding outer membrane proteins, which may have been destroyed in order to bypass deleterious responses from the feline host immune system. In addition, the few Sheeba-specific genes that were found include a cluster of genes encoding sialylation of the bacterial cell surface carbohydrates, which were imported by horizontal genetic exchange and might also help to evade host immune defenses. These results provide a genomic basis for elucidating molecular events that allow bacteria to adapt to novel animal hosts.

  3. Comparative genome analysis identifies two large deletions in the genome of highly-passaged attenuated Streptococcus agalactiae strain YM001 compared to the parental pathogenic strain HN016.

    PubMed

    Wang, Rui; Li, Liping; Huang, Yan; Luo, Fuguang; Liang, Wanwen; Gan, Xi; Huang, Ting; Lei, Aiying; Chen, Ming; Chen, Lianfu

    2015-11-04

    Streptococcus agalactiae (S. agalactiae), also known as group B Streptococcus (GBS), is an important pathogen causing neonatal pneumonia, meningitis, bovine mastitis, and fish meningoencephalitis. Global outbreaks of streptococcal disease in tilapia cause huge economic losses and also threaten human food safety. To investigate the mechanism of S. agalactiae pathogenesis in tilapia and develop an attenuated S. agalactiae vaccine, this study sequenced and comparatively analyzed the whole genomes of the virulent wild-type S. agalactiae strain HN016 and its highly passaged attenuated strain YM001, derived from tilapia. We performed Illumina sequencing of DNA prepared from strains HN016 and YM001. Sequenced reads were assembled, and nucleotide comparisons, single nucleotide polymorphisms (SNPs), and indels were analyzed between the draft genomes of HN016 and YM001. Clustered regularly interspaced short palindromic repeats (CRISPRs) and prophages were detected and analyzed in different S. agalactiae strains. The genome of S. agalactiae YM001 was 2,047,957 bp with a GC content of 35.61%; it contained 2044 genes and 88 RNAs. Meanwhile, the genome of S. agalactiae HN016 was 2,064,722 bp with a GC content of 35.66%; it had 2063 genes and 101 RNAs. Comparative genome analysis indicated that, compared with HN016, the YM001 genome had two significant large deletions, of 5832 and 11,116 bp, respectively, resulting in the deletion of three rRNA and ten tRNA genes, as well as the deletion and functional damage of ten genes related to metabolism, transport, growth, anti-stress, etc. Besides these two large deletions, ten other deletions and 28 single nucleotide variations (SNVs) were also identified, mainly affecting metabolism- and growth-related genes. The genome of attenuated S. agalactiae YM001 showed significant variations, resulting in the deletion of 10 functional genes, compared to the parental pathogenic strain HN016. The deleted and mutated functional genes all

  4. Evo-Devo: Variations on Ancestral Themes

    PubMed Central

    De Robertis, E.M.

    2008-01-01

    Most animals evolved from a common ancestor, Urbilateria, which already had in place the developmental genetic networks for shaping body plans. Comparative genomics has revealed rather unexpectedly that many of the genes present in bilaterian animal ancestors were lost by individual phyla during evolution. Reconstruction of the archetypal developmental genomic tool-kit present in Urbilateria will help to elucidate the contribution of gene loss and developmental constraints to the evolution of animal body plans. PMID:18243095

  5. Deductions about the Number, Organization, and Evolution of Genes in the Tomato Genome Based on Analysis of a Large Expressed Sequence Tag Collection and Selective Genomic Sequencing

    PubMed Central

    Van der Hoeven, Rutger; Ronning, Catherine; Giovannoni, James; Martin, Gregory; Tanksley, Steven

    2002-01-01

    Analysis of a collection of 120,892 single-pass ESTs, derived from 26 different tomato cDNA libraries and reduced to a set of 27,274 unique consensus sequences (unigenes), revealed that 70% of the unigenes have identifiable homologs in the Arabidopsis genome. Genes corresponding to metabolism have remained most conserved between these two genomes, whereas genes encoding transcription factors are among the fastest evolving. The majority of the 10 largest conserved multigene families share similar copy numbers in tomato and Arabidopsis, suggesting that the multiplicity of these families may have occurred before the divergence of these two species. An exception to this multigene conservation was observed for the E8-like protein family, which is associated with fruit ripening and has higher copy number in tomato than in Arabidopsis. Finally, six BAC clones from different parts of the tomato genome were isolated, genetically mapped, sequenced, and annotated. The combined analysis of the EST database and these six sequenced BACs leads to the prediction that the tomato genome encodes ∼35,000 genes, which are sequestered largely in euchromatic regions corresponding to less than one-quarter of the total DNA in the tomato nucleus. PMID:12119366

  6. Centromere Destiny in Dicentric Chromosomes: New Insights from the Evolution of Human Chromosome 2 Ancestral Centromeric Region.

    PubMed

    Chiatante, Giorgia; Giannuzzi, Giuliana; Calabrese, Francesco Maria; Eichler, Evan E; Ventura, Mario

    2017-07-01

    Dicentric chromosomes are products of genomic rearrangements that place two centromeres on the same chromosome. Due to the presence of two primary constrictions, they are inherently unstable and overcome their instability by epigenetically inactivating and/or deleting one of the two centromeres, thus resulting in functionally monocentric chromosomes that segregate normally during cell division. Our understanding to date of dicentric chromosome formation, behavior and fate has been largely inferred from observational studies in plants and humans as well as artificially produced de novo dicentrics in yeast and in human cells. We investigate the most recent product of a chromosome fusion event fixed in the human lineage, human chromosome 2, whose stability was acquired by the suppression of one centromere, resulting in a unique difference in chromosome number between humans (46 chromosomes) and our most closely related ape relatives (48 chromosomes). Using molecular cytogenetics, sequencing, and comparative sequence data, we characterize in depth the relicts of the chromosome 2q ancestral centromere and its flanking regions, gaining insight into the ancestral organization that can readily be extended to all acrocentric chromosome centromeres. Moreover, our analyses offered the opportunity to trace the evolutionary history of rDNA and satellite III sequences among great apes, thus suggesting a new hypothesis for the preferential inactivation of some human centromeres, including IIq. Our results suggest two possible centromere inactivation models to explain the evolutionary stabilization of human chromosome 2 over the last 5-6 million years. Our results strongly favor centromere excision through a one-step process. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

    PubMed

    Matsumoto, Tomotaka; Akashi, Hiroshi; Yang, Ziheng

    2015-07-01

    Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.

  8. Matrilocal residence is ancestral in Austronesian societies

    PubMed Central

    Jordan, Fiona M.; Gray, Russell D.; Greenhill, Simon J.; Mace, Ruth

    2009-01-01

    The nature of social life in human prehistory is elusive, yet knowing how kinship systems evolve is critical for understanding population history and cultural diversity. Post-marital residence rules specify sex-specific dispersal and kin association, influencing the pattern of genetic markers across populations. Cultural phylogenetics allows us to practise ‘virtual archaeology’ on these aspects of social life that leave no trace in the archaeological record. Here we show that early Austronesian societies practised matrilocal post-marital residence. Using a Markov-chain Monte Carlo comparative method implemented in a Bayesian phylogenetic framework, we estimated the type of residence at each ancestral node in a sample of Austronesian language trees spanning 135 Pacific societies. Matrilocal residence has been hypothesized for proto-Oceanic society (ca 3500 BP), but we find strong evidence that matrilocality was predominant in earlier Austronesian societies ca 5000–4500 BP, at the root of the language family and its early branches. Our results illuminate the divergent patterns of mtDNA and Y-chromosome markers seen in the Pacific. The analysis of present-day cross-cultural data in this way allows us to directly address cultural evolutionary and life-history processes in prehistory. PMID:19324748

  9. Large-scale computational and statistical analyses of high transcription potentialities in 32 prokaryotic genomes

    PubMed Central

    Sinoquet, Christine; Demey, Sylvain; Braun, Frédérique

    2008-01-01

    This article compares 32 bacterial genomes with respect to their high transcription potentialities. The σ70 promoter has been widely studied for Escherichia coli model and a consensus is known. Since transcriptional regulations are known to compensate for promoter weakness (i.e. when the promoter similarity with regard to the consensus is rather low), predicting functional promoters is a hard task. Instead, the research work presented here comes within the scope of investigating potentially high ORF expression, in relation with three criteria: (i) high similarity to the σ70 consensus (namely, the consensus variant appropriate for each genome), (ii) transcription strength reinforcement through a supplementary binding site—the upstream promoter (UP) element—and (iii) enhancement through an optimal Shine-Dalgarno (SD) sequence. We show that in the AT-rich Firmicutes’ genomes, frequencies of potentially strong σ70-like promoters are exceptionally high. Besides, though they contain a low number of strong promoters (SPs), some genomes may show a high proportion of promoters harbouring an UP element. Putative SPs of lesser quality are more frequently associated with an UP element than putative strong promoters of better quality. A meaningful difference is statistically ascertained when comparing bacterial genomes with similarly AT-rich genomes generated at random; the difference is the highest for Firmicutes. Comparing some Firmicutes genomes with similarly AT-rich Proteobacteria genomes, we confirm the Firmicutes specificity. We show that this specificity is neither explained by AT-bias nor genome size bias; neither does it originate in the abundance of optimal SD sequences, a typical and significant feature of Firmicutes more thoroughly analysed in our study. PMID:18440978

  10. Accelerating matchmaking of novel dysmorphology syndromes through clinical and genomic characterization of a large cohort.

    PubMed

    Shaheen, Ranad; Patel, Nisha; Shamseldin, Hanan; Alzahrani, Fatema; Al-Yamany, Ruah; ALMoisheer, Agaadir; Ewida, Nour; Anazi, Shamsa; Alnemer, Maha; Elsheikh, Mohamed; Alfaleh, Khaled; Alshammari, Muneera; Alhashem, Amal; Alangari, Abdullah A; Salih, Mustafa A; Kircher, Martin; Daza, Riza M; Ibrahim, Niema; Wakil, Salma M; Alaqeel, Ahmed; Altowaijri, Ikhlas; Shendure, Jay; Al-Habib, Amro; Faqieh, Eissa; Alkuraya, Fowzan S

    2016-07-01

    Dysmorphology syndromes are among the most common referrals to clinical genetics specialists. Inability to match the dysmorphology pattern to a known syndrome can pose a major diagnostic challenge. With the aim of accelerating the establishment of new syndromes and their genetic etiology, we describe our experience with multiplex consanguineous families that appeared to represent novel autosomal recessive dysmorphology syndromes at the time of evaluation. We performed combined autozygome/exome analysis of multiplex consanguineous families with apparently novel dysmorphology syndromes. Consistent with the apparent novelty of the phenotypes, our analysis revealed a strong candidate variant in genes that were novel at the time of the analysis in the majority of cases, and 10 of these genes are published here for the first time as novel candidates (CDK9, NEK9, ZNF668, TTC28, MBL2, CADPS, CACNA1H, HYAL2, CTU2, and C3ORF17). A significant minority of the phenotypes (6/31, 19%), however, were caused by genes known to cause Mendelian phenotypes, thus expanding the phenotypic spectrum of the diseases linked to these genes. The conspicuous inheritance pattern and the highly specific phenotypes appear to have contributed to the high yield (90%) of plausible molecular diagnoses in our study cohort. Reporting detailed clinical and genomic analysis of a large series of apparently novel dysmorphology syndromes will likely help accelerate the establishment of novel syndromes and their underlying genes through open exchange of data for the benefit of patients, their families, health-care providers, and the research community. Genet Med 18(7):686-695.

  11. Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce

    PubMed Central

    Pavy, Nathalie; Pelgas, Betty; Beauseigle, Stéphanie; Blais, Sylvie; Gagnon, France; Gosselin, Isabelle; Lamothe, Manuel; Isabel, Nathalie; Bousquet, Jean

    2008-01-01

    Background: To explore the potential value of high-throughput genotyping assays in the analysis of large and complex genomes, we designed two highly multiplexed Illumina bead arrays using the GoldenGate SNP assay for gene mapping in white spruce (Picea glauca [Moench] Voss) and black spruce (Picea mariana [Mill.] B.S.P.). Results: Each array included 768 SNPs, identified by resequencing genomic DNA from parents of each mapping population. For white spruce and black spruce, respectively, 69.2% and 77.1% of genotyped SNPs had valid GoldenGate assay scores and segregated in the mapping populations. For each of these successful SNPs, on average, valid genotyping scores were obtained for over 99% of progeny. SNP data were integrated with pre-existing AFLP, ESTP, and SSR markers to construct two individual linkage maps and a composite map for white spruce and black spruce genomes. The white spruce composite map contained 821 markers including 348 gene loci. Also, 835 markers including 328 gene loci were positioned on the black spruce composite map. In total, 215 anchor markers (mostly gene markers) were shared between the two species. Considering lineage divergence at least 10 Myr ago between the two spruces, interspecific comparison of homoeologous linkage groups revealed remarkable synteny and marker colinearity. Conclusion: The design of customized highly multiplexed Illumina SNP arrays appears to be an efficient procedure to enhance the mapping of expressed genes and make linkage maps more informative and powerful in such species with poorly known genomes. This genotyping approach will open new avenues for co-localizing candidate genes and QTLs, partial genome sequencing, and comparative mapping across conifers. PMID:18205909

  12. Reticulate Evolution of the Rye Genome

    PubMed Central

    Martis, Mihaela M.; Zhou, Ruonan; Haseneyer, Grit; Schmutzer, Thomas; Vrána, Jan; Kubaláková, Marie; König, Susanne; Kugler, Karl G.; Scholz, Uwe; Hackauf, Bernd; Korzun, Viktor; Schön, Chris-Carolin; Doležel, Jaroslav; Bauer, Eva; Mayer, Klaus F.X.; Stein, Nils

    2013-01-01

    Rye (Secale cereale) is closely related to wheat (Triticum aestivum) and barley (Hordeum vulgare). Due to its large genome (∼8 Gb) and its regional importance, genome analysis of rye has lagged behind other cereals. Here, we established a virtual linear gene order model (genome zipper) comprising 22,426 or 72% of the detected set of 31,008 rye genes. This was achieved by high-throughput transcript mapping, chromosome survey sequencing, and integration of conserved synteny information of three sequenced model grass genomes (Brachypodium distachyon, rice [Oryza sativa], and sorghum [Sorghum bicolor]). This enabled a genome-wide high-density comparative analysis of rye/barley/model grass genome synteny. Seventeen conserved syntenic linkage blocks making up the rye and barley genomes were defined in comparison to model grass genomes. Six major translocations shaped the modern rye genome in comparison to a putative Triticeae ancestral genome. Strikingly dissimilar conserved syntenic gene content, gene sequence diversity signatures, and phylogenetic networks were found for individual rye syntenic blocks. This indicates that introgressive hybridizations (diploid or polyploid hybrid speciation) and/or a series of whole-genome or chromosome duplications played a role in rye speciation and genome evolution. PMID:24104565

  13. Comparative genomics of protoploid Saccharomycetaceae

    PubMed Central

    Souciet, Jean-Luc; Dujon, Bernard; Gaillardin, Claude; Johnston, Mark; Baret, Philippe V.; Cliften, Paul; Sherman, David J.; Weissenbach, Jean; Westhof, Eric; Wincker, Patrick; Jubin, Claire; Poulain, Julie; Barbe, Valérie; Ségurens, Béatrice; Artiguenave, François; Anthouard, Véronique; Vacherie, Benoit; Val, Marie-Eve; Fulton, Robert S.; Minx, Patrick; Wilson, Richard; Durrens, Pascal; Jean, Géraldine; Marck, Christian; Martin, Tiphaine; Nikolski, Macha; Rolland, Thomas; Seret, Marie-Line; Casarégola, Serge; Despons, Laurence; Fairhead, Cécile; Fischer, Gilles; Lafontaine, Ingrid; Leh, Véronique; Lemaire, Marc; de Montigny, Jacky; Neuvéglise, Cécile; Thierry, Agnès; Blanc-Lenfle, Isabelle; Bleykasten, Claudine; Diffels, Julie; Fritsch, Emilie; Frangeul, Lionel; Goëffon, Adrien; Jauniaux, Nicolas; Kachouri-Lafond, Rym; Payen, Célia; Potier, Serge; Pribylova, Lenka; Ozanne, Christophe; Richard, Guy-Franck; Sacerdot, Christine; Straub, Marie-Laure; Talla, Emmanuel

    2009-01-01

    Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call “protoploid” because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified. PMID:19525356

  14. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

    PubMed

    Berlin, Konstantin; Koren, Sergey; Chin, Chen-Shan; Drake, James P; Landolin, Jane M; Phillippy, Adam M

    2015-06-01

    Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
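MHAP builds on MinHash: each read is reduced to a small sketch of minimum hash values over its k-mers, and the fraction of matching minima between two sketches estimates the Jaccard similarity of their k-mer sets, which flags likely overlaps without full alignment. The sketch below illustrates only that core estimator under stated assumptions: the k-mer size (4), the number of hash functions (128), and the keyed-BLAKE2b hash family are illustrative choices, not MHAP's actual parameters or implementation, and all function names are hypothetical.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    # Assumption: a keyed BLAKE2b digest stands in for one member of a hash
    # family; each seed gives a different, approximately independent function.
    key = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq: str, k: int = 4, num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over all k-mers."""
    grams = kmers(seq, k)
    return [min(_hash(g, s) for g in grams) for s in range(num_hashes)]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """The fraction of agreeing minima estimates |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(a: str, b: str, k: int = 4) -> float:
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    read_a = "ACGTTGCAAGGCTTAC"
    read_b = "TTGCAAGGCTTACGGA"  # its first 13 bases equal the last 13 of read_a
    print(exact_jaccard(read_a, read_b))  # 0.625: 10 shared of 16 distinct 4-mers
    est = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
    print(est)  # an approximation of 0.625; the exact value depends on the hashes
```

Comparing fixed-size sketches costs the same regardless of read length, which is why this style of estimate scales to many noisy long reads; real overlappers add further filtering before accepting an overlap.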

  15. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  16. Exceptionally large mitochondrial fragments to the nucleus in sequenced mollusk genomes.

    PubMed

    Sun, Xiujun; Yang, Aiguo

    2016-01-01

    The available genome sequences of three mollusks (Biomphalaria glabrata, Aplysia californica and Crassostrea gigas) were used for the first time to investigate nuclear mitochondrial DNAs (NUMTs) in mollusks. The analysis showed that NUMT content was high in B. glabrata (17.738 kb) and C. gigas (17.192 kb), in which all or almost all mtDNA sequences had been transferred to the nucleus, whereas NUMTs were rare (584 bp) in A. californica. NUMT lengths ranged from 61 to 5,492 bp in B. glabrata, 1,711 to 15,481 bp in C. gigas, and 124 to 460 bp in A. californica. The largest C. gigas NUMT covered 84.9% (15,481 bp) of its mitochondrial genome, a size rarely found in invertebrates so far. No correlation was found between NUMT content and genome size in the three sequenced mollusk genomes.

  17. Continuing Evolution of Burkholderia mallei Through Genome Reduction and Large-Scale Rearrangements

    DTIC Science & Technology

    2010-01-22

    in Materials and Methods. b NRPS, nonribosomal peptide synthase; PKS, polyketide synthase; RND, resistance-nodulation-division-like pump. Losada et al. ... genomics, genome erosion, bacterial virulence. © The Author(s) 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology ... creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original

  18. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    SciTech Connect

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  19. Large-scale evaluation of experimentally determined DNA G+C contents with whole genome sequences of prokaryotes.

    PubMed

    Kim, Mincheol; Park, Sang-Cheol; Baek, Inwoo; Chun, Jongsik

    2015-03-01

    Historically, DNA G+C content has played a critical role in the description of bacterial and archaeal species. Despite its importance in prokaryote taxonomy, its accuracy has been questioned due to methodological heterogeneity and measurement errors of conventional methods. Here we investigated the accuracy of experimentally determined DNA G+C contents by comparing them with reference values calculated from whole genome sequences. The large-scale comparison revealed that G+C contents determined by high-performance liquid chromatography and buoyant density centrifugation methods were closer to the genome-derived reference values than those generated by the thermal denaturation method. However, there was a substantial discrepancy in DNA G+C contents between values obtained by conventional methods and genome-derived reference values. The majority of the differences fell outside the acceptable range (i.e. a 1 mol% G+C content difference) for species delimitation of prokaryotes. In contrast, when average nucleotide identity (ANI) was correlated with G+C difference among genomes, most G+C differences within species were confined to less than 1%. Therefore, G+C values obtained by error-prone conventional methods are not meaningful in the description of bacterial and archaeal species. For taxonomic purposes, DNA G+C content should be calculated directly from high-quality genome sequences with at least 16× sequencing depth of coverage.
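Calculating G+C content directly from a genome sequence, as the abstract recommends, is a simple count. The sketch below is a minimal illustration under assumptions: ambiguous bases such as N are excluded from both numerator and denominator, a multi-contig assembly is treated as the concatenation of its contigs, and the 1 mol% comparison mirrors the species-delimitation range quoted above. The function names are hypothetical, and the sketch does not check the 16× coverage requirement, which depends on the assembly rather than the sequence string.

```python
def gc_content(seq: str) -> float:
    """Return G+C content in mol%, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def genome_gc_content(contigs: list[str]) -> float:
    """G+C content of a whole assembly, pooling base counts across contigs."""
    return gc_content("".join(contigs))


def within_gc_range(gc_a: float, gc_b: float, max_diff: float = 1.0) -> bool:
    """True if two G+C values (mol%) differ by at most max_diff."""
    return abs(gc_a - gc_b) <= max_diff


if __name__ == "__main__":
    print(gc_content("GGCCAATTNN"))  # 50.0 (the two Ns are ignored)
    print(within_gc_range(50.0, 50.8))  # True
    print(within_gc_range(50.0, 51.5))  # False
```

Pooling counts across contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted.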

  20. A large-scale, gene-driven mutagenesis approach for the functional analysis of the mouse genome

    PubMed Central

    Hansen, Jens; Floss, Thomas; Van Sloun, Petra; Füchtbauer, Ernst-Martin; Vauti, Franz; Arnold, Hans-Hennig; Schnütgen, Frank; Wurst, Wolfgang; von Melchner, Harald; Ruiz, Patricia

    2003-01-01

    A major challenge of the postgenomic era is the functional characterization of every single gene within the mammalian genome. In an effort to address this challenge, we assembled a collection of mutations in mouse embryonic stem (ES) cells, which is the largest publicly accessible collection of such mutations to date. Using four different gene-trap vectors, we generated 5,142 sequences adjacent to the gene-trap integration sites (gene-trap sequence tags; http://genetrap.de) from >11,000 ES cell clones. Although most of the gene-trap vector insertions occurred randomly throughout the genome, we found both vector-independent and vector-specific integration “hot spots.” Because >50% of the hot spots were vector-specific, we conclude that the most effective way to saturate the mouse genome with gene-trap insertions is by using a combination of gene-trap vectors. When a random sample of gene-trap integrations was passaged to the germ line, 59% (17 of 29) produced an observable phenotype in transgenic mice, a frequency similar to that achieved by conventional gene targeting. Thus, gene trapping allows a large-scale and cost-effective production of ES cell clones with mutations distributed throughout the genome, a resource likely to accelerate genome annotation and the in vivo modeling of human disease. PMID:12904583

  1. Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies.

    PubMed

    Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J

    2015-09-22

    Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.

  2. A large maize (Zea Mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome

    USDA-ARS's Scientific Manuscript database

    SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection for accelerated breeding. We report the establishment of a large SNP array for maize and i...

  3. Reconstructed ancestral enzymes suggest long-term cooling of Earth's photic zone since the Archean

    NASA Astrophysics Data System (ADS)

    Garcia, Amanda K.; Schopf, J. William; Yokobori, Shin-ichi; Akanuma, Satoshi; Yamagishi, Akihiko

    2017-05-01

    Paleotemperatures inferred from the isotopic compositions (δ¹⁸O and δ³⁰Si) of marine cherts suggest that Earth’s oceans cooled from 70 ± 15 °C in the Archean to the present ∼15 °C. This interpretation, however, has been subject to question due to uncertainties regarding oceanic isotopic compositions, diagenetic or metamorphic resetting of the isotopic record, and depositional environments. Analyses of the thermostability of reconstructed ancestral enzymes provide an independent method by which to assess the temperature history inferred from the isotopic evidence. Although previous studies have demonstrated extreme thermostability in reconstructed archaeal and bacterial proteins compatible with a hot early Earth, taxa investigated may have inhabited local thermal environments that differed significantly from average surface conditions. We here present thermostability measurements of reconstructed ancestral enzymatically active nucleoside diphosphate kinases (NDKs) derived from light-requiring prokaryotic and eukaryotic phototrophs having widely separated fossil-based divergence ages. The ancestral environmental temperatures thereby determined for these photic-zone organisms--shown in modern taxa to correlate strongly with NDK thermostability--are inferred to reflect ancient surface-environment paleotemperatures. Our results suggest that Earth's surface temperature decreased over geological time from ∼65-80 °C in the Archean, a finding consistent with both previous isotope-based and protein reconstruction-based interpretations. Interdisciplinary studies such as those reported here integrating genomic, geologic, and paleontologic data hold promise for providing new insight into the coevolution of life and environment over Earth history.

  4. Reconstructed ancestral enzymes suggest long-term cooling of Earth's photic zone since the Archean.

    PubMed

    Garcia, Amanda K; Schopf, J William; Yokobori, Shin-Ichi; Akanuma, Satoshi; Yamagishi, Akihiko

    2017-05-02

    Paleotemperatures inferred from the isotopic compositions (δ¹⁸O and δ³⁰Si) of marine cherts suggest that Earth's oceans cooled from 70 ± 15 °C in the Archean to the present ∼15 °C. This interpretation, however, has been subject to question due to uncertainties regarding oceanic isotopic compositions, diagenetic or metamorphic resetting of the isotopic record, and depositional environments. Analyses of the thermostability of reconstructed ancestral enzymes provide an independent method by which to assess the temperature history inferred from the isotopic evidence. Although previous studies have demonstrated extreme thermostability in reconstructed archaeal and bacterial proteins compatible with a hot early Earth, taxa investigated may have inhabited local thermal environments that differed significantly from average surface conditions. We here present thermostability measurements of reconstructed ancestral enzymatically active nucleoside diphosphate kinases (NDKs) derived from light-requiring prokaryotic and eukaryotic phototrophs having widely separated fossil-based divergence ages. The ancestral environmental temperatures thereby determined for these photic-zone organisms--shown in modern taxa to correlate strongly with NDK thermostability--are inferred to reflect ancient surface-environment paleotemperatures. Our results suggest that Earth's surface temperature decreased over geological time from ∼65-80 °C in the Archean, a finding consistent with both previous isotope-based and protein reconstruction-based interpretations. Interdisciplinary studies such as those reported here integrating genomic, geologic, and paleontologic data hold promise for providing new insight into the coevolution of life and environment over Earth history.

  5. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus

    PubMed Central

    Horvath, Julie E.; Sheedy, Christina B.; Merrett, Stephanie L.; Diallo, Abdoulaye Banire; Swofford, David L.; NISC Comparative Sequencing Program; Green, Eric D.; Willard, Huntington F.

    2011-01-01

    Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years. PMID:21518738

  6. Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach.

    PubMed

    Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Clayton, Ellen Wright; Kantarcioglu, Murat; Malin, Bradley

    2017-02-02

    Emerging scientific endeavors are creating big data repositories from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals, the Sequence and Phenotype Integration Exchange (SPHINX), and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.

  7. Genomic shotgun array: a procedure linking large-scale DNA sequencing with regional transcript mapping.

    PubMed

    Li, Ling-Hui; Li, Jian-Chiuan; Lin, Yung-Feng; Lin, Chung-Yen; Chen, Chung-Yung; Tsai, Shih-Feng

    2004-02-11

    To facilitate transcript mapping and to investigate alterations in genomic structure and gene expression in a defined genomic target, we developed a novel microarray-based method to detect transcriptional activity of the human chromosome 4q22-24 region. Loss of heterozygosity of human 4q22-24 is frequently observed in hepatocellular carcinoma (HCC). One hundred and eighteen well-characterized genes have been identified from this region. We took previously sequenced shotgun subclones as templates to amplify overlapping sequences for the genomic segment and constructed a chromosome-region-specific microarray. Using genomic DNA fragments as probes, we detected transcriptional activity from within this region among five different tissues. The hybridization results indicate that there are new transcripts that have not yet been identified by other methods. The existence of new transcripts encoded by genes in this region was confirmed by PCR cloning or cDNA library screening. The procedure reported here allows coupling of shotgun sequencing with transcript mapping and, potentially, detailed analysis of gene expression and chromosomal copy of the genomic sequence for the putative HCC tumor suppressor gene(s) in the 4q candidate region.

  8. Enhancing the pharmaceutical properties of protein drugs by ancestral sequence reconstruction

    PubMed Central

    Zakas, Philip M.; Brown, Harrison C.; Knight, Kristopher; Meeks, Shannon L.; Spencer, H. Trent; Gaucher, Eric A.; Doering, Christopher B.

    2016-01-01

    Optimization of a protein’s pharmaceutical properties is usually carried out by rational design and/or directed evolution. Here we test an alternative approach based on ancestral sequence reconstruction. Using available genomic sequence data on coagulation factor VIII and predictive models of molecular evolution, we engineer protein variants with improved activity, stability, and biosynthesis potential, and with reduced inhibition by clinical anti-drug antibodies. In principle, this approach can be applied to any protein drug based on a conserved gene sequence. PMID:27669166

  9. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

    PubMed Central

    Kurtz, Stefan; Narechania, Apurva; Stein, Joshua C; Ware, Doreen

    2008-01-01

    Background The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks. Results Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives much greater flexibility in the choice of the k-mer size. Tallymer can process data sets of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10⁹ bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across
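Counting k-mers with a suffix array works because sorting all suffixes places every suffix that begins with the same k-mer in one contiguous run, so each k-mer's count is just the length of its run and any k can be answered from the same index. The sketch below illustrates that idea only, under stated assumptions: it builds a plain suffix array naively (quadratic-time, suitable for toy inputs) and compares prefixes directly, whereas Tallymer uses enhanced suffix arrays with additional tables and efficient construction. All names are hypothetical, and a Counter-based count is included purely as a cross-check.

```python
from collections import Counter


def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffix strings; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def kmer_counts(text: str, k: int) -> dict[str, int]:
    """Count k-mers by scanning the suffix array once.

    Suffixes sharing the same first k characters are adjacent in sorted
    order, so each k-mer's count is the length of one contiguous run.
    """
    counts: dict[str, int] = {}
    prev = None
    for i in suffix_array(text):
        if len(text) - i < k:  # suffix too short to contain a full k-mer
            continue
        kmer = text[i:i + k]
        if kmer == prev:
            counts[kmer] += 1
        else:
            counts[kmer] = 1
            prev = kmer
    return counts


def naive_kmer_counts(text: str, k: int) -> dict[str, int]:
    """Hash-based count of every length-k window, used as a cross-check."""
    return dict(Counter(text[i:i + k] for i in range(len(text) - k + 1)))


if __name__ == "__main__":
    print(kmer_counts("ACGTACGTAC", 3))
    # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Once the counts exist, a repeat annotation in the spirit of the abstract would flag positions whose k-mers exceed some frequency threshold; the threshold itself would be an analysis choice not specified here.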

  10. Large-scale genomics unveil polygenic architecture of human cortical surface area

    PubMed Central

    Chen, Chi-Hua; Peng, Qian; Schork, Andrew J.; Lo, Min-Tzu; Fan, Chun-Chieh; Wang, Yunpeng; Desikan, Rahul S.; Bettella, Francesco; Hagler, Donald J.; McCabe, Connor; Chang, Linda; Akshoomoff, Natacha; Newman, Erik; Ernst, Thomas; Van Zijl, Peter; Kuperman, Joshua; Murray, Sarah; Bloss, Cinnamon; Appelbaum, Mark; Gamst, Anthony; Thompson, Wesley; Bartsch, Hauke; Weiner, Michael; Aisen, Paul; Petersen, Ronald; Jack Jr, Clifford R.; Jagust, William; Trojanowki, John Q.; Toga, Arthur W.; Beckett, Laurel; Green, Robert C.; Saykin, Andrew J.; Morris, John; Shaw, Leslie M.; Khachaturian, Zaven; Sorensen, Greg; Carrillo, Maria; Kuller, Lew; Raichle, Marc; Paul, Steven; Davies, Peter; Fillit, Howard; Hefti, Franz; Holtzman, Davie; Mesulman, M. Marcel; Potter, William; Snyder, Peter J.; Schwartz, Adam; Montine, Tom; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Gessert, Devon; Sather, Tamie; Jiminez, Gus; Harvey, Danielle; Bernstein, Matthew; Fox, Nick; Thompson, Paul; Schuff, Norbert; DeCarli, Charles; Borowski, Bret; Gunter, Jeff; Senjem, Matt; Vemuri, Prashanthi; Jones, David; Kantarci, Kejal; Ward, Chad; Koeppe, Robert A.; Foster, Norm; Reiman, Eric M.; Chen, Kewei; Mathis, Chet; Landau, Susan; Cairns, Nigel J.; Householder, Erin; Taylor-Reinwald, Lisa; Lee, Virginia M.Y.; Korecka, Magdalena; Figurski, Michal; Crawford, Karen; Neu, Scott; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Faber, Kelley; Kim, Sungeun; Nho, Kwangsik; Thal, Leon; Frank, Richard; Buckholtz, Neil; Albert, Marilyn; Hsiao, John; Westlye, Lars T.; Kremen, William S.; Jernigan, Terry L.; Hellard, Stephanie Le; Steen, Vidar M.; Espeseth, Thomas; Huentelman, Matt; Håberg, Asta K.; Agartz, Ingrid; Djurovic, Srdjan; Andreassen, Ole A.; Schork, Nicholas; Dale, Anders M.

    2015-01-01

    Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphism (SNP) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h2∼0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N=466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices. PMID:26189703

  11. Large-scale genomics unveil polygenic architecture of human cortical surface area.

    PubMed

    Chen, Chi-Hua; Peng, Qian; Schork, Andrew J; Lo, Min-Tzu; Fan, Chun-Chieh; Wang, Yunpeng; Desikan, Rahul S; Bettella, Francesco; Hagler, Donald J; Westlye, Lars T; Kremen, William S; Jernigan, Terry L; Le Hellard, Stephanie; Steen, Vidar M; Espeseth, Thomas; Huentelman, Matt; Håberg, Asta K; Agartz, Ingrid; Djurovic, Srdjan; Andreassen, Ole A; Schork, Nicholas; Dale, Anders M

    2015-07-20

    Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphism (SNP) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h²∼0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N=466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices.

  12. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    SciTech Connect

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi, with 23 orders, 110 families, 1,300 genera and over 19,000 known species. We present a comparative analysis of 70 Dothideomycete genomes, including over 50 that we sequenced and that are as yet unpublished. This extensive sampling nearly quadruples that of the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families, including proteases, transporters and small secreted proteins, and show that major differences in gene content are influenced by speciation.

  13. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  14. Genome-wide nucleotide-level mammalian ancestor reconstruction.

    PubMed

    Paten, Benedict; Herrero, Javier; Fitzgerald, Stephen; Beal, Kathryn; Flicek, Paul; Holmes, Ian; Birney, Ewan

    2008-11-01

    Recently, attention has been turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called "Ortheus," for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient stochastic graph-based dynamic programming methods. Unlike other methods, Ortheus does not rely on a single fixed alignment from which to work. Ortheus is also more scalable than previous methods while being fast, stable, and open source. Large-scale simulations show that Ortheus performs close to optimally on a deep mammalian phylogeny. Simulations also indicate that significant proportions of errors due to insertions and deletions can be avoided by not assuming a fixed alignment. We additionally use a challenging hold-out cross-validation procedure to test the method; using the reconstructions to predict extant sequence bases, we demonstrate significant improvements over using closest extant neighbor sequences. Accompanying this paper, a new, public, and genome-wide set of Ortheus ancestor alignments provides an intriguing new resource for evolutionary studies in mammals. As a first piece of analysis, we attempt to recover "fossilized" ancestral pseudogenes. We confidently find 31 cases in which the ancestral sequence had a more complete sequence than any of the extant sequences.
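The hold-out test compares two predictors of the bases of a withheld extant sequence: the reconstructed ancestor and the closest extant neighbor. Below is a minimal sketch of the scoring step only. It assumes the sequences are already aligned and skips gap columns. The sequences are hypothetical, and this is not Ortheus code.

```python
def prediction_accuracy(predicted: str, held_out: str, gap: str = "-") -> float:
    """Fraction of aligned columns in which `predicted` has the same base as the
    held-out extant sequence; columns with a gap in either sequence are skipped."""
    if len(predicted) != len(held_out):
        raise ValueError("sequences must be aligned to the same length")
    scored = [(p, h) for p, h in zip(predicted, held_out) if p != gap and h != gap]
    if not scored:
        raise ValueError("no gap-free columns to score")
    return sum(p == h for p, h in scored) / len(scored)


# Hypothetical aligned toy sequences.
held_out = "ACGTACGTAC"   # extant sequence withheld from reconstruction
ancestor = "ACGTACGTAA"   # reconstructed ancestral sequence
neighbor = "ACGAACGTTA"   # closest remaining extant sequence
print(prediction_accuracy(ancestor, held_out))  # 0.9
print(prediction_accuracy(neighbor, held_out))  # 0.7
```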

  15. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background: Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results: To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis with the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions: We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  16. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis with the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  17. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life

    SciTech Connect

    Elkins, James G.; Kunin, Victor; Anderson, Iain; Barry, Kerrie; Goltsman, Eugene; Lapidus, Alla; Hedlund, Brian; Hugenholtz, Phil; Kyrpides, Nikos; Graham, David; Keller, Martin; Wanner, Gerhard; Richardson, Paul; Stetter, Karl O.

    2007-05-01

    Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal RNA (SSU rRNA) sequences, the Korarchaeota had originally been suggested to have an ancestral relationship to all known Archaea, although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota: highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through an FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light on the evolution of extant life.

  18. Twenty years of artificial directional selection have shaped the genome of the Italian Large White pig breed.

    PubMed

    Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L

    2016-04-01

    In this study, we investigated at the genome-wide level whether 20 years of artificial directional selection, based on boar genetic evaluation obtained with a classical BLUP animal model, shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a Bonferroni-corrected P-value < 0.10. However, there was an increasing number of SNPs with a decreasing level of allele frequency changes over time, representing a continuous profile across the genome. The largest proportion of the 493 SNPs was on porcine chromosomes (SSC) 7, 2, 8 and 18, for a total of 204 haploblocks. Functional annotation of the genomic regions including the 493 shifted SNPs reported a few Gene Ontology terms that might underlie the biological processes that contributed to the increased performance of the pigs over the 20 years of the selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program, derived from the application of methodologies assuming the infinitesimal model, that captured a continuous trend of allele frequency changes in the boar population.
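The Bonferroni correction mentioned above works in two steps. It multiplies each raw P-value by the number of tests m, capping the result at 1. It then compares the adjusted value with the threshold, here 0.10. Below is a minimal sketch. The raw P-values are hypothetical; in the study, m would be the number of filtered SNPs tested.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each raw P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values: List[float], alpha: float = 0.10) -> List[int]:
    """Indices of tests whose Bonferroni-adjusted P-value is below alpha."""
    return [i for i, p in enumerate(bonferroni(p_values)) if p < alpha]


# Hypothetical raw P-values for four SNPs (a real scan would test thousands).
raw = [1e-5, 0.002, 0.03, 0.5]
print(significant(raw))  # [0, 1]
```

The correction is conservative: with tens of thousands of SNPs, only markers with very small raw P-values pass.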

  19. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats

    PubMed Central

    Corman, Victor Max; Baldwin, Heather J.; Tateno, Adriana Fumie; Zerbinati, Rodrigo Melim; Annan, Augustina; Owusu, Michael; Nkrumah, Evans Ewald; Maganga, Gael Darren; Oppong, Samuel; Adu-Sarkodie, Yaw; Vallo, Peter; da Silva Filho, Luiz Vicente Ribeiro Ferreira; Leroy, Eric M.; Thiel, Volker; van der Hoek, Lia; Poon, Leo L. M.; Tschapka, Marco

    2015-01-01

    ABSTRACT We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E. IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in
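The abstract does not state which diversity measure underlies the 5- and 6-fold comparison. The sketch below assumes a simple one: mean pairwise p-distance over aligned sequences. It shows how a fold difference between two groups could then be computed. The sequences are hypothetical toy data, not RdRp or spike alignments.

```python
from itertools import combinations
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_distance(seqs: List[str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments, not real viral data.
human_like = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
bat_like = ["ACGTACGTAC", "ACGAACTTAC", "TCGTACGAAC"]
fold = mean_pairwise_distance(bat_like) / mean_pairwise_distance(human_like)
print(round(fold, 2))  # 4.0 for these toy data
```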

  20. An ancestral miR-1304 allele present in Neanderthals regulates genes involved in enamel formation and could explain dental differences with modern humans.

    PubMed

    Lopez-Valenzuela, Maria; Ramírez, Oscar; Rosas, Antonio; García-Vargas, Samuel; de la Rasilla, Marco; Lalueza-Fox, Carles; Espinosa-Parrilla, Yolanda

    2012-07-01

    Genetic changes in regulatory elements are likely to result in phenotypic effects that might explain population-specific as well as species-specific traits. MicroRNAs (miRNAs) are posttranscriptional repressors involved in the control of almost every biological process. These small noncoding RNAs are present in various phylogenetic groups, and a large number of them remain highly conserved at the sequence level. MicroRNA-mediated regulation depends on perfect matching between the seven nucleotides of the miRNA seed region and the target sequence, usually located in the 3' untranslated region of the regulated gene. Hence, even single changes in seed regions are predicted to be deleterious, as they may affect miRNA target specificity. In accordance with this, purifying selection has acted strongly on these regions. Comparison between the genomes of present-day humans from various populations, Neanderthal, and other nonhuman primates showed a miRNA, miR-1304, that carries a polymorphism in its seed region. The ancestral allele is found in Neanderthal and nonhuman primates, at low frequency (~5%) in modern Asian populations, and rarely in Africans. Using miRNA target site prediction algorithms, we found that the derived allele increases the number of putative target genes for the derived miRNA more than ten-fold, indicating an important functional evolution for miR-1304. Analysis of the predicted targets for derived miR-1304 indicates an association with behavior and nervous system development and function. Two of the predicted target genes for the ancestral miR-1304 allele are important genes for tooth formation: enamelin and amelotin. MicroRNA overexpression experiments using a luciferase-based assay showed that the ancestral version of miR-1304 reduces the enamelin- and amelotin-associated reporter gene expression by 50%, whereas the derived miR-1304 does not have any effect. Deletion of the corresponding target sites for miR-1304 in these dental genes avoided their repression
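The seed-matching rule described above can be sketched in a few lines. The sketch assumes the common convention that the seed is nucleotides 2-8 from the miRNA 5' end. It also assumes that a target site is the reverse complement of the seed in the mRNA. The miRNA, its one-base variant and the UTR are hypothetical sequences, not miR-1304 or the enamelin/amelotin UTRs. Real target prediction algorithms also weigh site context and conservation.

```python
from typing import List

# Watson-Crick pairs in RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str) -> str:
    """Seed region: nucleotides 2-8 (1-based), counted from the miRNA 5' end."""
    return mirna[1:8]


def site_for(mirna: str) -> str:
    """mRNA site (written 5'->3') that pairs perfectly with the 7-nt seed."""
    return "".join(COMPLEMENT[base] for base in reversed(seed(mirna)))


def find_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions of exact seed-match sites in a 3' UTR sequence."""
    site = site_for(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Hypothetical sequences (not miR-1304 or any real UTR).
ancestral = "AUCCGAUGCAUUACGGAAUCGU"
derived = "AUCCAAUGCAUUACGGAAUCGU"   # single substitution at miRNA position 5 (in the seed)
utr = "GGCAUCGGAUUUACAUCGGAC"
print(find_sites(ancestral, utr))  # [2, 13]
print(find_sites(derived, utr))    # []
```

A single substitution in the seed changes the required site sequence. This is why a seed polymorphism can remove some targets and create others.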

  1. GenoMetric Query Language: a novel approach to large-scale genomic data management.

    PubMed

    Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano

    2015-06-15

    Improvements in sequencing technologies and data processing pipelines are rapidly providing sequencing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such, it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata, and interoperability between many data formats. Based on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press.
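GMQL expresses region operations declaratively and runs them on Hadoop/Pig. Its actual syntax is not shown here. To illustrate the kind of operation involved, the plain-Python sketch below pairs reference regions (e.g., genes) with the experiment regions (e.g., peaks) that overlap them, using half-open coordinates. The `Region` type and the naive nested loop are illustrative assumptions. A scalable system would use sorted or binned intervals instead.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    stop: int    # exclusive (half-open interval)
    name: str = ""


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.stop and b.start < a.stop


def join_overlapping(ref: List[Region], exp: List[Region]) -> List[Tuple[Region, Region]]:
    """Pair every reference region with each experiment region it overlaps (naive O(n*m))."""
    return [(r, e) for r in ref for e in exp if overlaps(r, e)]


# Hypothetical genes and peaks.
genes = [Region("chr1", 100, 200, "geneA"), Region("chr1", 500, 600, "geneB"),
         Region("chr2", 100, 200, "geneC")]
peaks = [Region("chr1", 150, 160, "p1"), Region("chr1", 600, 700, "p2"),
         Region("chr2", 190, 250, "p3")]
print([(g.name, p.name) for g, p in join_overlapping(genes, peaks)])
# [('geneA', 'p1'), ('geneC', 'p3')]
```

With half-open intervals, geneB (ending at 600) and p2 (starting at 600) are adjacent but do not overlap.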

  2. Discovery of novel phosphonate natural products and their biosynthetic pathways by large-scale genome mining

    USDA-ARS's Scientific Manuscript database

    Genome mining has revolutionized the field of natural products, providing hope that new antibiotics can be discovered in time, before all remaining ones are rendered useless against multidrug-resistant pathogens. While this approach has been successful in academic settings focused on small collections or...

  3. Large-scale appearance of ultraconserved elements in tetrapod genomes and slowdown of the molecular clock.

    PubMed

    Stephen, Stuart; Pheasant, Michael; Makunin, Igor V; Mattick, John S

    2008-02-01

    Mammalian genomes contain millions of highly conserved noncoding sequences, many of which are regulatory. The most extreme examples are the 481 ultraconserved elements (UCEs) that are identical over at least 200 bp in human, mouse, and rat and show 96% identity with chicken, which diverged approximately 310 MYA. If the substitution rate in UCEs remained constant, these elements should also be present with a high level of identity in fish (approximately 450 Myr), but this is not the case, suggesting that many appeared in the amniotes or tetrapods or that the molecular clock has slowed down in these lineages, or both. Taking advantage of the availability of multiple genomes, we identified 13,736 UCEs in the human genome that are identical over at least 100 bp in at least 3 of 5 placental mammals, including 2,189 sequences over at least 200 bp, thereby greatly expanding the repertoire of known UCEs, and investigated the evolution of these sequences in opossum, chicken, frog, and fish. We conclude that there was a massive genome-wide acquisition and expansion of UCEs during tetrapod and then amniote evolution, accompanied by a slowdown of the molecular clock, particularly in the amniotes, a process consistent with their functional exaptation in these lineages. The majority of tetrapod-specific UCEs are noncoding and associated with genes involved in regulation of transcription and development. In contrast, fish genomes contain relatively few UCEs, the majority of which are common to all bony vertebrates. These elements are different from other conserved noncoding elements and appear to be important regulatory innovations that became fixed following the emergence of vertebrates from the sea to the land.
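The UCE criterion (identical over at least 100 bp in at least 3 of 5 placental mammals) can be sketched as a scan over a multiple alignment. For every subset of the required number of species, the scan reports maximal runs of gap-free columns in which those species are identical. This is a minimal sketch on a toy alignment with smaller thresholds, not the authors' pipeline. A region identical in more species is reported once per qualifying subset.

```python
from itertools import combinations
from typing import List, Tuple


def identical_runs(alignment: List[str], min_len: int,
                   min_species: int) -> List[Tuple[Tuple[int, ...], int, int]]:
    """Maximal runs of at least `min_len` alignment columns in which some subset of
    `min_species` sequences is gap-free and identical.

    Returns (species_indices, start, end) tuples with `end` exclusive. Assumes all
    rows have the same length.
    """
    n_cols = len(alignment[0])
    hits = []
    for subset in combinations(range(len(alignment)), min_species):
        run_start = None
        # Iterate one column past the end so a run reaching the last column is closed.
        for col in range(n_cols + 1):
            identical = False
            if col < n_cols:
                bases = {alignment[i][col] for i in subset}
                identical = len(bases) == 1 and "-" not in bases
            if identical and run_start is None:
                run_start = col
            elif not identical and run_start is not None:
                if col - run_start >= min_len:
                    hits.append((subset, run_start, col))
                run_start = None
    return hits


# Hypothetical 5-species toy alignment; the study used >= 100 bp in >= 3 of 5.
aln = ["ACGTACGTAA",
       "ACGTACGTCC",
       "ACGTACGGGG",
       "TTTTTTTTTT",
       "GGGGGGGGGG"]
print(identical_runs(aln, min_len=5, min_species=3))  # [((0, 1, 2), 0, 7)]
```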

  4. Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species

    PubMed Central

    Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G.; O’Brien, Stephen J.; Johnson, Warren E.

    2006-01-01

    Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2 – telomeric region vs. D2 – centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 kb vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species. PMID:16380222

  5. Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species.

    PubMed

    Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G; O'Brien, Stephen J; Johnson, Warren E

    2006-02-01

    Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2-telomeric region vs. D2-centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species.

  6. Cloning of complete genomes of large dsDNA viruses by in vitro transposition of an F factor containing transposon.

    PubMed

    Wang, Yongjie; Stojiljković, Nina; Jehle, Johannes A

    2010-07-01

    An improved bacmid technology for cloning complete genomes of large dsDNA viruses with circular genomes has been developed and tested. The system, termed EZ::BAC, is based on the Escherichia coli F factor replicon, a chloramphenicol resistance marker gene, and the mosaic ends recognized specifically by the Tn5 transposase. In vitro transposition was carried out with the baculovirus shuttle vector pMON14272 (136 kb) and the Autographa californica multiple nucleopolyhedrovirus (AcMNPV) genome (134 kb) as target DNAs. Transposon EZ::BAC was inserted randomly into the target DNAs, leading to a 9 bp duplication of the flanking sequence at the insertion site. One of the obtained AcMNPV::BACs replicated in Sf21 cells after transfection. The random in vitro generation of viral bacmids using EZ::BAC facilitates the host-independent propagation of intact and functional viral genomes in E. coli cells and, unlike the generation of bacmids in conventional systems, does not require sequence information for the target DNA. Copyright 2009 Elsevier B.V. All rights reserved.
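The 9 bp duplication is a target site duplication. Tn5 transposase makes staggered cuts, and after gap repair the 9 bp at the insertion site appear on both sides of the transposon. Below is a minimal sketch on a linear toy sequence; the real targets here are circular genomes. `insert_with_tsd` and `random_insertion` are hypothetical helper names. The uniform choice of insertion site is a simplification of Tn5 target selection.

```python
import random
from typing import Tuple


def insert_with_tsd(target: str, transposon: str, pos: int, tsd_len: int = 9) -> str:
    """Insert `transposon` at `pos` in a linear target, duplicating the `tsd_len` bp
    starting at `pos` so that one copy flanks each side of the insert."""
    if not 0 <= pos <= len(target) - tsd_len:
        raise ValueError("insertion site too close to the end of the sequence")
    tsd = target[pos:pos + tsd_len]
    # target[:pos + tsd_len] already ends with the first copy of the duplicated bases.
    return target[:pos + tsd_len] + transposon + tsd + target[pos + tsd_len:]


def random_insertion(target: str, transposon: str, rng: random.Random,
                     tsd_len: int = 9) -> Tuple[int, str]:
    """Pick an insertion site uniformly at random and insert with a duplication."""
    pos = rng.randrange(len(target) - tsd_len + 1)
    return pos, insert_with_tsd(target, transposon, pos, tsd_len)


target = "AAAAACCCCCGGGGGTTTTT"   # hypothetical 20 bp target
print(insert_with_tsd(target, "xxxx", 5))
# AAAAACCCCCGGGGxxxxCCCCCGGGGGTTTTT
```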

  7. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals.

    PubMed

    Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R

    2009-01-01

    Using a milk-cell cDNA sequencing approach, we characterised milk-protein sequences from two monotreme species, platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), and found a full set of caseins and casein variants. The genomic organisation of the platypus casein locus is compared with other mammalian genomes, including the marsupial opossum and several eutherians. Physical linkage of casein genes has been seen in the casein loci of all mammalian genomes examined, and we confirm that this is also observed in platypus. However, we show that a recent duplication of beta-casein occurred in the monotreme lineage, as opposed to more ancient duplications of alpha-casein in the eutherian lineage, while marsupials possess only single copies of alpha- and beta-caseins. Despite this variability, the close proximity of the main alpha- and beta-casein genes in an inverted tail-tail orientation and the relative orientation of the more distant kappa-casein genes are similar in all mammalian genome sequences so far available. Overall, the conservation of the genomic organisation of the caseins indicates the early, pre-monotreme development of the fundamental role of caseins during lactation. In contrast, the lineage-specific gene duplications that have occurred within the casein locus of monotremes and eutherians, but not marsupials (which may have lost part of the ancestral casein locus), emphasise the independent selection on milk provision strategies to the young, most likely linked to different developmental strategies. The monotremes therefore provide insight into the ancestral drivers for lactation and how these have adapted in different lineages.

  8. Ancestral Rocky Mountain Tectonics: A Sedimentary Record of Ancestral Front Range and Uncompahgre Exhumation

    NASA Astrophysics Data System (ADS)

    Smith, T. M.; Saylor, J. E.; Lapen, T. J.

    2015-12-01

    The Ancestral Rocky Mountains (ARM) encompass multiple crustal provinces with characteristic crystallization ages across the central and western US. Two driving mechanisms have been proposed to explain ARM deformation. (1) Ouachita-Marathon collision SE of the ARM uplifts has been linked to an E-to-W sequence of uplift and is consistent with proposed disruption of a larger Paradox-Central Colorado Trough Basin by exhumation of the Uncompahgre Uplift. Initial exhumation of the Amarillo-Wichita Uplift to the east would provide a unique ~530 Ma signal absent from source areas to the SW, and result in initial exhumation of the Ancestral Front Range. (2) Alternatively, deformation due to flat slab subduction along a hypothesized plate boundary to the SW suggests a SW-to-NE younging of exhumation. This hypothesis suggests a SW-derived Grenville signature, and would trigger uplift of the Uncompahgre first. We analyzed depositional environments, sediment dispersal patterns, and sediment and basement zircon U-Pb and (U-Th)/He ages in 3 locations in the Paradox Basin and Central Colorado Trough (CCT). The Paradox Basin exhibits an up-section transition in fluvial style that suggests a decrease in overbank stability and increased lateral migration. Similarly, the CCT records a long-term progradation of depositional environments from marginal marine to fluvial, indicating that sediment supply in both basins outpaced accommodation. Preliminary provenance results indicate little to no input from the Amarillo-Wichita uplift in either basin despite uniformly westward sediment dispersal systems in both basins. Results also show that the Uncompahgre Uplift was the source for sediment throughout Paradox Basin deposition. These observations are inconsistent with the predictions of scenario 1 above. Rather, they suggest either a synchronous response to tectonic stress across the ARM provinces or an SW-to-NE pattern of deformation.

  9. Large, Male Germ Cell-Specific Hypomethylated DNA Domains With Unique Genomic and Epigenomic Features on the Mouse X Chromosome

    PubMed Central

    Ikeda, Rieko; Shiura, Hirosuke; Numata, Koji; Sugimoto, Michihiko; Kondo, Masayo; Mise, Nathan; Suzuki, Masako; Greally, John M.; Abe, Kuniya

    2013-01-01

    To understand the epigenetic regulation required for germ cell-specific gene expression in the mouse, we analysed DNA methylation profiles of developing germ cells using a microarray-based assay adapted for a small number of cells. The analysis revealed differentially methylated sites between cell types tested. Here, we focused on a group of genomic sequences hypomethylated specifically in germline cells as candidate regions involved in the epigenetic regulation of germline gene expression. These hypomethylated sequences tend to be clustered, forming large (10 kb to ∼9 Mb) genomic domains, particularly on the X chromosome of male germ cells. Most of these regions, designated here as large hypomethylated domains (LoDs), correspond to segmentally duplicated regions that contain gene families showing germ cell- or testis-specific expression, including cancer testis antigen genes. We found an inverse correlation between DNA methylation level and expression of genes in these domains. Most LoDs appear to be enriched with H3 lysine 9 dimethylation, usually regarded as a repressive histone modification, although some LoD genes can be expressed in male germ cells. It thus appears that such a unique epigenomic state associated with the LoDs may constitute a basis for the specific expression of genes contained in these genomic domains. PMID:23861320
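The abstract reports that hypomethylated sites cluster into domains of 10 kb to about 9 Mb. It does not describe how domains were called. The sketch below assumes a simple rule: merge consecutive sites separated by no more than a gap threshold, then keep clusters spanning at least a minimum size. The coordinates and both thresholds are hypothetical.

```python
from typing import List, Tuple


def merge_into_domains(positions: List[int], max_gap: int,
                       min_span: int) -> List[Tuple[int, int]]:
    """Cluster site coordinates into domains: consecutive sites at most `max_gap` bp
    apart join the same domain. Domains whose first-to-last-site span is at least
    `min_span` bp are returned as (first_site, last_site)."""
    domains: List[Tuple[int, int]] = []
    if not positions:
        return domains
    pts = sorted(positions)
    start = prev = pts[0]
    for p in pts[1:]:
        if p - prev > max_gap:          # gap too large: close the current domain
            if prev - start >= min_span:
                domains.append((start, prev))
            start = p
        prev = p
    if prev - start >= min_span:        # close the final domain
        domains.append((start, prev))
    return domains


# Hypothetical hypomethylated-site coordinates (bp) on one chromosome.
sites = [1000, 3000, 6000, 11500, 50000, 52000, 200000]
print(merge_into_domains(sites, max_gap=6000, min_span=10000))  # [(1000, 11500)]
```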

  10. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots

    PubMed Central

    Wang, Yupeng; Ficklin, Stephen P.; Wang, Xiyin; Feltus, F. Alex; Paterson, Andrew H.

    2016-01-01

    Different modes of gene duplication, including whole-genome duplication (WGD) and tandem, proximal and dispersed duplications, are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We call these surviving ancient dispersed duplicates relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots. PMID:27195960

  11. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work uses comparative genomics and phylogenetics to explore the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage-specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage-specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  12. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.

    PubMed

    Lowery, Robert K; Uribe, Gabriel; Jimenez, Eric B; Weiss, Mark A; Herrera, Kristian J; Regueiro, Maria; Herrera, Rene J

    2013-11-01

    Analyses of the genetic relationships among modern humans, Neanderthals and Denisovans have suggested that 1-4% of the non-Sub-Saharan African gene pool may be Neanderthal derived, while 6-8% of the Melanesian gene pool may be the product of admixture between the Denisovans and the direct ancestors of Melanesians. In the present study, we analyzed single nucleotide polymorphism (SNP) diversity among a worldwide collection of contemporary human populations with respect to the genetic constitution of these two archaic hominins and Pan troglodytes (chimpanzee). We partitioned SNPs into subsets, including those that are derived in both archaic lineages, those that are ancestral in both archaic lineages and those that are derived in only one archaic lineage. In this way, we separately examined subsets of mutations with higher probabilities of divergent phylogenetic origins. Whereas previous investigations have excluded SNPs from common ancestors in principal component analyses, we included common ancestral SNPs in our analyses to visualize the relative placement of the Neanderthal and Denisova among human populations. To assess the genetic similarities among the various hominin lineages, we performed genetic structure analyses comparing genetic patterns within contemporary human genomes that may have archaic or common ancestral roots. Our results indicate that 3.6% of the Neanderthal genome is shared with roughly 65.4% of the average European gene pool, a signal that clinally diminishes with distance from Europe. Our results suggest that the Neanderthal genetic associations with contemporary non-Sub-Saharan African populations, as well as the genetic affinities observed between Denisovans and Melanesians, most likely result from the retention of ancient mutations in these populations.

  13. Writ large: Genomic Dissection of the Effect of Cellular Environment on Immune Response

    PubMed Central

    Yosef, Nir; Regev, Aviv

    2016-01-01

    Cells of the immune system routinely respond to cues from their local environment and feed back to their surroundings through transient responses, choice of differentiation trajectories, plastic changes in cell state, and malleable adaptation to their tissue of residence. Genomic approaches have opened the way for comprehensive interrogation of such orchestrated responses. Focusing on genomic profiling of transcriptional and epigenetic cell state, we discuss how these approaches are applied to investigate immune cells faced with various environmental cues. We highlight some of the emerging principles concerning the roles of dense regulatory circuitry, epigenetic memory, cell type fluidity, and reuse of regulatory modules in achieving and maintaining appropriate responses to a changing environment. These provide a first step toward a systematic understanding of molecular circuits in complex tissues. PMID:27846493