Science.gov

Sample records for large ancestral genomes

  1. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746
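
    The "phylogenetic profile" named in the abstract above, a vector of family sizes across genomes, can be illustrated with a minimal sketch. The genome names, family labels, and the function phylogenetic_profiles are hypothetical and are not taken from the paper; the birth-and-death likelihood computation itself is not shown.

```python
from collections import Counter

# Hypothetical toy input: genome name -> family label of each gene in that genome.
genomes = {
    "genome_A": ["fam1", "fam1", "fam2"],
    "genome_B": ["fam1", "fam3"],
    "genome_C": ["fam2", "fam2", "fam2"],
}


def phylogenetic_profiles(genomes):
    """Map each family to its tuple of sizes, one entry per genome (sorted by name)."""
    order = sorted(genomes)
    # Count how many genes of each family every genome carries.
    counts = {name: Counter(genomes[name]) for name in order}
    families = sorted({fam for c in counts.values() for fam in c})
    # A Counter returns 0 for a family that is absent from a genome.
    return {fam: tuple(counts[name][fam] for name in order) for fam in families}


profiles = phylogenetic_profiles(genomes)
for family, profile in profiles.items():
    print(family, profile)
```

    Each profile lists one family's size in genome_A, genome_B, and genome_C, in that order; a model of the kind described would take such vectors as its input.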

  2. Regulatory genes in the ancestral chordate genomes.

    PubMed

    Satou, Yutaka; Wada, Shuichi; Sasakura, Yasunori; Satoh, Nori

    2008-12-01

    Changes or innovations in gene regulatory networks for the developmental program in the ancestral chordate genome appear to be a major component in the evolutionary process in which tadpole-type larvae, a unique characteristic of chordates, arose. These alterations may include new genetic interactions as well as the acquisition of new regulatory genes. Previous analyses of the Ciona genome revealed that many genes may have emerged after the divergence of the tunicate and vertebrate lineages. In this paper, we tested this possibility by examining a second non-vertebrate chordate genome. We conclude from this analysis that the ancient chordate included almost the same repertory of regulatory genes, but less redundancy than extant vertebrates, and that approximately 10% of vertebrate regulatory genes were innovated after the emergence of vertebrates. Thus, refined regulatory networks arose during vertebrate evolution mainly as preexisting regulatory genes multiplied rather than by generating new regulatory genes. The inferred regulatory gene sets of the ancestral chordate would be an important foundation for understanding how tadpole-type larvae, a unique characteristic of chordates, evolved.

  3. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods

    NASA Astrophysics Data System (ADS)

    Tannier, Eric

    In 2006, a debate arose over the efficiency of bioinformatics methods for reconstructing mammalian ancestral genomes. Three years later, Gordon et al. (PLoS Genetics, 5(5), 2009) chose not to use automatic methods to build up the genome of a 100 million year old Saccharomyces cerevisiae ancestor. Their manually constructed ancestor provides a reference genome to test whether automatic methods are indeed unable to approach confident reconstructions. Adapting several methodological frameworks to the same yeast gene order data, I discuss the possibilities, differences and similarities of the available algorithms for ancestral genome reconstructions. The methods can be classified into two types: local and global. Studying the properties of both helps to clarify what we can expect from their usage. Both methods propose contiguous ancestral regions that come very close (> 95% identity) to the manually predicted ancestral yeast chromosomes, with a good coverage of the extant genomes.

  4. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa.

    PubMed

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A; Wang, Xiaowu

    2013-05-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers.

  5. DeCoSTAR: Reconstructing the ancestral organization of genes or genomes using reconciled phylogenies.

    PubMed

    Duchemin, Wandrille; Anselmetti, Yoann; Patterson, Murray; Ponty, Yann; Berard, Severine; Chauve, Cedric; Scornavacca, Celine; Daubin, Vincent; Tannier, Eric

    2017-04-08

    DeCoSTAR is software that aims to reconstruct the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann-Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously-contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria.
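
    The scoring and sampling idea in the abstract above (a weighted sum of adjacency gains and breakages, with scenarios drawn from a Boltzmann-Gibbs distribution) can be sketched as follows. This is an illustration only, not DeCoSTAR's interface or algorithm: the function names, the weights, the temperature value, and the three candidate scenarios are all assumptions made for the example.

```python
import math
import random


def scenario_cost(gains, breakages, gain_weight=1.0, break_weight=1.0):
    """Weighted sum of adjacency gains and breakages (weights are hypothetical)."""
    return gain_weight * gains + break_weight * breakages


def boltzmann_probabilities(costs, temperature=1.0):
    """Probability of each scenario, proportional to exp(-cost / temperature)."""
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical candidate scenarios, each given as (gains, breakages).
scenarios = [(1, 0), (1, 1), (2, 2)]
costs = [scenario_cost(g, b) for g, b in scenarios]
probs = boltzmann_probabilities(costs, temperature=0.5)

# Draw one scenario; cheaper (more parsimonious) scenarios are more likely.
sampled = random.choices(scenarios, weights=probs, k=1)[0]
print(costs, [round(p, 3) for p in probs], sampled)
```

    Lowering the assumed temperature concentrates the probability on the lowest-cost scenario, while raising it lets near-optimal scenarios be sampled more often.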

  6. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  7. Differential loss of ancestral gene families as a source of genomic divergence in animals.

    PubMed Central

    Hughes, Austin L; Friedman, Robert

    2004-01-01

    A phylogenetic approach was used to reconstruct the pattern of an apparent loss of 2106 ancestral gene families in four animal genomes (Caenorhabditis elegans, Drosophila melanogaster, human and fugu). Substantially higher rates of loss of ancestral gene families were found in the invertebrates than in the vertebrates. These results indicate that the differential loss of ancestral gene families can be a significant factor in the evolutionary diversification of organisms. PMID:15101434

  8. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background: We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late middle ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements has impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results: We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing nearly a complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion: The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison. PMID:26450112

  9. Ancient hybridizations among the ancestral genomes of bread wheat.

    PubMed

    Marcussen, Thomas; Sandve, Simen R; Heier, Lise; Spannagl, Manuel; Pfeifer, Matthias; Jakobsen, Kjetill S; Wulff, Brande B H; Steuernagel, Burkhard; Mayer, Klaus F X; Olsen, Odd-Arne

    2014-07-18

    The allohexaploid bread wheat genome consists of three closely related subgenomes (A, B, and D), but a clear understanding of their phylogenetic history has been lacking. We used genome assemblies of bread wheat and five diploid relatives to analyze genome-wide samples of gene trees, as well as to estimate evolutionary relatedness and divergence times. We show that the A and B genomes diverged from a common ancestor ~7 million years ago and that these genomes gave rise to the D genome through homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the present-day bread wheat genome is a product of multiple rounds of hybrid speciation (homoploid and polyploid) and lay the foundation for a new framework for understanding the wheat genome as a multilevel phylogenetic mosaic.

  10. Genomic evolution in domestic cattle: ancestral haplotypes and healthy beef.

    PubMed

    Williamson, Joseph F; Steele, Edward J; Lester, Susan; Kalai, Oscar; Millman, John A; Wolrige, Lindsay; Bayard, Dominic; McLure, Craig; Dawkins, Roger L

    2011-05-01

    We have identified numerous Ancestral Haplotypes encoding a 14-Mb region of Bota C19. Three are frequent in Simmental, Angus and Wagyu and have been conserved since common progenitor populations. Others are more relevant to the differences between these 3 breeds including fat content and distribution in muscle. SREBF1 and Growth Hormone, which have been implicated in the production of healthy beef, are included within these haplotypes. However, we conclude that alleles at these 2 loci are less important than other sequences within the haplotypes. Identification of breeds and hybrids is improved by using haplotypes rather than individual alleles.

  11. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf.

    PubMed

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-09-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200,000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance.
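
    The coverage statement in the abstract above can be checked with simple arithmetic. The sketch below assumes the contigs do not overlap, so that the map length is just the number of contigs times the mean contig size; that assumption and the variable names are illustrative and are not stated in the abstract.

```python
# Figures quoted in the abstract.
contigs = 9750               # number of BAC contigs in the physical map
mean_contig_kbp = 462        # mean contig size, in kbp
estimated_genome_mbp = 4500  # haploid tobacco genome size, in Mbp

# Assumption: contigs do not overlap, so total length = count * mean size.
map_length_mbp = contigs * mean_contig_kbp / 1000
coverage = map_length_mbp / estimated_genome_mbp

print(f"map length: {map_length_mbp} Mbp, coverage: {coverage:.3f}x")
```

    Under that assumption the map spans about 4504.5 Mbp, roughly 1.0 times the estimated genome size, which is consistent with the abstract's statement that the calculated coverage equaled the estimated genome size.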

  12. BAC libraries construction from the ancestral diploid genomes of the allotetraploid cultivated peanut

    PubMed Central

    Guimarães, Patricia M; Garsmeur, Olivier; Proite, Karina; Leal-Bertioli, Soraya CM; Seijo, Guilhermo; Chaine, Christian; Bertioli, David J; D'Hont, Angelique

    2008-01-01

    Background: Cultivated peanut, Arachis hypogaea is an allotetraploid of recent origin, with an AABB genome. In common with many other polyploids, it seems that a severe genetic bottle-neck was imposed at the species origin, via hybridisation of two wild species and spontaneous chromosome duplication. Therefore, the study of the genome of peanut is hampered both by the crop's low genetic diversity and its polyploidy. In contrast to cultivated peanut, most wild Arachis species are diploid with high genetic diversity. The study of diploid Arachis genomes is therefore attractive, both to simplify the construction of genetic and physical maps, and for the isolation and characterization of wild alleles. The most probable wild ancestors of cultivated peanut are A. duranensis and A. ipaënsis with genome types AA and BB respectively. Results: We constructed and characterized two large-insert libraries in Bacterial Artificial Chromosome (BAC) vector, one for each of the diploid ancestral species. The libraries (AA and BB) are respectively c. 7.4 and c. 5.3 genome equivalents with low organelle contamination and average insert sizes of 110 and 100 kb. Both libraries were used for the isolation of clones containing genetically mapped legume anchor markers (single copy genes), and resistance gene analogues. Conclusion: These diploid BAC libraries are important tools for the isolation of wild alleles conferring resistances to biotic stresses, comparisons of orthologous regions of the AA and BB genomes with each other and with other legume species, and will facilitate the construction of a physical map. PMID:18230166

  13. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    PubMed Central

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  14. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

    PubMed

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2014-12-12

    To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.

  15. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    PubMed

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.

  16. Monotreme IGF2 expression and ancestral origin of genomic imprinting.

    PubMed

    Killian, J K; Nolan, C M; Stewart, N; Munday, B L; Andersen, N A; Nicol, S; Jirtle, R L

    2001-08-15

    IGF2 (insulin-like growth factor 2) and M6P/IGF2R (mannose 6-phosphate/insulin-like growth factor 2 receptor) are imprinted in marsupials and eutherians but not in birds. These results along with the absence of M6P/IGF2R imprinting in the egg-laying monotremes indicate that the parental imprinting of fetal growth-regulatory genes may be unique to viviparous mammals. In this investigation, we have cloned IGF2 from two monotreme mammals, the platypus and echidna, to further investigate the origin of imprinting. We report herein that like M6P/IGF2R, IGF2 is not imprinted in monotremes. Thus, although IGF2 encodes a highly conserved growth factor in chordates, it is only imprinted in therian mammals. These findings support a concurrent origin of IGF2 and M6P/IGF2R imprinting in the late Jurassic/early Cretaceous period. The absence of imprinting in monotremes, despite apparent interparental conflicts over maternal-offspring exchange, argues that a fortuitous congruency of genetic and epigenetic events may have limited the phylogenetic breadth of genomic imprinting to therian mammals. J. Exp. Zool. (Mol. Dev. Evol.) 291:205-212, 2001.

  17. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  18. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civáň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

  19. Mitochondrial Genome of Palpitomonas bilix: Derived Genome Structure and Ancestral System for Cytochrome c Maturation

    PubMed Central

    Nishimura, Yuki; Tanifuji, Goro; Kamikawa, Ryoma; Yabuki, Akinori; Hashimoto, Tetsuo; Inagaki, Yuji

    2016-01-01

    Here we report the mitochondrial (mt) genome of one of the heterotrophic microeukaryotes related to cryptophytes, Palpitomonas bilix. The P. bilix mt genome was found to be a linear molecule composed of a “single copy region” (∼16 kb) and repeat regions (∼30 kb) arranged in an inverse manner at both ends of the genome. Linear mt genomes with large inverted repeats are known for three distantly related eukaryotes (including P. bilix), suggesting that this particular mt genome structure has emerged at least three times in the eukaryotic tree of life. The P. bilix mt genome contains 47 protein-coding genes including ccmA, ccmB, ccmC, and ccmF, which encode protein subunits involved in the system for cytochrome c maturation inherited from a bacterium (System I). We present data indicating that the phylogenetic relatives of P. bilix, namely, cryptophytes, goniomonads, and kathablepharids, utilize an alternative system for cytochrome c maturation, which has most likely emerged during the evolution of eukaryotes (System III). To explain the distribution of Systems I and III in P. bilix and its phylogenetic relatives, two scenarios are possible: (i) System I was replaced by System III on the branch leading to the common ancestor of cryptophytes, goniomonads, and kathablepharids, and (ii) the two systems co-existed in their common ancestor and were lost differentially among the four descendants. PMID:27604877

  20. Exploring the diploid wheat ancestral A genome through sequence comparison at the high-molecular-weight glutenin locus region.

    PubMed

    Dong, Lingli; Huo, Naxin; Wang, Yi; Deal, Karin; Luo, Ming-Cheng; Wang, Daowen; Anderson, Olin D; Gu, Yong Qiang

    2012-12-01

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species of polyploid wheat, sequence variation of T. urartu (A(u)A(u)) was analyzed by comparing its 277-kb large genomic region carrying the important Glu-1 locus with the homologous regions from the A genomes of the diploid T. monococcum (A(m)A(m)), tetraploid T. turgidum (AABB), and hexaploid T. aestivum (AABBDD). Our results revealed that in addition to a high degree of gene collinearity, nested retroelement structures were also considerably conserved among the A(u) genome and the A genomes in polyploid wheats, suggesting that the majority of the repetitive sequences in the A genomes of polyploid wheats originated from the diploid A(u) genome. The difference in the compared region between A(u) and A is mainly caused by four differential TE insertions and two deletion events between these genomes. The estimated divergence time of A genomes, calculated from the nucleotide substitution rate in both shared TEs and collinear genes, further supports the closer evolutionary relationship of A to A(u) than to A(m). The structure conservation in the repetitive regions prompted us to develop repeat junction markers based on the A(u) sequence for mapping the A genome in hexaploid wheat. Eighty percent of these repeat junction markers were successfully mapped to the corresponding region in hexaploid wheat, suggesting that T. urartu could serve as a useful resource for developing molecular markers for genetic and breeding studies in hexaploid wheat.

  1. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral “Salicoid” Genome Duplication

    PubMed Central

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the “salicoid” duplication event, fission and fusion of the ancestral chromosomes first give rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently give rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  2. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral "Salicoid" Genome Duplication.

    PubMed

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-06-27

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the "salicoid" duplication event, fission and fusion of the ancestral chromosomes first give rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently give rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants.

  3. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites

    PubMed Central

    2012-01-01

    Background: The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results: We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability: The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci. PMID:23281809

  4. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure

    PubMed Central

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P.

    2016-01-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  5. Comparative Genomics of Large Mitochondria in Placozoans

    PubMed Central

    Signorovitch, Ana Y; Buss, Leo W; Dellaporta, Stephen L

    2007-01-01

    The first sequenced mitochondrial genome of a placozoan, Trichoplax adhaerens, challenged the conventional wisdom that a compact mitochondrial genome is a common feature among all animals. Three additional placozoan mitochondrial genomes representing highly divergent clades have been sequenced to determine whether the large Trichoplax mtDNA is a shared feature among members of the phylum Placozoa or a uniquely derived condition. All three mitochondrial genomes were found to be very large, 32- to 37-kb, circular molecules, having the typical 12 respiratory chain genes, 24 tRNAs, rnS, and rnL. They share with the Trichoplax mitochondrial genome the absence of atp8, atp9, and all ribosomal protein genes, the presence of several cox1 introns, and a large open reading frame containing an intron group I LAGLIDADG endonuclease domain. The differences in mtDNA size within Placozoa are due to variation in intergenic spacer regions and the presence or absence of long open reading frames of unknown function. Phylogenetic analyses of the 12 respiratory chain genes support the monophyly of Placozoa. The similarities in composition and structure between the three mitochondrial genomes reported here and that of Trichoplax's mtDNA suggest that their uncompacted state is a shared ancestral feature to other nonmetazoans while their gene content is a derived feature shared only among the Metazoa. PMID:17222063

  6. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims: MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods: The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results: Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions: The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  7. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

    DOE PAGES

    van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; ...

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) shows they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percent (26%) GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in the PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence. PG biosynthesis genes from the cyanobacterial endosymbiont that became the

  8. Minimal Conflicting Sets for the Consecutive Ones Property in Ancestral Genome Reconstruction

    NASA Astrophysics Data System (ADS)

    Chauve, Cedric; Haus, Utz-Uwe; Stephen, Tamon; You, Vivija P.

    A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1’s on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure and efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has a polynomial time complexity when the number of 1s in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clause and maximal false clauses for some monotone boolean function. We use these methods in preliminary experiments on simulated data related to ancestral genome reconstruction.

  9. Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria

    PubMed Central

    Oyserman, Ben O; Moya, Francisco; Lawson, Christopher E; Garcia, Antonio L; Vogt, Mark; Heffernen, Mitchell; Noguera, Daniel R; McMahon, Katherine D

    2016-01-01

    The evolution of complex traits is hypothesized to occur incrementally. Identifying the transitions that lead to extant complex traits may provide a better understanding of the genetic nature of the observed phenotype. Polyphosphate accumulating organisms (PAOs) are a keystone functional group in wastewater treatment processes; however, the evolution of the PAO phenotype has yet to be explicitly investigated and the specific metabolic traits that discriminate non-PAO from PAO are currently unknown. Here we perform the first comprehensive investigation on the evolution of the PAO phenotype using the model uncultured organism Candidatus Accumulibacter phosphatis (Accumulibacter) through ancestral genome reconstruction, identification of horizontal gene transfer, and a kinetic/stoichiometric characterization of Accumulibacter Clade IIA. The analysis of Accumulibacter's last common ancestor identified 135 laterally derived genes, including genes involved in glycogen, polyhydroxyalkanoate, pyruvate and NADH/NADPH metabolisms, as well as inorganic ion transport and regulatory mechanisms. In contrast, pathways such as the TCA cycle and polyphosphate metabolism displayed minimal horizontal gene transfer. We show that the transition from non-PAO to PAO coincided with horizontal gene transfer within Accumulibacter's core metabolism, likely alleviating key kinetic and stoichiometric bottlenecks, such as anaerobically linking glycogen degradation to polyhydroxyalkanoate synthesis. These results demonstrate the utility of investigating the derived genome of a lineage to identify key transitions leading to an extant complex phenotype. PMID:27128993

  10. Genomic organization of the crested ibis MHC provides new insight into ancestral avian MHC structure

    PubMed Central

    Chen, Li-Cheng; Lan, Hong; Sun, Li; Deng, Yan-Li; Tang, Ke-Yi; Wan, Qiu-Hong

    2015-01-01

    The major histocompatibility complex (MHC) plays an important role in immune response. Avian MHCs are not well characterized; only the highly compact Galliformes MHCs and the extensively fragmented zebra finch MHC have been reported. We report the first genomic structure of an endangered Pelecaniformes (crested ibis) MHC containing 54 genes in three regions spanning ~500 kb. In contrast to the loose BG (26 loci within 265 kb) and Class I (11 within 150 kb) genomic structures, the Core Region is condensed (17 within 85 kb). Furthermore, this Region exhibits a COL11A2 gene, followed by four tandem MHC class II αβ dyads retaining two suites of anciently duplicated “αβ” lineages. Thus, the crested ibis MHC structure is entirely different from the known avian MHC architectures but similar to that of mammalian MHCs, suggesting that the fundamental structure of ancestral avian class II MHCs should be “COL11A2-IIαβ1-IIαβ2.” The gene structures, residue characteristics, and expression levels of the five class I genes reveal inter-locus functional divergence. However, phylogenetic analysis indicates that these five genes generate a well-supported intra-species clade, showing evidence for recent duplications. Our analyses suggest dramatic structural variation among avian MHC lineages, help elucidate avian MHC evolution, and provide a foundation for future conservation studies. PMID:25608659

  11. Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria.

    PubMed

    Oyserman, Ben O; Moya, Francisco; Lawson, Christopher E; Garcia, Antonio L; Vogt, Mark; Heffernen, Mitchell; Noguera, Daniel R; McMahon, Katherine D

    2016-12-01

    The evolution of complex traits is hypothesized to occur incrementally. Identifying the transitions that lead to extant complex traits may provide a better understanding of the genetic nature of the observed phenotype. Polyphosphate accumulating organisms (PAOs) are a keystone functional group in wastewater treatment processes; however, the evolution of the PAO phenotype has yet to be explicitly investigated and the specific metabolic traits that discriminate non-PAO from PAO are currently unknown. Here we perform the first comprehensive investigation on the evolution of the PAO phenotype using the model uncultured organism Candidatus Accumulibacter phosphatis (Accumulibacter) through ancestral genome reconstruction, identification of horizontal gene transfer, and a kinetic/stoichiometric characterization of Accumulibacter Clade IIA. The analysis of Accumulibacter's last common ancestor identified 135 laterally derived genes, including genes involved in glycogen, polyhydroxyalkanoate, pyruvate and NADH/NADPH metabolisms, as well as inorganic ion transport and regulatory mechanisms. In contrast, pathways such as the TCA cycle and polyphosphate metabolism displayed minimal horizontal gene transfer. We show that the transition from non-PAO to PAO coincided with horizontal gene transfer within Accumulibacter's core metabolism, likely alleviating key kinetic and stoichiometric bottlenecks, such as anaerobically linking glycogen degradation to polyhydroxyalkanoate synthesis. These results demonstrate the utility of investigating the derived genome of a lineage to identify key transitions leading to an extant complex phenotype.

  12. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  13. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

    SciTech Connect

    van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; Purvine, Samuel O.; Grimwood, Jane; Sudek, Sebastian; Yu, Hang; Poirier, Camille; Deerinck, Thomas J.; Kuo, Alan; Grigoriev, Igor V.; Wong, Chee-Hong; Smith, Richard D.; Callister, Stephen J.; Wei, Chia-Lin; Schmutz, Jeremy; Worden, Alexandra Z.

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) shows they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percent (26%) GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in the PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence. PG biosynthesis genes from the

  14. The mitochondrial genome of the onychophoran Opisthopatus cinctipes (Peripatopsidae) reflects the ancestral mitochondrial gene arrangement of Panarthropoda and Ecdysozoa.

    PubMed

    Braband, Anke; Cameron, Stephen L; Podsiadlowski, Lars; Daniels, Savel R; Mayer, Georg

    2010-10-01

    The ancestral genome composition in Onychophora (velvet worms) is unknown since only a single species of Peripatidae has been studied thus far, which shows a highly derived gene order with numerous translocated genes. Due to this lack of information from Onychophora, it is difficult to infer the ancestral mitochondrial gene arrangement patterns for Panarthropoda and Ecdysozoa. Hence, we analyzed the complete mitochondrial genome of the onychophoran Opisthopatus cinctipes, a representative of Peripatopsidae. Our data show that O. cinctipes possesses a highly conserved gene order, similar to that found in various arthropods. By comparing our results to those from different outgroups, we reconstruct the ancestral gene arrangement in Panarthropoda and Ecdysozoa. Our phylogenetic analysis of protein-coding gene sequences from 60 protostome species (including outgroups) provides some support for the sister group relationship of Onychophora and Arthropoda, which was not recovered by using a single species of Peripatidae, Epiperipatus biolleyi, in a previous study. A comparison of the strand-specific bias between onychophorans, arthropods, and a priapulid suggests that the peripatid E. biolleyi is less suitable for phylogenetic analyses of Ecdysozoa using mitochondrial genomic data than the peripatopsid O. cinctipes.

  15. Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome.

    PubMed

    Shedlock, Andrew M; Botka, Christopher W; Zhao, Shaying; Shetty, Jyoti; Zhang, Tingting; Liu, Jun S; Deschavanne, Patrick J; Edwards, Scott V

    2007-02-20

    We report results of a megabase-scale phylogenomic analysis of the Reptilia, the sister group of mammals. Large-scale end-sequence scanning of genomic clones of a turtle, alligator, and lizard reveals diverse, mammal-like landscapes of retroelements and simple sequence repeats (SSRs) not found in the chicken. Several global genomic traits, including distinctive phylogenetic lineages of CR1-like long interspersed elements (LINEs) and a paucity of A-T rich SSRs, characterize turtles and archosaur genomes, whereas higher frequencies of tandem repeats and a lower global GC content reveal mammal-like features in Anolis. Nonavian reptile genomes also possess a high frequency of diverse and novel 50-bp unit tandem duplications not found in chicken or mammals. The frequency distributions of approximately 65,000 8-mer oligonucleotides suggest that rates of DNA-word frequency change are an order of magnitude slower in reptiles than in mammals. These results suggest a diverse array of interspersed and SSRs in the common ancestor of amniotes and a genomic conservatism and gradual loss of retroelements in reptiles that culminated in the minimalist chicken genome. The sequences reported in this paper have been deposited in the GenBank database (accession nos. CZ 250707-CZ 257443 and DX 390731-DX 389174).
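
    The figure of approximately 65,000 8-mers above is consistent with the 4^8 = 65,536 possible DNA words of length 8. As a rough illustration of how such word-frequency distributions can be tallied, here is a minimal Python sketch. It assumes plain A/C/G/T sequences and simply skips windows containing any other character; the function names are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def count_kmers(sequence, k=8):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        if set(word) <= DNA_ALPHABET:
            counts[word] += 1
    return counts


def kmer_frequencies(sequence, k=8):
    """Convert k-mer counts to relative frequencies (empty dict if no words)."""
    counts = count_kmers(sequence, k)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Number of distinct DNA words of length 8.
print(4 ** 8)  # 65536

# Toy example with k=4: the windows are ACGT, CGTA, GTAC, TACG, ACGT.
print(count_kmers("ACGTACGT", k=4))
```

    Comparing two genomes would then amount to comparing their frequency dictionaries with whatever distance measure a study chooses; that choice is left open here.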

  16. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus

    PubMed Central

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-01-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework. PMID:27247244

  17. Vertebrate codon bias indicates a highly GC-rich ancestral genome.

    PubMed

    Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei

    2013-04-25

    Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment of the nucleotide composition of coding and non-coding sequences in vertebrates and other taxa suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences.
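
    The contrast drawn above between overall GC content (~40%) and GC content at third codon positions (~60%) rests on two simple measurements. The Python sketch below shows one way they might be computed for a single in-frame coding sequence. It assumes the sequence starts at codon position 1 and counts only A/C/G/T; the function names are hypothetical, and this is not the authors' method.

```python
def gc_content(sequence):
    """Fraction of G or C among the A/C/G/T bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(coding_sequence):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Assumes the sequence begins at the first base of a codon; slicing from
    index 2 in steps of 3 picks the third base of each complete codon.
    """
    third_positions = coding_sequence[2::3]
    return gc_content(third_positions)


# Toy coding sequence: codons ATG GCC AAG, third positions G, C, G.
cds = "ATGGCCAAG"
print(gc_content(cds))   # 5 of 9 bases are G or C
print(gc3_content(cds))  # all three third positions are G or C
```

    In this toy sequence the third-position GC fraction is higher than the overall fraction, which is the kind of gap the abstract describes at genome scale.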

  18. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage.

    PubMed

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H; Hugenholtz, Philip

    2016-04-01

    Candidate phylum TM6 is a major bacterial lineage whose members have been recognized through culture-independent rRNA surveys as low-abundance community members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family-level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum-level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0-1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms.

  19. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage

    PubMed Central

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H.; Hugenholtz, Philip

    2016-01-01

    Candidate phylum TM6 is a major bacterial lineage whose members have been recognized through culture-independent rRNA surveys as low-abundance community members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family-level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum-level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0–1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  20. Evaluation of the TREX1 gene in a large multi-ancestral lupus cohort

    PubMed Central

    Namjou, Bahram; Kothari, Parul H.; Kelly, Jennifer A.; Glenn, Stuart B.; Ojwang, Joshua O.; Adler, Adam; Alarcón-Riquelme, Marta E.; Gallant, Caroline J.; Boackle, Susan A.; Criswell, Lindsey A.; Kimberly, Robert P.; Brown, Elizabeth; Edberg, Jeffrey; Stevens, Anne M.; Jacob, Chaim O.; Tsao, Betty P.; Gilkeson, Gary S.; Kamen, Diane L.; Merrill, Joan T.; Petri, Michelle; Goldman, Rosalind Ramsey; Vila, Luis M.; Anaya, Juan-Manuel; Niewold, Timothy B.; Martin, Javier; Pons-Estel, Bernardo A.; Sabio, Jose M.; Callejas, Jose L.; Vyse, Timothy J.; Bae, Sang-Cheol; Perrino, Fred W.; Freedman, Barry I.; Scofield, R. Hal; Moser, Kathy L.; Gaffney, Patrick M.; James, Judith A.; Langefeld, Carl D.; Kaufman, Kenneth M.; Harley, John B.; Atkinson, John P.

    2011-01-01

    Systemic Lupus Erythematosus (SLE) is a prototypic autoimmune disorder with a complex pathogenesis in which genetic, hormonal and environmental factors play a role. Rare mutations in the TREX1 gene, the major mammalian 3′-5′ exonuclease, have been reported in sporadic SLE cases. Some of these mutations have also been identified in a rare pediatric neurologic condition featuring an inflammatory encephalopathy known as Aicardi-Goutières syndrome (AGS). We sought to investigate the frequency of these mutations in a large multi-ancestral cohort of SLE cases and controls. Methods: Forty single-nucleotide polymorphisms (SNPs), including both common and rare variants, across the TREX1 gene were evaluated in ∼8370 patients with SLE and ∼7490 control subjects. Stringent quality control procedures were applied and principal components and admixture proportions were calculated to identify outliers for removal from analysis. Population-based case-control association analyses were performed. P values, false discovery rate q values, and odds ratios with 95% confidence intervals were calculated. Results: The estimated frequency of TREX1 mutations in our lupus cohort was 0.5%. Five heterozygous mutations were detected at the Y305C polymorphism in European lupus cases but none were observed in European controls. Five African cases incurred heterozygous mutations at the E266G polymorphism and, again, none were observed in the African controls. A rare homozygous R114H mutation was identified in one Asian SLE patient whereas all genotypes at this mutation in previous reports for SLE were heterozygous. Analysis of common TREX1 SNPs (MAF >10%) revealed a relatively common risk haplotype in European SLE patients with neurologic manifestations, especially seizures, with a frequency of 58% in lupus cases compared to 45% in normal controls (p=0.0008, OR=1.73, 95% CI=1.25-2.39). Finally, the presence or absence of specific autoantibodies in certain populations produced significant

  1. Sequencing of Australian wild rice genomes reveals ancestral relationships with domesticated rice.

    PubMed

    Brozynska, Marta; Copetti, Dario; Furtado, Agnelo; Wing, Rod A; Crayn, Darren; Fox, Glen; Ishikawa, Ryuji; Henry, Robert J

    2016-11-27

    The related A genome species of the Oryza genus are the effective gene pool for rice. Here, we report draft genomes for two Australian wild A genome taxa: an O. rufipogon-like population, referred to as Taxon A, and an O. meridionalis-like population, referred to as Taxon B. These two taxa were sequenced and assembled by integration of short- and long-read next-generation sequencing (NGS) data to create a genomic platform for a wider rice gene pool. We find that, despite its distinct chloroplast genome, the nuclear genome of the Australian Taxon A has a sequence that is much closer to that of domesticated rice (O. sativa) than to the other Australian wild populations. Analysis of 4643 genes in the A genome clade showed that the Australian annual, O. meridionalis, and related perennial taxa have the most divergent (around 3 million years) genome sequences relative to domesticated rice. A test for admixture showed possible introgression into the Australian Taxon A (diverged around 1.6 million years ago), especially from the wild indica/O. nivara clade in Asia. These results demonstrate that northern Australia may be the centre of diversity of the A genome Oryza and suggest the possibility that this might also be the centre of origin of this group and represent an important resource for rice improvement.

  2. Phylogenomics of primates and their ancestral populations

    PubMed Central

    Siepel, Adam

    2009-01-01

    Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics. PMID:19801602

  3. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa.

  4. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs

    PubMed Central

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-01-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  5. Genetic divergence and admixture of ancestral genome groups in the sugarcane variety 'RB867515' (Saccharum spp).

    PubMed

    Maranho, G B; Maranho, R C; Desordi, R; das Neves, A F; Mangolin, C A; Machado, M F P S

    2016-12-02

    We analyzed 80 plants of the sugarcane (Saccharum spp) variety 'RB867515' in order to investigate its diversity and genetic structure at the molecular level. Four simple sequence repeat (SSR) loci (UGSM51, SMC1237, SEGMS1069, and UGSM38) and five expressed sequence tag (EST)-SSR loci (ESTA68, ESTB92, ESTB145, ESTC66, and ESTC84) were used as molecular markers. The polymorphic loci rate was 66.6%. A total of 17 alleles and an average of 1.88 alleles/locus were detected. The number of alleles in the EST-SSR loci was lower than the number of alleles in the SSRs of non-expressed loci. The mean observed heterozygosity among the nine SSR loci was 0.3291. Genetic structure analysis showed that 'RB867515' contains alleles from three ancestral groups (K = 3), but there is little admixing of alleles in the same plant (from 0.8 to 17.3%); only 1.88% of the plants shared alleles from two or three groups. ESTB92, ESTC84, and UGSM38 were monomorphic, but there was evidence of polymorphism in ESTA68, ESTB145, ESTC66, UGSM51, SMC1237, and SEGMS1069, indicating that 'RB867515' has variability at the molecular level and the potential to be used as a parent in breeding programs. The molecular variability observed in 'RB867515' indicates that the clone terminology that is used to identify this cultivar is inconsistent with the original meaning of "clone", which is defined as a sample of genetically identical plants.
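
    The summary statistics in the abstract above follow directly from the locus lists it gives. As a minimal sketch (the variable names are illustrative assumptions, not taken from the paper), the polymorphic loci rate and the mean number of alleles per locus can be recomputed as follows; the results agree with the reported 66.6% and 1.88 up to rounding.

```python
# Loci named in the abstract, grouped by the polymorphism status it reports.
monomorphic = ["ESTB92", "ESTC84", "UGSM38"]
polymorphic = ["ESTA68", "ESTB145", "ESTC66", "UGSM51", "SMC1237", "SEGMS1069"]
total_alleles = 17  # total number of alleles reported across all loci

n_loci = len(monomorphic) + len(polymorphic)

# Share of loci that are polymorphic, as a percentage.
polymorphic_rate = 100 * len(polymorphic) / n_loci

# Mean number of alleles per locus.
alleles_per_locus = total_alleles / n_loci

print(f"loci: {n_loci}")
print(f"polymorphic loci rate: {polymorphic_rate:.2f}%")
print(f"alleles per locus: {alleles_per_locus:.3f}")
```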

  6. The Mitochondrial Genome of the Guanaco Louse, Microthoracius praelongiceps: Insights into the Ancestral Mitochondrial Karyotype of Sucking Lice (Anoplura, Insecta)

    PubMed Central

    Li, Hu; Barker, Stephen C.

    2017-01-01

    Fragmented mitochondrial (mt) genomes have been reported in 11 species of sucking lice (suborder Anoplura) that infest humans, chimpanzees, pigs, horses, and rodents. There is substantial variation among these lice in mt karyotype: the number of minichromosomes of a species ranges from 9 to 20; the number of genes in a minichromosome ranges from 1 to 8; gene arrangement in a minichromosome differs between species, even in the same genus. We sequenced the mt genome of the guanaco louse, Microthoracius praelongiceps, to help establish the ancestral mt karyotype for sucking lice and understand how fragmented mt genomes evolved. The guanaco louse has 12 mt minichromosomes; each minichromosome has 2–5 genes and a non-coding region. The guanaco louse shares many features with rodent lice in mt karyotype, more than with other sucking lice. The guanaco louse, however, is more closely related phylogenetically to human lice, chimpanzee lice, pig lice, and horse lice than to rodent lice. By parsimony analysis of shared features in mt karyotype, we infer that the most recent common ancestor of sucking lice, which lived ∼75 Ma, had 11 minichromosomes; each minichromosome had 1–6 genes and a non-coding region. As sucking lice diverged, split of mt minichromosomes occurred many times in the lineages leading to the lice of humans, chimpanzees, and rodents whereas merger of minichromosomes occurred in the lineage leading to the lice of pigs and horses. Together, splits and mergers of minichromosomes created a very complex and dynamic mt genome organization in the sucking lice. PMID:28164215

  7. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera

    PubMed Central

    2012-01-01

    Background: Lepidoptera encompasses more than 160,000 described species that have been classified into 45–48 superfamilies. The previously determined Lepidoptera mitochondrial genomes (mitogenomes) are limited to six superfamilies of the lineage Ditrysia. Compared with the ancestral insect gene order, these mitogenomes all contain a tRNA rearrangement. To gain new insights into Lepidoptera mitogenome evolution, we sequenced the mitogenomes of two ghost moths that belong to the non-ditrysian lineage Hepialoidea and conducted a comparative mitogenomic analysis across Lepidoptera. Results: The mitogenomes of Thitarodes renzhiensis and T. yunnanensis are 16,173 bp and 15,816 bp long with an A + T content of 81.28% and 82.34%, respectively. Both mitogenomes include 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and the A + T-rich region. Different tandem repeats in the A + T-rich region mainly account for the size difference between the two mitogenomes. All the protein-coding genes start with typical mitochondrial initiation codons, except for cox1 (CGA) and nad1 (TTG) in both mitogenomes. The anticodon of trnS(AGN) in T. renzhiensis and T. yunnanensis is UCU instead of the mostly used GCU in other sequenced Lepidoptera mitogenomes. The 1,584-bp sequence from rrnS to nad2 was also determined for an unspecified ghost moth (Thitarodes sp.), which has no repetitive sequence in the A + T-rich region. All three Thitarodes species possess the ancestral gene order with trnI-trnQ-trnM located between the A + T-rich region and nad2, which is different from the gene order trnM-trnI-trnQ in all previously sequenced Lepidoptera species. The formerly identified conserved elements of Lepidoptera mitogenomes (i.e. the motif ‘ATAGA’ and poly-T stretch in the A + T-rich region and the long intergenic spacer upstream of nad2) are absent in the Thitarodes mitogenomes. Conclusion: The mitogenomes of T. renzhiensis and T

  8. The Demosponge Amphimedon queenslandica: Reconstructing the Ancestral Metazoan Genome and Deciphering the Origin of Animal Multicellularity.

    PubMed

    Degnan, Bernard M; Adamska, Maja; Craigie, Alina; Degnan, Sandie M; Fahey, Bryony; Gauthier, Marie; Hooper, John N A; Larroux, Claire; Leys, Sally P; Lovas, Erica; Richards, Gemma S

    2008-12-01

    INTRODUCTION: Sponges are one of the earliest branching metazoans. In addition to undergoing complex development and differentiation, they can regenerate via stem cells and can discern self from nonself ("allorecognition"), making them a useful comparative model for a range of metazoan-specific processes. Molecular analyses of these processes have the potential to reveal ancient homologies shared among all living animals and critical genomic innovations that underpin metazoan multicellularity. Amphimedon queenslandica (Porifera, Demospongiae, Haplosclerida, Niphatidae) is the first poriferan representative to have its genome sequenced, assembled, and annotated. Amphimedon exemplifies many sessile and sedentary marine invertebrates (e.g., corals, ascidians, bryozoans): They disperse during a planktonic larval phase, settle in the vicinity of conspecifics, ward off potential competitors (including incompatible genotypes), and ensure that brooded eggs are fertilized by conspecific sperm. Using genomic and expressed sequence tag (EST) resources from Amphimedon, functional genomic approaches can be applied to a wide range of ecological and population genetic processes, including fertilization, dispersal, and colonization dynamics, host-symbiont interactions, and secondary metabolite production. Unlike most other sponges, Amphimedon produce hundreds of asynchronously developing embryos and larvae year-round in distinct, easily accessible brood chambers. Embryogenesis gives rise to larvae with at least a dozen cell types that are segregated into three layers and patterned along the body axis. In this article, we describe some of the methods currently available for studying A. queenslandica, focusing on the analysis of embryos, larvae, and post-larvae.

  9. Ancient human genomes suggest three ancestral populations for present-day Europeans.

    PubMed

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H; Schraiber, Joshua G; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M; Brisighelli, Francesca; Busby, George B J; Cali, Francesco; Churnosov, Mikhail; Cole, David E C; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A; Gallego Romero, Irene; Gubina, Marina; Hammer, Michael; Henn, Brenna M; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl A; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G; Ruiz-Linares, Andres; Tishkoff, Sarah A; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-09-18

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages.

  10. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663

  11. Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires

    PubMed Central

    2011-01-01

    Background: Genome size and complexity, as measured by the number of genes or protein domains, is remarkably similar in most extant eukaryotes and generally exhibits no correlation with their morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes. Results: We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, during eukaryotic evolution domain losses in general outnumber domain gains. Only at the base of the animal and the vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering of genomes according to their functional profiles results in an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely being replaced by the metabolic capabilities of symbiotic organisms such as gut microbes. Conclusions: While protein domain gains and losses are common throughout eukaryote evolution, losses oftentimes outweigh gains and lead to significant differences in functional profiles. Results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gain in regulatory complexity during the rise of animals. PMID:21241503

  12. Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events.

    PubMed

    Song, Xiaowei; Tang, Yezhong; Wang, Yajun

    2016-08-22

    The vertebrate FoxP subfamily genes play important roles in the construction of essential functional modules involved in physiological and developmental processes. To explore the adaptive evolution of functional modules associated with the FoxP subfamily member genes, it is necessary to study the gene duplication process. We detected four member genes of the FoxP subfamily in sea lampreys (a representative species of jawless vertebrates) through genome screenings and phylogenetic analyses. Reliable paralogons (i.e. paralogous chromosome segments) have rarely been detected in scaffolds of FoxP subfamily member genes in sea lampreys due to the considerable existence of HTH_Tnp_Tc3_2 transposases. However, these transposases did not alter gene numbers of the FoxP subfamily in sea lampreys. The coincidence between the "1-4" gene duplication pattern of FoxP subfamily genes from invertebrates to vertebrates and two rounds of ancestral whole genome duplication (1R- and 2R-WGD) events reveals that the FoxP subfamily of vertebrates was quadruplicated in the 1R- and 2R-WGD events. Furthermore, we deduced that a synchronous gene duplication process occurred for the FoxP subfamily and for three linked gene families/subfamilies (i.e. MIT family, mGluR group III and PLXNA subfamily) in the 1R- and 2R-WGD events using phylogenetic analyses and mirror-dendrogram methods (i.e. algorithms to test protein-protein interactions). Specifically, the ancestor of FoxP1 and FoxP3 and the ancestor of FoxP2 and FoxP4 were generated in the 1R-WGD event. In the subsequent 2R-WGD event, these two ancestral genes were changed into FoxP1, FoxP2, FoxP3 and FoxP4. The elucidation of these gene duplication processes sheds light on the phylogenetic relationships between functional modules of the FoxP subfamily member genes.

  13. The vertebrate makorin ubiquitin ligase gene family has been shaped by large-scale duplication and retroposition from an ancestral gonad-specific, maternal-effect gene

    PubMed Central

    2010-01-01

    Background: Members of the makorin (mkrn) gene family encode RING/C3H zinc finger proteins with E3 ubiquitin ligase activity. Although these proteins have been described in a variety of eukaryotes such as plants, fungi, invertebrates and vertebrates including human, almost nothing is known about their structural and functional evolution. Results: Via partial sequencing of a testis cDNA library from the poeciliid fish Xiphophorus maculatus, we have identified a new member of the makorin gene family, which we called mkrn4. In addition to the already described mkrn1 and mkrn2, mkrn4 is the third example of a makorin gene present in both tetrapods and ray-finned fish. However, this gene was not detected in mouse and rat, suggesting its loss in the lineage leading to rodent murids. Mkrn2 and mkrn4 are located in large ancient duplicated regions in tetrapod and fish genomes, suggesting the possible involvement of ancestral vertebrate-specific genome duplication in the formation of these genes. Intriguingly, many mkrn1 and mkrn2 intronless retrocopies have been detected in mammals but not in other vertebrates, most of them corresponding to pseudogenes. The nature and number of zinc fingers were found to be conserved in Mkrn1 and Mkrn2 but much more variable in Mkrn4, with lineage-specific differences. RT-qPCR analysis demonstrated a highly gonad-biased expression pattern for makorin genes in medaka and zebrafish (ray-finned fishes) and amphibians, but a strong relaxation of this specificity in birds and mammals. All three mkrn genes were maternally expressed before zygotic genome activation in both medaka and zebrafish early embryos. Conclusion: Our analysis demonstrates that the makorin gene family has evolved through large-scale duplication and subsequent lineage-specific retroposition-mediated duplications in vertebrates. From the three major vertebrate mkrn genes, mkrn4 shows the highest evolutionary dynamics, with lineage-specific loss of zinc fingers and even complete

  14. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species.

    PubMed

    Huguet-Tapia, Jose C; Lefebure, Tristan; Badger, Jonathan H; Guan, Dongli; Pettis, Gregg S; Stanhope, Michael J; Loria, Rosemary

    2016-01-29

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer.

  15. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species

    PubMed Central

    Huguet-Tapia, Jose C.; Lefebure, Tristan; Badger, Jonathan H.; Guan, Dongli; Stanhope, Michael J.

    2016-01-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  16. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    PubMed Central

    Liu, George E; Matukumalli, Lakshmi K; Sonstegard, Tad S; Shade, Larry L; Van Tassell, Curtis P

    2006-01-01

    Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10(-9) change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10(-9) change/site/year) was approximately half of the overall rate (1.9–2.0 × 10(-9) change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies. PMID:16759380
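
    The statement that the coding-sequence rate is "approximately half" of the overall rate can be checked directly from the figures quoted in the abstract. The sketch below is only an illustration of that arithmetic; the variable and function names are hypothetical and are not taken from the study.

```python
# Rates quoted in the abstract, in changes per site per year.
coding_rate = 1.1e-9               # substitution rate in coding sequences
overall_rate = (1.9e-9, 2.0e-9)    # overall rate, reported as a range


def ratio_range(numerator, denominator_range):
    """Return (min, max) of numerator / d for d in the given (low, high) range."""
    low, high = denominator_range
    return (numerator / high, numerator / low)


# Ratio of the coding rate to the overall rate across the reported range.
coding_vs_overall = ratio_range(coding_rate, overall_rate)
print(coding_vs_overall)
```

    Under these figures the ratio falls between about 0.55 and 0.58, which is consistent with the "approximately half" wording.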

  17. Ancestral Gene Flow and Parallel Organellar Genome Capture Result in Extreme Phylogenomic Discord in a Lineage of Angiosperms.

    PubMed

    Folk, Ryan A; Mandel, Jennifer R; Freudenstein, John V

    2016-09-16

    While hybridization has recently received a resurgence of attention from systematists and evolutionary biologists, there remains a dearth of case studies on ancient, diversified hybrid lineages, that is, clades of organisms that originated through reticulation. Studies on these groups are valuable in that they would speak to the long-term phylogenetic success of lineages following gene flow between species. We present a phylogenomic view of Heuchera, long known for frequent hybridization, incorporating all three independent genomes: targeted nuclear (~400,000 bp), plastid (~160,000 bp), and mitochondrial (~470,000 bp) data. We analyze these data using multiple concatenation and coalescence strategies. The nuclear phylogeny is consistent with previous work and with morphology, confidently suggesting a monophyletic Heuchera. By contrast, analyses of both organellar genomes recover a grossly polyphyletic Heuchera, consisting of three primary clades with relationships extensively rearranged within these as well. A minority of nuclear loci also exhibit phylogenetic discord; yet these topologies remarkably never resemble the pattern of organellar loci and largely present low levels of discord inter alia. Two independent estimates of the coalescent branch length of the ancestor of Heuchera using nuclear data suggest rare or nonexistent incomplete lineage sorting with related clades, inconsistent with the observed gross polyphyly of organellar genomes (confirmed by simulation of gene trees under the coalescent). These observations, in combination with previous work, strongly suggest hybridization as the cause of this phylogenetic discord. [Ancient hybridization; chloroplast capture; incongruence; phylogenomics; reticulation.]

  18. Remnants of the Legume Ancestral Genome Preserved in Gene-Rich Regions: Insights from Lupinus angustifolius Physical, Genetic, and Comparative Mapping.

    PubMed

    Książkiewicz, Michał; Zielezinski, Andrzej; Wyrwa, Katarzyna; Szczepaniak, Anna; Rychel, Sandra; Karlowski, Wojciech; Wolko, Bogdan; Naganowska, Barbara

    The narrow-leafed lupin (Lupinus angustifolius) was recently considered as a legume reference species. Genetic resources have been developed, including a draft genome sequence, linkage maps, nuclear DNA libraries, and cytogenetic chromosome-specific landmarks. Here, we used a complex approach, involving DNA fingerprinting, sequencing, genetic mapping, and molecular cytogenetics, to localize and analyze L. angustifolius gene-rich regions (GRRs). A L. angustifolius genomic bacterial artificial chromosome (BAC) library was screened with short sequence repeat (SSR)-based probes. Selected BACs were fingerprinted and assembled into contigs. BAC-end sequence (BES) annotation allowed us to choose clones for sequencing, targeting GRRs. Additionally, BESs were aligned to the scaffolds of the genome sequence. The genetic map was supplemented with 35 BES-derived markers, distributed in 14 linkage groups and tagging 37 scaffolds. The identified GRRs had an average gene density of 19.6 genes/100 kb and physical-to-genetic distance ratios of 11 to 109 kb/cM. Physical and genetic mapping was supported by multi-BAC-fluorescence in situ hybridization (FISH), and five new linkage groups were assigned to the chromosomes. Syntenic links to the genome sequences of five legume species (Medicago truncatula, Glycine max, Lotus japonicus, Phaseolus vulgaris, and Cajanus cajan) were identified. The comparative mapping of the two largest lupin GRRs provides novel evidence for ancient duplications in all of the studied species. These regions are conserved among representatives of the main clades of Papilionoideae. Furthermore, despite the complex evolution of legumes, some segments of the nuclear genome were not substantially modified and retained their quasi-ancestral structures. Cytogenetic markers anchored in these regions constitute a platform for heterologous mapping of legume genomes.

  19. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth.

    PubMed

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-12-22

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker.

  20. The Large Genome Constraint Hypothesis: Evolution, Ecology and Phenotype

    PubMed Central

    KNIGHT, CHARLES A.; MOLINARI, NICOLE A.; PETROV, DMITRI A.

    2005-01-01

    • Background and Aims If large genomes are truly saturated with unnecessary ‘junk’ DNA, it would seem natural that there would be costs associated with accumulation and replication of this excess DNA. Here we examine the available evidence to support this hypothesis, which we term the ‘large genome constraint’. We examine the large genome constraint at three scales: evolution, ecology, and the plant phenotype. • Scope In evolution, we tested the hypothesis that plant lineages with large genomes are diversifying more slowly. We found that genera with large genomes are less likely to be highly speciose, suggesting a large genome constraint on speciation. In ecology, we found that species with large genomes are under-represented in extreme environments, again suggesting a large genome constraint for the distribution and abundance of species. Ultimately, if these ecological and evolutionary constraints are real, the genome size effect must be expressed in the phenotype and confer selective disadvantages. Therefore, in phenotype, we review data on the physiological correlates of genome size, and present new analyses involving maximum photosynthetic rate and specific leaf area. Most notably, we found that species with large genomes have reduced maximum photosynthetic rates, again suggesting a large genome constraint on plant performance. Finally, we discuss whether these phenotypic correlations may help explain why species with large genomes are trimmed from the evolutionary tree and have restricted ecological distributions. • Conclusion Our review tentatively supports the large genome constraint hypothesis. PMID:15596465

  1. Genome-wide Association Study Identifies HLA 8.1 Ancestral Haplotype Alleles as Major Genetic Risk Factors for Myositis Phenotypes

    PubMed Central

    Miller, Frederick W.; Chen, Wei; O’Hanlon, Terrance P.; Cooper, Robert G.; Vencovsky, Jiri; Rider, Lisa G.; Danko, Katalin; Wedderburn, Lucy R.; Lundberg, Ingrid E.; Pachman, Lauren M.; Reed, Ann M.; Ytterberg, Steven R.; Padyukov, Leonid; Selva-O’Callaghan, Albert; Radstake, Timothy R.; Isenberg, David A.; Chinoy, Hector; Ollier, William E.R.; Scheet, Paul; Peng, Bo; Lee, Annette; Byun, Jinyoung; Lamb, Janine A.; Gregersen, Peter K.; Amos, Christopher I.

    2016-01-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis; 473 juvenile dermatomyositis; 532 polymyositis; and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10^-8) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 haplotype comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  2. Genome-wide association study identifies HLA 8.1 ancestral haplotype alleles as major genetic risk factors for myositis phenotypes.

    PubMed

    Miller, F W; Chen, W; O'Hanlon, T P; Cooper, R G; Vencovsky, J; Rider, L G; Danko, K; Wedderburn, L R; Lundberg, I E; Pachman, L M; Reed, A M; Ytterberg, S R; Padyukov, L; Selva-O'Callaghan, A; Radstake, T R; Isenberg, D A; Chinoy, H; Ollier, W E R; Scheet, P; Peng, B; Lee, A; Byun, J; Lamb, J A; Gregersen, P K; Amos, C I

    2015-10-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis, 473 juvenile dermatomyositis, 532 polymyositis and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl-tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P<5×10(-8)) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations.

  3. The diversity of class II transposable elements in mammalian genomes has arisen from ancestral phylogenetic splits during ancient waves of proliferation through the genome.

    PubMed

    Hellen, Elizabeth H B; Brookfield, John F Y

    2013-01-01

    DNA transposons make up 3% of the human genome, approximately the same percentage as genes. However, because of their inactivity, they are often ignored in favor of the more abundant, active, retroelements. Despite this relative ignominy, there are a number of interesting questions to be asked of these transposon families. One particular question relates to the timing of proliferation and inactivation of elements in a family. Does an ongoing process of turnover occur, or is the process more akin to a life cycle for the family, with elements proliferating rapidly before deactivation at a later date? We answer this question by tracing back to the most recent common ancestor (MRCA) of each modern transposon family, using two different methods. The first method identifies the MRCA of the species in which a family of transposon fossils can still be found, which we assume will have existed soon after the true origin date of the transposon family. The second method uses molecular dating techniques to predict the age of the MRCA element from which all elements found in a modern genome are descended. Independent data from five pairs of species are used in the molecular dating analysis: human-chimpanzee, human-orangutan, dog-panda, dog-cat, and cow-pig. Orthologous pairs of elements from host species pairs are included, and the divergence dates of these species are used to constrain the analysis. We discover that, in general, the times to element common ancestry for a given family are the same for the different species pairs, suggesting that there has been no order-specific process of turnover. Furthermore, for most families, the ages of the common ancestor of the host species and of that of the elements are similar, suggesting a life cycle model for the proliferation of transposons. 
Where these two ages differ, in families found only in Primates and Rodentia, for example, we find that the host species date is later than that of the common ancestor of the elements, implying

  4. Large insert environmental genomic library production.

    PubMed

    Taupp, Marcus; Lee, Sangwon; Hawley, Alyse; Yang, Jinshu; Hallam, Steven J

    2009-09-23

    The vast majority of microbes in nature currently remain inaccessible to traditional cultivation methods. Over the past decade, culture-independent environmental genomic (i.e. metagenomic) approaches have emerged, enabling researchers to bridge this cultivation gap by capturing the genetic content of indigenous microbial communities directly from the environment. To this end, genomic DNA libraries are constructed using standard albeit artful laboratory cloning techniques. Here we describe the construction of a large insert environmental genomic fosmid library with DNA derived from the vertical depth continuum of a seasonally hypoxic fjord. This protocol is directly linked to a series of connected protocols including coastal marine water sampling [1], large volume filtration of microbial biomass [2] and a DNA extraction and purification protocol [3]. At the outset, high quality genomic DNA is end-repaired with the creation of 5'-phosphorylated blunt ends. End-repaired DNA is subjected to pulsed-field gel electrophoresis (PFGE) for size selection and gel extraction is performed to recover DNA fragments between 30 and 60 thousand base pairs (Kb) in length. Size selected DNA is purified away from the PFGE gel matrix and ligated to the phosphatase-treated blunt-end fosmid CopyControl vector pCC1 (EPICENTRE http://www.epibio.com/item.asp?ID=385). Linear concatemers of pCC1 and insert DNA are subsequently headfull packaged into phage particles by lambda terminase, with subsequent infection of phage-resistant E. coli cells. Successfully transduced clones are recovered on LB agar plates under antibiotic selection and archived in 384-well plate format using an automated colony picking robot (Qpix2, GENETIX). The current protocol draws from various sources including the CopyControl Fosmid Library Production Kit from EPICENTRE and the published works of multiple research groups [4-7]. Each step is presented with best practice in mind. 
Whenever possible we highlight subtleties

  5. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia Cm; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. However, caution is required in the interpretation of apparently shared syntenic associations, as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments.

  6. Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

    PubMed

    Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

    2007-01-01

    The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna-specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon 1-C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence.

  7. Eukaryotic large nucleo-cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome evolution

    PubMed Central

    2009-01-01

    Background The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes. Results A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions. Conclusions The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses. PMID:20017929

  8. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial sub-genome rearrangement

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Hexaploid oat (Avena sativa, 2n = 6x = 42) is a member of the Poaceae family with a very large genome (~13 Gb) containing 21 chromosome pairs: seven from each of two similar ancestral diploids (A and D) and seven from a more diverged ancestral diploid (C). Physical rearrangements among ancestral oat...

  9. Simplified DGS procedure for large-scale genome structural study.

    PubMed

    Jung, Yong-Chul; Xu, Jia; Chen, Jun; Kim, Yeong; Winchester, David; Wang, San Ming

    2009-11-01

    Ditag genome scanning (DGS) uses next-generation DNA sequencing to sequence the ends of ditag fragments produced by restriction enzymes. These sequences are compared to known genome sequences to determine their structure. In order to use DGS for large-scale genome structural studies, we have substantially revised the original protocol by replacing the in vivo genomic DNA cloning with in vitro adaptor ligation, eliminating the ditag concatemerization steps, and replacing the 454 sequencer with Solexa or SOLiD sequencers for ditag sequence collection. This revised protocol further increases genome coverage and resolution and allows DGS to be used to analyze multiple genomes simultaneously.

  10. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has experienced a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics also will enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  11. In the fast lane: large-scale bacterial genome engineering.

    PubMed

    Fehér, Tamás; Burland, Valerie; Pósfai, György

    2012-07-31

    The last few years have witnessed rapid progress in bacterial genome engineering. The long-established, standard ways of DNA synthesis, modification, transfer into living cells, and incorporation into genomes have given way to more effective, large-scale, robust genome modification protocols. Expansion of these engineering capabilities is due to several factors. Key advances include: (i) progress in oligonucleotide synthesis and in vitro and in vivo assembly methods, (ii) optimization of recombineering techniques, (iii) introduction of parallel, large-scale, combinatorial, and automated genome modification procedures, and (iv) rapid identification of the modifications by barcode-based analysis and sequencing. Combination of the brute force of these techniques with sophisticated bioinformatic design and modeling opens up new avenues for the analysis of gene functions and cellular network interactions, but also in engineering more effective producer strains. This review presents a summary of recent technological advances in bacterial genome engineering.

  12. Identification of large-scale genomic variation in cancer genomes using in silico reference models

    PubMed Central

    Killcoyne, Sarah; del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  13. Exon capture optimization in amphibians with large genomes.

    PubMed

    McCartney-Melstad, Evan; Mount, Genevieve G; Shaffer, H Bradley

    2016-09-01

    Gathering genomic-scale data efficiently is challenging for nonmodel species with large, complex genomes. Transcriptome sequencing is accessible for organisms with large genomes, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in nonmodel organisms with large genomes. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for amphibians and other taxa with large, complex genomes. Increasing the amounts of C0t-1 repetitive sequence blocker and individual input DNA used in target enrichment reactions reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9% and rendering target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen vertebrates with large genomes, including amphibians.

  14. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    PubMed

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits.

  15. Territorial Polymers and Large Scale Genome Organization

    NASA Astrophysics Data System (ADS)

    Grosberg, Alexander

    2012-02-01

    The chromatin fiber in an interphase nucleus is effectively a very long polymer packed in a restricted volume. Although polymer models of chromatin organization have been considered, most of them disregard the fact that DNA must not become too entangled in order to function properly. One polymer model with no entanglements is the melt of unknotted unconcatenated rings. Extensive simulations indicate that rings in the melt at large length (monomer numbers) N approach the compact state, with gyration radius scaling as N^(1/3), suggesting that every ring is compact and segregated from the surrounding rings. The segregation is consistent with the known phenomenon of chromosome territories. The surface exponent β (describing the number of contacts between neighboring rings, scaling as N^β) appears only slightly below unity, β ≈ 0.95. This suggests that the loop factor (the probability that two monomers a linear distance s apart meet) should decay as s^-γ, where γ = 2 - β is slightly above one. The latter result is consistent with HiC data on real human interphase chromosomes, and does not contradict the older FISH data. The dynamics of rings in the melt indicates that the motion of one ring remains subdiffusive on time scales well above the stress relaxation time.
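
    The scaling relations quoted in this abstract can be made concrete with a small numerical sketch. It assumes only the power laws stated above (gyration radius ~ N^(1/3), loop factor ~ s^-γ with γ = 2 - β) and takes β = 0.95 as the quoted value; the function names are hypothetical and nothing here reproduces the author's simulations.

```python
def loop_exponent(beta):
    """Loop-factor exponent gamma = 2 - beta, as stated in the abstract."""
    return 2.0 - beta


def contact_ratio(s_near, s_far, gamma):
    """Ratio P(s_far) / P(s_near) for a loop factor decaying as s**(-gamma)."""
    return (s_far / s_near) ** (-gamma)


def radius_ratio(n_small, n_large):
    """Ratio of gyration radii for compact rings, R ~ N**(1/3)."""
    return (n_large / n_small) ** (1.0 / 3.0)


beta = 0.95                      # value quoted in the abstract
gamma = loop_exponent(beta)      # slightly above one
# How much rarer contacts become when the linear distance grows tenfold.
fold_drop = 1.0 / contact_ratio(1.0, 10.0, gamma)
print(f"gamma = {gamma:.2f}, tenfold distance -> {fold_drop:.1f}x fewer contacts")
# Doubling the ring length increases the radius by only 2**(1/3).
print(f"radius growth on doubling N: {radius_ratio(1000, 2000):.2f}x")
```

    With γ only slightly above one, the contact probability falls off almost inversely with linear distance, which is the behavior the abstract compares with HiC data.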

  16. Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    PubMed

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

  17. Asymptotic Distributions of Coalescence Times and Ancestral Lineage Numbers for Populations with Temporally Varying Size

    PubMed Central

    Chen, Hua; Chen, Kun

    2013-01-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tm the mth coalescent time, when m + 1 lineages coalesce into m lineages, and An(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, An(t), and the coalescence times, Tm, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − An(t) follows a Poisson distribution, and as m → n, n(n−1)Tm/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference. PMID:23666939
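
    The quantities Tm and An(t) defined in this abstract can be illustrated with a minimal simulation sketch. It assumes the constant-size special case only (the abstract's results cover temporally varying N(t), which this sketch does not attempt), with the waiting time among k lineages exponential at rate k(k-1)/(2N), consistent with the n(n-1)Tm/2N(0) scaling quoted above. All function names are hypothetical.

```python
import random


def coalescence_times(n, pop_size, rng):
    """Simulate T_m for m = n-1, ..., 1 under constant population size.

    T_m is the time (in generations) at which m + 1 lineages coalesce into m.
    """
    t = 0.0
    times = {}
    for k in range(n, 1, -1):
        # With k lineages, the next coalescence occurs after an exponential
        # waiting time with rate k(k-1)/(2N).
        t += rng.expovariate(k * (k - 1) / (2.0 * pop_size))
        times[k - 1] = t
    return times


def lineages_at(times, t):
    """A_n(t): number of ancestral lineages still present at time t."""
    return 1 + sum(1 for tm in times.values() if tm > t)


def expected_tmrca(n, pop_size):
    """Exact mean of T_1 (the TMRCA) for constant size: sum of 2N / (k(k-1))."""
    return sum(2.0 * pop_size / (k * (k - 1)) for k in range(2, n + 1))


rng = random.Random(2013)  # fixed seed so the sketch is repeatable
sample_times = coalescence_times(n=10, pop_size=1000, rng=rng)
print("lineages at t = 0:", lineages_at(sample_times, 0.0))
print("expected TMRCA:", expected_tmrca(10, 1000))
```

    Averaging `sample_times[1]` over many replicates should approach `expected_tmrca`, which gives a simple check on the simulation before comparing it with any asymptotic approximation.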

  18. Asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size.

    PubMed

    Chen, Hua; Chen, Kun

    2013-07-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tm the mth coalescent time, when m + 1 lineages coalesce into m lineages, and An(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, An(t), and the coalescence times, Tm, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n - An(t) follows a Poisson distribution, and as m → n, n(n-1)Tm/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.

  19. Ancestral gene synteny reconstruction improves extant species scaffolding.

    PubMed

    Anselmetti, Yoann; Berry, Vincent; Chauve, Cedric; Chateau, Annie; Tannier, Eric; Bérard, Sèverine

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
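
    The evaluation described at the end of this abstract (fragment a well-assembled genome, then count how many of the removed adjacencies are recovered) can be sketched as a set comparison. This is an illustrative sketch only, not the ARt-DeCo implementation: the gene names, the representation of an adjacency as an unordered pair, and the function names are all assumptions.

```python
def adjacency(gene_a, gene_b):
    """Represent an adjacency as an unordered pair of gene identifiers."""
    return frozenset((gene_a, gene_b))


def precision_sensitivity(predicted, hidden):
    """Score predicted adjacencies against those hidden by the fragmentation.

    precision   = recovered / all predicted adjacencies
    sensitivity = recovered / all hidden (true) adjacencies
    """
    recovered = len(predicted & hidden)
    precision = recovered / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(hidden) if hidden else 0.0
    return precision, sensitivity


# Hypothetical adjacencies removed when the assembly was artificially broken.
hidden = {adjacency("g1", "g2"), adjacency("g3", "g4"),
          adjacency("g5", "g6"), adjacency("g7", "g8")}
# Hypothetical adjacencies proposed by a scaffolding method.
predicted = {adjacency("g2", "g1"), adjacency("g3", "g4"), adjacency("g5", "g9")}

precision, sensitivity = precision_sensitivity(predicted, hidden)
print(f"precision = {precision:.2f}, sensitivity = {sensitivity:.2f}")
```

    Using unordered pairs means that ("g1", "g2") and ("g2", "g1") count as the same link; a fuller treatment would also track gene orientation.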

  20. Ancestral gene synteny reconstruction improves extant species scaffolding

    PubMed Central

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes. PMID:26450761

  1. Types and rates of sequence evolution at the high-molecular-weight glutenin locus in hexaploid wheat and its ancestral genomes.

    PubMed

    Gu, Yong Qiang; Salse, Jérôme; Coleman-Derr, Devin; Dupin, Adeline; Crossman, Curt; Lazo, Gerard R; Huo, Naxin; Belcram, Harry; Ravel, Catherine; Charmet, Gilles; Charles, Mathieu; Anderson, Olin D; Chalhoub, Boulos

    2006-11-01

    The Glu-1 locus, encoding the high-molecular-weight glutenin protein subunits, controls bread-making quality in hexaploid wheat (Triticum aestivum) and represents a recently evolved region unique to Triticeae genomes. To understand the molecular evolution of this locus region, three orthologous Glu-1 regions from the three subgenomes of a single hexaploid wheat species were sequenced, totaling 729 kb of sequence. Comparing each Glu-1 region with its corresponding homologous region from the D genome of diploid wheat, Aegilops tauschii, and the A and B genomes of tetraploid wheat, Triticum turgidum, revealed that, in addition to the conservation of microsynteny in the genic regions, sequences in the intergenic regions, composed of blocks of nested retroelements, are also generally conserved, although a few nonshared retroelements that differentiate the homologous Glu-1 regions were detected in each pair of the A and D genomes. Analysis of the indel frequency and the rate of nucleotide substitution, which represent the most frequent types of sequence changes in the Glu-1 regions, demonstrated that the two A genomes are significantly more divergent than the two B genomes, further supporting the hypothesis that hexaploid wheat may have more than one tetraploid ancestor.

  2. Large-scale data mining pilot project in human genome

    SciTech Connect

    Musick, R.; Fidelis, R.; Slezak, T.

    1997-05-01

    This whitepaper briefly describes a new, aggressive effort in large-scale data Livermore National Labs. The implications of `large-scale` will be clarified Section. In the short term, this effort will focus on several mission-critical questions of Genome project. We will adapt current data mining techniques to the Genome domain, to quantify the accuracy of inference results, and lay the groundwork for a more extensive effort in large-scale data mining. A major aspect of the approach is that we will be fully-staffed data warehousing effort in the human Genome area. The long term goal is strong applications-oriented research program in large-scale data mining. The tools, skill set gained will be directly applicable to a wide spectrum of tasks involving a for large spatial and multidimensional data. This includes applications in ensuring non-proliferation, stockpile stewardship, enabling Global Ecology (Materials Database Industrial Ecology), advancing the Biosciences (Human Genome Project), and supporting data for others (Battlefield Management, Health Care).

  3. Large-scale investigation of genomic markers for severe periodontitis.

    PubMed

    Suzuki, Asami; Ji, Guijin; Numabe, Yukihiro; Ishii, Keisuke; Muramatsu, Masaaki; Kamoi, Kyuichi

    2004-09-01

    The purpose of the present study was to investigate the genomic markers for periodontitis, using large-scale single-nucleotide polymorphism (SNP) association studies comparing healthy volunteers and patients with periodontitis. Genomic DNA was obtained from 19 healthy volunteers and 22 patients with severe periodontitis, all of whom were Japanese. The subjects were genotyped at 637 SNPs in 244 genes on a large scale, using the TaqMan polymerase chain reaction (PCR) system. Statistically significant differences in allele and genotype frequencies were analyzed with Fisher's exact test. We found statistically significant differences (P < 0.01) between the healthy volunteers and patients with severe periodontitis in the following genes; gonadotropin-releasing hormone 1 (GNRH1), phosphatidylinositol 3-kinase regulatory 1 (PIK3R1), dipeptidylpeptidase 4 (DPP4), fibrinogen-like 2 (FGL2), and calcitonin receptor (CALCR). These results suggest that SNPs in the GNRH1, PIK3R1, DPP4, FGL2, and CALCR genes are genomic markers for severe periodontitis. Our findings indicate the necessity of analyzing SNPs in genes on a large scale (i.e., genome-wide approach), to identify genomic markers for periodontitis.
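
    The statistical step named in this abstract, Fisher's exact test on allele frequencies, can be sketched in a few lines. The counts below are hypothetical: they respect only the study's sample sizes (19 controls and 22 patients, hence 38 and 44 alleles) and are not data from the paper. The function name is an assumption, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical allele counts at one SNP (rows: controls, patients;
# columns: major allele, minor allele).
controls = (30, 8)    # 19 individuals -> 38 alleles
patients = (20, 24)   # 22 individuals -> 44 alleles
p_value = fisher_exact_two_sided(*controls, *patients)
print(f"p = {p_value:.4g}; below 0.01 threshold: {p_value < 0.01}")
```

    With 637 SNPs tested, a per-SNP threshold such as P < 0.01 leaves room for chance findings, so a real analysis would normally also consider a multiple-testing correction.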

  4. The maize genome as a model for efficient sequence analysis of large plant genomes.

    PubMed

    Rabinowicz, Pablo D; Bennetzen, Jeffrey L

    2006-04-01

    The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of repetitive elements in intergenic regions. High-quality sequences of the relatively small genomes of Arabidopsis (0.14 Gbp) and rice (0.4 Gbp) have now been largely completed. The sequencing of plant genomes that have a more representative size (the mean for flowering plant genomes is 5.6 Gbp) has been seen as a daunting task, partly because of their size and partly because of the numerous highly conserved repeats. Nevertheless, creative strategies and powerful new tools have been generated recently in the plant genetics community, so that sequencing large plant genomes is now a realistic possibility. Maize (2.4-2.7 Gbp) will be the first gigabase-size plant genome to be sequenced using these novel approaches. Pilot studies on maize indicate that the new gene-enrichment, gene-finishing and gene-orientation technologies are efficient, robust and comprehensive. These strategies will succeed in sequencing the gene-space of large genome plants, and in locating all of these genes and adjacent sequences on the genetic and physical maps.

  5. Large-scale genomic analysis of ovarian carcinomas.

    PubMed

    Gorringe, Kylie L; Campbell, Ian G

    2009-04-01

    Epithelial ovarian cancers are typified by frequent genomic aberrations that have been difficult to unravel. Recently, high-resolution array technologies have provided the first glimpse of the remarkable complexity of these aberrations with some ovarian cancers containing hundreds of copy number breakpoints, micro-deletions and amplifications. Many of these alterations contain cancer-related genes suggesting that the majority is disease-associated and not just the product of random genomic instability. Future developments such as next-generation sequencing and integrated analysis of data from multiple array platforms on large numbers of samples are poised to revolutionize our understanding of this complex disease.

  6. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    SciTech Connect

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan; DiFazio, Steven P; Tuskan, Gerald A

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized traits.

  7. Next-generation sequencing and large genome assemblies

    PubMed Central

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed. PMID:22676195

  8. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships

    PubMed Central

    Legarra, Andres; Christensen, Ole F.; Vitezica, Zulma G.; Aguilar, Ignacio; Misztal, Ignacy

    2015-01-01

    Recent use of genomic (marker-based) relationships shows that relationships exist within and across base populations (breeds or lines). However, current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to finite size of the ancestral population and connections between populations. This complicates the conciliation of both approaches and, in particular, combining pedigree with genomic relationships. We present a coherent theoretical framework to consider base population in pedigree relationships. We suggest a conceptual framework that considers each ancestral population as a finite-sized pool of gametes. This generates across-individual relationships and contrasts with the classical view in which each population is considered as an infinite, unrelated pool. Several ancestral populations may be connected and therefore related. Each ancestral population can be represented as a “metafounder,” a pseudo-individual included as founder of the pedigree and similar to an “unknown parent group.” Metafounders have self- and across relationships according to a set of parameters, which measure ancestral relationships, i.e., homozygosities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees. Equivalences of genetic variance and variance components between the classical and this new parameterization are shown. Segregation variance on crosses of populations is modeled. Efficient algorithms for computation of relationship matrices, their inverses, and inbreeding coefficients are presented. Use of metafounders leads to compatibility of genomic and pedigree relationship matrices and to simple computing algorithms. Examples and code are given. PMID:25873631

  9. Distinguishing Recent Admixture from Ancestral Population Structure

    PubMed Central

    Slatkin, Montgomery

    2017-01-01

    We develop and test two methods for distinguishing between recent admixture and ancestral population structure as explanations for greater similarity of one of two populations to an outgroup population. This problem arose when Neanderthals were found to be slightly more similar to nonAfrican than to African populations. The excess similarity is consistent with both recent admixture from Neanderthals into the ancestors of nonAfricans and subdivision in the ancestral population. Although later studies showed that there had been recent admixture, distinguishing between these two classes of models will be important in other situations, particularly when high-coverage genomes cannot be obtained for all populations. One of our two methods is based on the properties of the doubly conditioned frequency spectrum combined with the unconditional frequency spectrum. This method does not require a linkage map and can be used when there is relatively low coverage. The second method uses the extent of linkage disequilibrium among closely linked markers. PMID:28186554

  10. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    PubMed

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because imminent introduction into Vietnam of a vaccine against Rotavirus A is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in this country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1 like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains with >184 G2P[4] strains retrieved from GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including a Chinese rotavirus vaccine strain LLR with 95% nucleotide identity; the time of their most recent common ancestor was 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to ever diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally-circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution.

  11. Ancestral polyploidy in seed plants and angiosperms.

    PubMed

    Jiao, Yuannian; Wickett, Norman J; Ayyampalayam, Saravanaraj; Chanderbali, André S; Landherr, Lena; Ralph, Paula E; Tomsho, Lynn P; Hu, Yi; Liang, Haiying; Soltis, Pamela S; Soltis, Douglas E; Clifton, Sandra W; Schlarbaum, Scott E; Schuster, Stephan C; Ma, Hong; Leebens-Mack, Jim; dePamphilis, Claude W

    2011-05-05

    Whole-genome duplication (WGD), or polyploidy, followed by gene loss and diploidization has long been recognized as an important evolutionary force in animals, fungi and other organisms, especially plants. The success of angiosperms has been attributed, in part, to innovations associated with gene or whole-genome duplications, but evidence for proposed ancient genome duplications pre-dating the divergence of monocots and eudicots remains equivocal in analyses of conserved gene order. Here we use comprehensive phylogenomic analyses of sequenced plant genomes and more than 12.6 million new expressed-sequence-tag sequences from phylogenetically pivotal lineages to elucidate two groups of ancient gene duplications: one in the common ancestor of extant seed plants and the other in the common ancestor of extant angiosperms. Gene duplication events were intensely concentrated around 319 and 192 million years ago, implicating two WGDs in ancestral lineages shortly before the diversification of extant seed plants and extant angiosperms, respectively. Significantly, these ancestral WGDs resulted in the diversification of regulatory genes important to seed and flower development, suggesting that they were involved in major innovations that ultimately contributed to the rise and eventual dominance of seed plants and angiosperms.

  12. Are palaeoscolecids ancestral ecdysozoans?

    PubMed

    Harvey, Thomas H P; Dong, Xiping; Donoghue, Philip C J

    2010-01-01

    The reconstruction of ancestors is a central aim of comparative anatomy and evolutionary developmental biology, not least in attempts to understand the relationship between developmental and organismal evolution. Inferences based on living taxa can and should be tested against the fossil record, which provides an independent and direct view onto historical character combinations. Here, we consider the nature of the last common ancestor of living ecdysozoans through a detailed analysis of palaeoscolecids, an early and extinct group of introvert-bearing worms that have been proposed to be ancestral ecdysozoans. In a review of palaeoscolecid anatomy, including newly resolved details of the internal and external cuticle structure, we identify specific characters shared with various living nematoid and scalidophoran worms, but not with panarthropods. Considered within a formal cladistic context, these characters provide most overall support for a stem-priapulid affinity, meaning that palaeoscolecids are far-removed from the ecdysozoan ancestor. We conclude that previous interpretations in which palaeoscolecids occupy a deeper position in the ecdysozoan tree lack particular morphological support and rely instead on a paucity of preserved characters. This bears out a more general point that fossil taxa may appear plesiomorphic merely because they preserve only plesiomorphies, rather than the mélange of primitive and derived characters anticipated of organisms properly allocated to a position deep within animal phylogeny.

  13. The Mitochondrial Genome of the Leaf-Cutter Ant Atta laevigata: A Mitogenome with a Large Number of Intergenic Spacers

    PubMed Central

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to that inferred for the ancestral pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed through the mitogenome. This pattern was also found for the other hymenopterans in particular for the monophyletic Apocrita. These spacers with unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans. PMID:24828084

  14. The mitochondrial genome of the leaf-cutter ant Atta laevigata: a mitogenome with a large number of intergenic spacers.

    PubMed

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to that inferred for the ancestral pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed through the mitogenome. This pattern was also found for the other hymenopterans in particular for the monophyletic Apocrita. These spacers with unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans.

  15. Genomic evidence for a large-Z effect.

    PubMed

    Ellegren, Hans

    2009-01-22

    The 'large-X effect' suggests that sex chromosomes play a disproportionate role in adaptive evolution. Theoretical work indicates that this effect may be most pronounced in genetic systems with female heterogamety under both good-genes and Fisher's runaway models of sexual selection (males ZZ, females ZW). Here, I use a comparative genomic approach (alignments of several thousands of chicken-zebra finch-human-mouse-opossum orthologues) to show that avian Z-linked genes are highly overrepresented among those bird-mammalian orthologues that show evidence of accelerated rate of functional evolution in birds relative to mammals; the data suggest a twofold excess of such genes on the Z chromosome. A reciprocal analysis of genes accelerated in mammals found no evidence for an excess of X-linkage. This would be compatible with theoretical expectations for differential selection on sex-linked genes under male and female heterogamety, although the power in this case was not sufficient to statistically show that 'large-Z' was more pronounced than 'large-X'. Accelerated Z-linked genes include a variety of functional categories and are characterized by higher non-synonymous to synonymous substitution rate ratios than both accelerated autosomal and non-accelerated genes. This points at a genomic 'large-Z effect', which is widespread and of general significance for adaptive divergence in birds.

  16. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  17. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
    A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  18. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications

    PubMed Central

    2013-01-01

    Background Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Results Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. Conclusions We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. 
    The combined data imply that the early vertebrate WGD events contributed to the

  19. The Ancestral Gene for Transcribed, Low-Copy Repeats in the Prader-Willi/Angelman Region Encodes a Large Protein Implicated in Protein Trafficking that is Deficient in Mice with Neuromuscular and

    SciTech Connect

    Ji, Y.

    1999-01-01

    Transcribed, low-copy repeat elements are associated with the breakpoint regions of common deletions in Prader-Willi and Angelman syndromes. We report here the identification of the ancestral gene ( HERC2 ) and a family of duplicated, truncated copies that comprise these low-copy repeats. This gene encodes a highly conserved giant protein, HERC2, that is distantly related to p532 (HERC1), a guanine nucleotide exchange factor (GEF) implicated in vesicular trafficking. The mouse genome contains a single Herc2 locus, located in the jdf2 (juvenile development and fertility-2) interval of chromosome 7C. We have identified single nucleotide splice junction mutations in Herc2 in three independent N-ethyl-N-nitrosourea-induced jdf2 mutant alleles, each leading to exon skipping with premature termination of translation and/or deletion of conserved amino acids. Therefore, mutations in Herc2 lead to the neuromuscular secretory vesicle and sperm acrosome defects, other developmental abnormalities and juvenile lethality of jdf2 mice. Combined, these findings suggest that HERC2 is an important gene encoding a GEF involved in protein trafficking and degradation pathways in the cell.

  20. What was the ancestral sex-determining mechanism in amniote vertebrates?

    PubMed

    Johnson Pokorná, Martina; Kratochvíl, Lukáš

    2016-02-01

    Amniote vertebrates, the group consisting of mammals and reptiles including birds, possess various mechanisms of sex determination. Under environmental sex determination (ESD), the sex of individuals depends on the environmental conditions occurring during their development and therefore there are no sexual differences present in their genotypes. Alternatively, through the mode of genotypic sex determination (GSD), sex is determined by a sex-specific genotype, i.e. by the combination of sex chromosomes at various stages of differentiation at conception. As well as influencing sex determination, sex-specific parts of genomes may, and often do, develop specific reproductive or ecological roles in their bearers. Accordingly, an individual with a mismatch between phenotypic (gonadal) and genotypic sex, for example an individual sex-reversed by environmental effects, should have a lower fitness due to the lack of specialized, sex-specific parts of their genome. In this case, evolutionary transitions from GSD to ESD should be less likely than transitions in the opposite direction. This prediction contrasts with the view that GSD was the ancestral sex-determining mechanism for amniote vertebrates. Ancestral GSD would require several transitions from GSD to ESD associated with an independent dedifferentiation of sex chromosomes, at least in the ancestors of crocodiles, turtles, and lepidosaurs (tuataras and squamate reptiles). In this review, we argue that the alternative theory postulating ESD as ancestral in amniotes is more parsimonious and is largely concordant with the theoretical expectations and current knowledge of the phylogenetic distribution and homology of sex-determining mechanisms.

  1. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine

    PubMed Central

    Biesecker, Leslie G.; Mullikin, James C.; Facio, Flavia M.; Turner, Clesson; Cherukuri, Praveen F.; Blakesley, Robert W.; Bouffard, Gerard G.; Chines, Peter S.; Cruz, Pedro; Hansen, Nancy F.; Teer, Jamie K.; Maskeri, Baishali; Young, Alice C.; Manolio, Teri A.; Wilson, Alexander F.; Finkel, Toren; Hwang, Paul; Arai, Andrew; Remaley, Alan T.; Sachdev, Vandana; Shamburek, Robert; Cannon, Richard O.; Green, Eric D.

    2009-01-01

    ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine. PMID:19602640

  2. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects involve a large investment of effort involving many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called contigs. These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.

  3. ProCARs: Progressive Reconstruction of Ancestral Gene Orders

    PubMed Central

    2015-01-01

    Background In the context of ancestral gene order reconstruction from extant genomes, there exist two main computational approaches: rearrangement-based, and homology-based methods. The rearrangement-based methods consist in minimizing a total rearrangement distance on the branches of a species tree. The homology-based methods consist in the detection of a set of potential ancestral contiguity features, followed by the assembling of these features into Contiguous Ancestral Regions (CARs). Results In this paper, we present a new homology-based method that uses a progressive approach for both the detection and the assembling of ancestral contiguity features into CARs. The method is based on detecting a set of potential ancestral adjacencies iteratively using the current set of CARs at each step, and constructing CARs progressively using a 2-phase assembling method. Conclusion We show the usefulness of the method through a reconstruction of the boreoeutherian ancestral gene order, and a comparison with three other homology-based methods: AnGeS, InferCARs and GapAdj. The program, written in Python, and the dataset used in this paper are available at http://bioinfo.lifl.fr/procars/. PMID:26040958

  4. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
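
    One of the two summary statistics named in this abstract, the folded allele frequency spectrum, can be illustrated with a minimal sketch. This is not the PopSizeABC implementation; the function name `folded_sfs`, the 0/1/2 genotype coding for diploid individuals, and the toy data are all assumptions made for illustration. Folding uses the minor allele count, so neither phasing nor knowledge of the ancestral allele is required.

```python
from collections import Counter


def folded_sfs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of SNPs; each SNP is a list with one entry per
    individual giving the number of copies (0, 1 or 2) of an arbitrarily
    chosen allele. This coding is an illustrative assumption.
    Returns a list whose entry k-1 is the number of SNPs with minor
    allele count k, for k = 1 .. n (n = number of individuals).
    """
    n_chrom = 2 * len(genotypes[0])  # two chromosomes per diploid individual
    counts = Counter()
    for snp in genotypes:
        allele_count = sum(snp)
        # Folding: keep the count of whichever allele is rarer.
        minor = min(allele_count, n_chrom - allele_count)
        if minor > 0:  # skip sites that are monomorphic in the sample
            counts[minor] += 1
    return [counts[k] for k in range(1, n_chrom // 2 + 1)]


# Hypothetical toy data: 5 SNPs typed in 3 individuals.
toy = [
    [0, 1, 0],  # minor allele count 1
    [2, 2, 1],  # allele count 5, folded to 1
    [1, 1, 1],  # minor allele count 3
    [2, 2, 2],  # monomorphic, ignored
    [0, 2, 0],  # minor allele count 2
]
print(folded_sfs(toy))
```

    In an analysis of the kind described, a spectrum like this would be one component of the summary statistics compared between observed and simulated samples.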

  5. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    PubMed Central

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Results We identified five large S. coelicolor genomic islands (larger than 25 kb) and 18 smaller islets absent in S. lividans chromosome. Many of these regions show anomalous GC bias and codon usage patterns. Six of them are in close vicinity of tRNA genes while nine are flanked with near perfect repeat sequences indicating that these are probable recent evolutionary acquisitions into S. coelicolor. Embedded within these segments are at least four DNA methylases and two probable methyl-sensing restriction endonucleases. Comparison with S. coelicolor transcriptome and proteome data revealed that some of the missing genes are active during the course of growth and differentiation in S. coelicolor. In particular, a pair of methylmalonyl CoA mutase (mcm) genes involved in polyketide precursor biosynthesis, an acyl-CoA dehydrogenase implicated in timing of actinorhodin synthesis and bldB, a developmentally significant regulator whose mutation causes complete abrogation of antibiotic synthesis belong to this category. Conclusion Our findings provide tangible hints for elucidating the genetic basis of important phenotypic differences between these two streptomycetes. Importantly, absence of certain genes in S. lividans identified here could potentially explain the relative ease of DNA transformations and the conditional lack of actinorhodin synthesis in S. lividans. PMID:17623098

  6. Genomic analysis of regulatory network dynamics reveals large topological changes

    NASA Astrophysics Data System (ADS)

    Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark

    2004-09-01

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  7. Volume visualization of multiple alignment of large genomic DNA

    SciTech Connect

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview for extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-millions-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  8. Genomic analysis of regulatory network dynamics reveals large topological changes.

    PubMed

    Luscombe, Nicholas M; Babu, M Madan; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A; Gerstein, Mark

    2004-09-16

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  9. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  10. Ancestral vertebrate complexity of the opioid system.

    PubMed

    Larhammar, Dan; Bergqvist, Christina; Sundström, Görel

    2015-01-01

    The evolution of the opioid peptides and nociceptin/orphanin as well as their receptors has been difficult to resolve due to variable evolutionary rates. By combining sequence comparisons with information on the chromosomal locations of the genes, we have deduced the following evolutionary scenario: The vertebrate predecessor had one opioid precursor gene and one receptor gene. The two genome doublings before the vertebrate radiation resulted in three peptide precursor genes whereupon a fourth copy arose by a local gene duplication. These four precursors diverged to become the prepropeptides for endorphin (POMC), enkephalins, dynorphins, and nociceptin, respectively. The ancestral receptor gene was quadrupled in the genome doublings leading to delta, kappa, and mu and the nociceptin/orphanin receptor. This scenario is corroborated by new data presented here for coelacanth and spotted gar, representing two basal branches in the vertebrate tree. A third genome doubling in the ancestor of teleost fishes generated additional gene copies. These results show that the opioid system was quite complex already in the first vertebrates and that it has more components in teleost fishes than in mammals. From an evolutionary point of view, nociceptin and its receptor can be considered full-fledged members of the opioid system.

  11. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist.

  12. The common ancestral core of vertebrate and fungal telomerase RNAs

    PubMed Central

    Qi, Xiaodong; Li, Yang; Honda, Shinji; Hoffmann, Steve; Marz, Manja; Mosig, Axel; Podlevsky, Joshua D.; Stadler, Peter F.; Selker, Eric U.; Chen, Julian J.-L.

    2013-01-01

    Telomerase is a ribonucleoprotein with an intrinsic telomerase RNA (TER) component. Within yeasts, TER is remarkably large and presents little similarity in secondary structure to vertebrate or ciliate TERs. To better understand the evolution of fungal telomerase, we identified 74 TERs from Pezizomycotina and Taphrinomycotina subphyla, sister clades to budding yeasts. We initially identified TER from Neurospora crassa using a novel deep-sequencing–based approach, and homologous TER sequences from available fungal genome databases by computational searches. Remarkably, TERs from these non-yeast fungi have many attributes in common with vertebrate TERs. Comparative phylogenetic analysis of highly conserved regions within Pezizomycotina TERs revealed two core domains nearly identical in secondary structure to the pseudoknot and CR4/5 within vertebrate TERs. We then analyzed N. crassa and Schizosaccharomyces pombe telomerase reconstituted in vitro, and showed that the two RNA core domains in both systems can reconstitute activity in trans as two separate RNA fragments. Furthermore, the primer-extension pulse-chase analysis affirmed that the reconstituted N. crassa telomerase synthesizes TTAGGG repeats with high processivity, a common attribute of vertebrate telomerase. Overall, this study reveals the common ancestral cores of vertebrate and fungal TERs, and provides insights into the molecular evolution of fungal TER structure and function. PMID:23093598

  13. Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases.

    PubMed

    Standage-Beier, Kylie; Zhang, Qi; Wang, Xiao

    2015-11-20

    Programmable CRISPR-Cas systems have augmented our ability to produce precise genome manipulations. Here we demonstrate and characterize the ability of CRISPR-Cas derived nickases to direct targeted recombination of both small and large genomic regions flanked by repetitive elements in Escherichia coli. While CRISPR directed double-stranded DNA breaks are highly lethal in many bacteria, we show that CRISPR-guided nickase systems can be programmed to make precise, nonlethal, single-stranded incisions in targeted genomic regions. This induces recombination events and leads to targeted deletion. We demonstrate that dual-targeted nicking enables deletion of 36 and 97 Kb of the genome. Furthermore, multiplex targeting enables deletion of 133 Kb, accounting for approximately 3% of the entire E. coli genome. This technology provides a framework for methods to manipulate bacterial genomes using CRISPR-nickase systems. We envision this system working synergistically with preexisting bacterial genome engineering methods.

  14. Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing

    PubMed Central

    2013-01-01

    Background Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. Results Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. Conclusions We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes. PMID:23590730
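
    The N50 statistic reported in this abstract can be computed as in the following minimal sketch. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length); the function name `n50` and the toy lengths are assumptions for illustration and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb) before and after scaffolding
# joins some contigs; the total length is unchanged.
before = [100, 80, 60, 40, 20]
after = [180, 100, 20]
print(n50(before), n50(after))
```

    Joining contigs into longer scaffolds leaves the total length unchanged but moves more of it into long sequences, which is why improved scaffolding shows up as a higher N50.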

  15. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes

    NASA Astrophysics Data System (ADS)

    Peng, Qian; Alekseyev, Max A.; Tesler, Glenn; Pevzner, Pavel A.

    The existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared over all genomes to construct the synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to address the need to construct synteny blocks in many mammalian genomes that are currently being sequenced. The problem is that the number of anchors shared among all genomes quickly decreases with the increase in the number of genomes. Another problem is that many genomes (plant genomes in particular) had extensive duplications, which makes decoding of genomic architecture and rearrangement analysis in plants difficult. The existing synteny block generation algorithms in plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolution history of duplications. We present a new algorithm based on the A-Bruijn graph framework that overcomes these difficulties and provides a unified approach to synteny block reconstruction for multiple genomes, and for genomes with large duplications.

  16. Estimation of ancestral inbreeding effects on stillbirth, calving ease and birthweight in German Holstein dairy cattle.

    PubMed

    Hinrichs, D; Bennewitz, J; Wellmann, R; Thaller, G

    2015-02-01

    In this study, the effects of different measurements of ancestral inbreeding on birthweight, calving ease and stillbirth were analysed. Three models were used to estimate the effect of ancestral inbreeding, and the estimated regression coefficient of phenotypic data on different measurements of ancestral inbreeding was used to quantify the effect of ancestral inbreeding. The first model included only one measurement of inbreeding, whereas the second model included the classical inbreeding coefficients and one alternative inbreeding coefficient. The third model included the classical inbreeding coefficients, the interaction between classical inbreeding and ancestral inbreeding, and the classical inbreeding coefficients of the dam. Phenotypic data for this study were collected from February 1998 to December 2008 on three large commercial milk farms. During this time, 36,477 calving events were recorded. All calves were weighed after birth, and 8.08% of the calves died within 48 h after calving. Calving ease was recorded on a scale between 1 and 4 (1 = easy birth, 4 = surgery), and 69.95, 20.91, 8.92 and 0.21% of the calvings were scored with 1, 2, 3 and 4, respectively. The average inbreeding coefficient of inbred animals was 0.03, and average ancestral inbreeding coefficients were 0.08 and 0.01, depending on how ancestral inbreeding was calculated. Approximately 26% of classically non-inbred animals showed ancestral inbreeding. Correlations between different inbreeding coefficients ranged between 0.46 and 0.99. No significant effect of ancestral inbreeding was found for calving ease, because the number of animals with a reasonably high level of ancestral inbreeding was too low. Significant effects of ancestral inbreeding were estimated for birthweight and stillbirth. Unfavourable effects of ancestral inbreeding were observed for birthweight. However, favourable purging effects were estimated for stillbirth, indicating that purging could be partly beneficial for genetic

  17. FastML: a web server for probabilistic reconstruction of ancestral sequences.

    PubMed

    Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal

    2012-07-01

    Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.

  18. FastML: a web server for probabilistic reconstruction of ancestral sequences

    PubMed Central

    Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal

    2012-01-01

    Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/. PMID:22661579

  19. Shrinking genomes? Evidence from genome size variation in Crepis (Compositae).

    PubMed

    Enke, N; Fuchs, J; Gemeinholzer, B

    2011-01-01

    Large-scale surveys of genome size evolution in angiosperms show that the ancestral genome was most likely small, with a tendency towards an increase in DNA content during evolution. Due to polyploidisation and self-replicating DNA elements, angiosperm genomes were considered to have a 'one-way ticket to obesity' (Bennetzen & Kellogg 1997). New findings on how organisms can lose DNA challenged the hypotheses of unidirectional evolution of genome size. The present study is based on the classical work of Babcock (1947a) on karyotype evolution within Crepis and analyses karyotypic diversification within the genus in a phylogenetic context. Genome size of 21 Crepis species was estimated using flow cytometry. Additional data of 17 further species were taken from the literature. Within 30 diploid Crepis species there is a striking trend towards genome contraction. The direction of genome size evolution was analysed by reconstructing ancestral character states on a molecular phylogeny based on ITS sequence data. DNA content is correlated to distributional aspects as well as life form. Genome size is significantly higher in perennials than in annuals. Within sampled species, very small genomes are only present in Mediterranean or European species, whereas their Central and East Asian relatives have larger 1C values.

  20. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  1. GEnomes Management Application (GEM.app): A new software tool for large-scale collaborative genome analysis

    PubMed Central

    Gonzalez, Michael A.; Acosta Lebrigio, Rafael F.; Van Booven, Derek; Ulloa, Rick H.; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schule, Rebecca; Zuchner, Stephan

    2015-01-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ~1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for non-bioinformaticians to make NGS data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 seconds across ~1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. PMID:23463597

  2. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment.

    PubMed

    Kim, Jonghwan; Bhinge, Akshay A; Morgan, Xochitl C; Iyer, Vishwanath R

    2005-01-01

    Identifying the chromosomal targets of transcription factors is important for reconstructing the transcriptional regulatory networks underlying global gene expression programs. We have developed an unbiased genomic method called sequence tag analysis of genomic enrichment (STAGE) to identify the direct binding targets of transcription factors in vivo. STAGE is based on high-throughput sequencing of concatemerized tags derived from target DNA enriched by chromatin immunoprecipitation. We first used STAGE in yeast to confirm that RNA polymerase III genes are the most prominent targets of the TATA-box binding protein. We optimized the STAGE protocol and developed analysis methods to allow the identification of transcription factor targets in human cells. We used STAGE to identify several previously unknown binding targets of human transcription factor E2F4 that we independently validated by promoter-specific PCR and microarray hybridization. STAGE provides a means of identifying the chromosomal targets of DNA-associated proteins in any sequenced genome.

  3. Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila.

    PubMed

    Zhou, Qi; Bachtrog, Doris

    2015-06-01

    Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of 'neo-sex' chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration.

  4. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  5. BactoGeNIE: A large-scale comparative genome visualization for big displays

    SciTech Connect

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  6. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  7. Engineering large viral DNA genomes using the CRISPR-Cas9 system.

    PubMed

    Suenaga, Tadahiro; Kohyama, Masako; Hirayasu, Kouyuki; Arase, Hisashi

    2014-09-01

    Manipulation of viral genomes is essential for studying viral gene function and utilizing viruses for therapy. Several techniques for viral genome engineering have been developed. Homologous recombination in virus-infected cells has traditionally been used to edit viral genomes; however, the frequency of the expected recombination is quite low. Alternatively, large viral genomes have been edited using a bacterial artificial chromosome (BAC) plasmid system. However, cloning of large viral genomes into BAC plasmids is both laborious and time-consuming. In addition, because insertion of drug selection markers or parts of BAC plasmids into the viral genome can affect viral function, artificial genes sometimes need to be removed from edited viruses. Herpes simplex virus (HSV), a common DNA virus with a genome length of 152 kbp, causes herpes labialis, genital herpes and encephalitis. Mutant HSV is a candidate for oncotherapy, in which HSV is used to kill tumor cells. In this study, the clustered regularly interspaced short palindromic repeat-Cas9 system was used to very efficiently engineer HSV without inserting artificial genes into viral genomes. Not only gene-ablated HSV but also gene knock-in HSV were generated using this method. Furthermore, selection based on the phenotypes of edited genes improves the efficiency of isolating the intended mutant viral clones. Because our method can be applied to other DNA viruses such as Epstein-Barr virus, cytomegaloviruses, vaccinia virus and baculovirus, our system will be useful for studying various types of viruses, including clinical isolates.

  8. Genome evolution in Reptilia, the sister group of mammals.

    PubMed

    Janes, Daniel E; Organ, Christopher L; Fujita, Matthew K; Shedlock, Andrew M; Edwards, Scott V

    2010-01-01

    The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome diversity, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome--predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system--to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates of different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.

  9. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with lambda(max) values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods.

  10. The draft genome of the large yellow croaker reveals well-developed innate immunity

    PubMed Central

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  11. Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

    PubMed Central

    Biji, Christopher Leela; Madhu, Manu K; Vishnu, Vineetha; K, Satheesh Kumar; Vijayakumar; Nair, Achuthsankar S

    2015-01-01

    Big data storage is a challenge in the post-genome era. Hence, there is a need for high-performance computing solutions for managing large genomic data. Therefore, it is of interest to describe a parallel-computing approach that uses a message-passing library to distribute the different compression stages across clusters. Genomic compression helps to reduce the on-disk "footprint" of large volumes of sequence data. This supports the computational infrastructure for more efficient archiving. In this report, the approach was shown to be useful on 21 eukaryotic genomes using stratified sampling. The method achieves an average 6-fold disk space reduction with three times better compression time than COMRAD. Availability The source codes are written in C using message passing libraries and are available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/ PMID:26124572

  12. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  13. Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials

    PubMed Central

    Meric-Bernstam, Funda; Brusco, Lauren; Shaw, Kenna; Horombe, Chacha; Kopetz, Scott; Davies, Michael A.; Routbort, Mark; Piha-Paul, Sarina A.; Janku, Filip; Ueno, Naoto; Hong, David; De Groot, John; Ravi, Vinod; Li, Yisheng; Luthra, Raja; Patel, Keyur; Broaddus, Russell; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Purpose We report the experience with 2,000 consecutive patients with advanced cancer who underwent testing on a genomic testing protocol, including the frequency of actionable alterations across tumor types, subsequent enrollment onto clinical trials, and the challenges for trial enrollment. Patients and Methods Standardized hotspot mutation analysis was performed in 2,000 patients, using either an 11-gene (251 patients) or a 46- or 50-gene (1,749 patients) multiplex platform. Thirty-five genes were considered potentially actionable based on their potential to be targeted with approved or investigational therapies. Results Seven hundred eighty-nine patients (39%) had at least one mutation in potentially actionable genes. Eighty-three patients (11%) with potentially actionable mutations went on genotype-matched trials targeting these alterations. Of 230 patients with PIK3CA/AKT1/PTEN/BRAF mutations that returned for therapy, 116 (50%) received a genotype-matched drug. Forty patients (17%) were treated on a genotype-selected trial requiring a mutation for eligibility, 16 (7%) were treated on a genotype-relevant trial targeting a genomic alteration without biomarker selection, and 40 (17%) received a genotype-relevant drug off trial. Challenges to trial accrual included patient preference of noninvestigational treatment or local treatment, poor performance status or other reasons for trial ineligibility, lack of trials/slots, and insurance denial. Conclusion Broad implementation of multiplex hotspot testing is feasible; however, only a small portion of patients with actionable alterations were actually enrolled onto genotype-matched trials. Increased awareness of therapeutic implications and access to novel therapeutics are needed to optimally leverage results from broad-based genomic testing. PMID:26014291
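    The percentages in the abstract above use different denominators (all tested patients, patients with actionable mutations, and the 230 patients who returned for therapy), which is easy to misread. The sketch below recomputes each figure from the counts stated in the abstract; the variable names and the helper `pct` are assumptions introduced here for illustration.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts as stated in the abstract.
tested = 251 + 1749            # 11-gene panel + 46- or 50-gene panel
actionable = 789               # at least one potentially actionable mutation
on_matched_trial = 83          # went on genotype-matched trials
returned_for_therapy = 230     # PIK3CA/AKT1/PTEN/BRAF mutations, returned
matched_drug = 116             # received a genotype-matched drug
genotype_selected_trial = 40
genotype_relevant_trial = 16
off_trial_drug = 40

print(tested)                                          # total patients tested
print(pct(actionable, tested))                         # share of all tested
print(pct(on_matched_trial, actionable))               # share of actionable
print(pct(matched_drug, returned_for_therapy))         # share of returned
print(pct(genotype_selected_trial, returned_for_therapy))
print(pct(genotype_relevant_trial, returned_for_therapy))
print(pct(off_trial_drug, returned_for_therapy))
```

    Note that the 11% figure is relative to the 789 patients with actionable mutations, not to all 2,000 patients tested.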

  14. Array-comparative genomic hybridization profiling of immunohistochemical subgroups of diffuse large B-cell lymphoma shows distinct genomic alterations

    PubMed Central

    Guo, Ying; Takeuchi, Ichiro; Karnan, Sivasundaram; Miyata, Tomoko; Ohshima, Koichi; Seto, Masao

    2014-01-01

    Diffuse large B-cell lymphoma (DLBCL) displays striking heterogeneity at the clinical, genetic and molecular levels. Subtypes include germinal center B-cell-like (GCB) DLBCL and activated B-cell-like (ABC) DLBCL, according to microarray analysis, and germinal center type or non-germinal center type by immunohistochemistry. Although some reports have described genomic aberrations based upon microarray classification system, genomic aberrations based upon immunohistochemical classifications have rarely been reported. The present study aimed to ascertain the relationship between genomic aberrations and subtypes identified by immunohistochemistry, and to study the pathogenetic character of Chinese DLBCL. We conducted immunohistochemistry using antibodies against CD10, BCL6 and MUM1 in 59 samples of DLBCL from Chinese patients, and then performed microarray-based comparative genomic hybridization for each case. Characteristic genomic differences were found between GCB and non-GCB DLBCL from the array data. The GCB type was characterized by more gains at 7q (7q22.1, P < 0.05) and losses at 16q (P ≤ 0.05), while the non-GCB type was characterized by gains at 11q24.3 and 3q13.2 (P < 0.05). We found completely different mutations in BCL6+ and BCL6− non-GCB type DLBCL, whereby the BCL6− group had a higher number of gains at 1q and a loss at 14q32.13 (P ≤ 0.005), while the BCL6+ group showed a higher number of gains at 14q23.1 (P = 0.15) and losses at 6q (P = 0.07). The BCL6− group had a higher frequency of genomic imbalances compared to the BCL6+ group. In conclusion, the BCL6+ and BCL6− non-GCB type of DLBCL appear to have different mechanisms of pathogenesis. PMID:24843885

  15. Large-Scale Comparative Genomics Meta-Analysis of Campylobacter jejuni Isolates Reveals Low Level of Genome Plasticity

    PubMed Central

    Taboada, Eduardo N.; Acedillo, Rey R.; Carrillo, Catherine D.; Findlay, Wendy A.; Medeiros, Diane T.; Mykytczuk, Oksana L.; Roberts, Michael J.; Valencia, C. Alexander; Farber, Jeffrey M.; Nash, John H. E.

    2004-01-01

    We have used comparative genomic hybridization (CGH) on a full-genome Campylobacter jejuni microarray to examine genome-wide gene conservation patterns among 51 strains isolated from food and clinical sources. These data have been integrated with data from three previous C. jejuni CGH studies to perform a meta-analysis that included 97 strains from the four separate data sets. Although many genes were found to be divergent across multiple strains (n = 350), many genes (n = 249) were uniquely variable in single strains. Thus, the strains in each data set comprise strains with a unique genetic diversity not found in the strains in the other data sets. Despite the large increase in the collective number of variable C. jejuni genes (n = 599) found in the meta-analysis data set, nearly half of these (n = 276) mapped to previously defined variable loci, and it therefore appears that large regions of the C. jejuni genome are genetically stable. A detailed analysis of the microarray data revealed that divergent genes could be differentiated on the basis of the amplitudes of their differential microarray signals. Of 599 variable genes, 122 could be classified as highly divergent on the basis of CGH data. Nearly all highly divergent genes (117 of 122) had divergent neighbors and showed high levels of intraspecies variability. The approach outlined here has enabled us to distinguish global trends of gene conservation in C. jejuni and has enabled us to define this group of genes as a robust set of variable markers that can become the cornerstone of a new generation of genotyping methods that use genome-wide C. jejuni gene variability data. PMID:15472310

  16. Discovery of STL polyomavirus, a polyomavirus of ancestral recombinant origin that encodes a unique T antigen by alternative splicing.

    PubMed

    Lim, Efrem S; Reyes, Alejandro; Antonio, Martin; Saha, Debasish; Ikumapayi, Usman N; Adeyemi, Mitchell; Stine, O Colin; Skelton, Rebecca; Brennan, Daniel C; Mkakosya, Rajhab S; Manary, Mark J; Gordon, Jeffrey I; Wang, David

    2013-02-20

    The family Polyomaviridae is comprised of circular double-stranded DNA viruses, several of which are associated with diseases, including cancer, in immunocompromised patients. Here we describe a novel polyomavirus recovered from the fecal microbiota of a child in Malawi, provisionally named STL polyomavirus (STLPyV). We detected STLPyV in clinical stool specimens from USA and The Gambia at up to 1% frequency. Complete genome comparisons of two STLPyV strains demonstrated 5.2% nucleotide divergence. Alternative splicing of the STLPyV early region yielded a unique form of T antigen, which we named 229T, in addition to the expected large and small T antigens. STLPyV has a mosaic genome and shares an ancestral recombinant origin with MWPyV. The discovery of STLPyV highlights a novel alternative splicing strategy and advances our understanding of the complex evolutionary history of polyomaviruses.

  17. Chloroplast genomes of two conifers lack a large inverted repeat and are extensively rearranged.

    PubMed Central

    Strauss, S H; Palmer, J D; Howe, G T; Doerksen, A H

    1988-01-01

    Chloroplast genomes of Douglas-fir [Pseudotsuga menziesii (Mirb.) Franco] and radiata (Monterey) pine [Pinus radiata D. Don], two conifers from the widespread Pinaceae, were mapped and their genomes were compared to other land plants. Douglas-fir and radiata pine lack the large (20-25 kilobases) inverted repeat that characterizes most land plants. To our knowledge, this is only the second recorded loss of this ancient and highly conserved inverted repeat among all lineages of land plants thus far examined. Loss of the repeat largely accounts for the small size of the conifer genome, 120 kilobases, versus 140-160 kilobases in most land plants. Douglas-fir possesses a major inversion of 40-50 kilobases relative to radiata pine and nonconiferous plants. Nucleotide sequence differentiation between Douglas-fir and radiata pine was estimated to be 3.8%. Both conifer genomes possess a number of rearrangements relative to Osmunda, a fern, Ginkgo, a gymnosperm, and Petunia, an angiosperm. Among land plants, structural changes of this degree have occurred primarily within tribes of the legume family (Fabaceae) that have also lost the inverted repeat. These results support the hypothesis that the presence of the large inverted repeat stabilizes the chloroplast genome against major structural rearrangements. PMID:2836862

  18. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  19. Whole genome molecular phylogeny of large dsDNA viruses using composition vector method

    PubMed Central

    Gao, Lei; Qi, Ji

    2007-01-01

    Background: One important mechanism by which large DNA viruses increase their genome size is the addition of modules acquired from other viruses, host genomes or gene duplications. Phylogenetic analysis of large DNA viruses, especially using methods based on alignment, is often difficult due to the presence of horizontal gene transfer events. The recent composition vector approach, not sensitive to such events, is applied here to reconstruct the phylogeny of 124 large DNA viruses. Results: The results are mostly consistent with the biologist's systematics with only a few outliers and can also provide some information for those unclassified viruses and cladistic relationships of several families. Conclusion: With the composition vector approach we obtained the phylogenetic tree of large DNA viruses, which not only gives results comparable to the biologist's systematics but also provides a new way of recovering the phylogeny of viruses. PMID:17359548
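
    The abstract above does not spell out how a composition vector is built. As a rough illustration only, the sketch below follows the commonly described form of the approach: count k-string frequencies, subtract the frequency predicted from the (k-1)- and (k-2)-strings, and compare genomes by the cosine of the resulting vectors. The function names, the default k = 3, and the use of raw nucleotide strings are assumptions made for brevity, not details taken from the record.

```python
from collections import Counter, defaultdict
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequency of every k-string that occurs in seq."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {kmer: c / n for kmer, c in counts.items()}


def composition_vector(seq, k=3):
    """Relative deviation of observed k-string frequencies from a
    prediction based on (k-1)- and (k-2)-string frequencies."""
    if k < 3:
        raise ValueError("k must be at least 3")
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    # Group (k-1)-strings by their first k-2 letters so that overlapping
    # prefix/suffix pairs can be joined into candidate k-strings.
    by_head = defaultdict(list)
    for v in f_k1:
        by_head[v[:-1]].append(v)
    vec = {}
    for u in f_k1:
        for v in by_head[u[1:]]:
            s = u + v[-1]
            expected = f_k1[u] * f_k1[v] / f_k2[u[1:]]
            vec[s] = (f_k.get(s, 0.0) - expected) / expected
    return vec


def cv_distance(a, b):
    """Distance (1 - C) / 2, where C is the cosine between two vectors."""
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # treat an all-zero vector as uncorrelated
    return (1 - dot / (norm_a * norm_b)) / 2
```

    Because no alignment is involved, a pairwise distance matrix built this way could be passed to any standard distance-based tree method; which tree-building step the authors used is not stated in the abstract.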

  20. Software engineering the mixed model for genome-wide association studies on large samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  1. Physical mapping resources for large plant genomes: radiation hybrids for wheat D-genome progenitor Aegilops tauschii

    PubMed Central

    2012-01-01

    Background: Development of a high quality reference sequence is a daunting task in crops like wheat with a large (~17Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of an RH panel for the mapping of the D-genome of wheat progenitor Aegilops tauschii is reported. Results: Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB), to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogenous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of 399 most informative RH

  2. Identification and analysis of genomic regions with large between-population differentiation in humans.

    PubMed

    Myles, S; Tang, K; Somel, M; Green, R E; Kelso, J; Stoneking, M

    2008-01-01

    The primary aim of genetic association and linkage studies is to identify genetic variants that contribute to phenotypic variation within human populations. Since the overwhelming majority of human genetic variation is found within populations, these methods are expected to be effective and can likely be extrapolated from one human population to another. However, they may lack power in detecting the genetic variants that contribute to phenotypes that differ greatly between human populations. Phenotypes that show large differences between populations are expected to be associated with genomic regions exhibiting large allele frequency differences between populations. Thus, from genome-wide polymorphism data genomic regions with large allele frequency differences between populations can be identified, and evaluated as candidates for large between-population phenotypic differences. Here we use allele frequency data from approximately 1.5 million SNPs from three human populations, and present an algorithm that identifies genomic regions containing SNPs with extreme Fst. We demonstrate that our candidate regions have reduced heterozygosity in Europeans and Chinese relative to African-Americans, and are likely enriched with genes that have experienced positive natural selection. We identify genes that are likely responsible for phenotypes known to differ dramatically between human populations and present several candidates worthy of future investigation. Our list of high Fst genomic regions is a first step in identifying the genetic variants that contribute to large phenotypic differences between populations, many of which have likely experienced positive natural selection. Our approach based on between population differences can complement traditional within population linkage and association studies to uncover novel genotype-phenotype relationships.
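
    To make the Fst statistic mentioned above concrete, here is a minimal sketch that computes a Wright-style Fst for one biallelic SNP from per-population allele frequencies and then flags the SNPs in the top tail of a genome-wide list. It is not the authors' algorithm, which the abstract does not describe; the function names, the equal weighting of populations by default, and the 1% cutoff are all assumptions chosen for illustration.

```python
def fst_biallelic(freqs, sizes=None):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    sizes: optional sample sizes used as weights (equal weights if None).
    """
    n = len(freqs)
    if sizes is None:
        weights = [1.0 / n] * n
    else:
        total = sum(sizes)
        weights = [s / total for s in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # SNP is monomorphic across all populations
    # Weighted mean of within-population expected heterozygosity.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    return (h_t - h_s) / h_t


def extreme_fst_snps(fst_values, top_fraction=0.01):
    """Indices of SNPs whose Fst falls in the top fraction of the list."""
    if not fst_values:
        return []
    k = max(1, int(len(fst_values) * top_fraction))
    cutoff = sorted(fst_values, reverse=True)[k - 1]
    return [i for i, v in enumerate(fst_values) if v >= cutoff]
```

    Under these assumptions, two populations with identical frequencies give Fst = 0, and populations fixed for opposite alleles give Fst = 1. Grouping the flagged SNPs into candidate regions would need an additional windowing step that the abstract leaves unspecified.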

  3. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  4. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

    PubMed Central

    2013-01-01

    Background: Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results: Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions: Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of

  5. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. Their team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosome (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  6. Biological Consequences of Ancient Gene Acquisition and Duplication in the Large Genome of Candidatus Solibacter usitatus Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Hauser, Loren John; Land, Miriam L; Xie, Gary; Kuske, Cheryl R

    2011-01-01

    Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2-5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the Ellin6076 large genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.

  7. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    SciTech Connect

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  8. Large-scale analysis of the yeast genome by transposon tagging and gene disruption.

    PubMed

    Ross-Macdonald, P; Coelho, P S; Roemer, T; Agarwal, S; Kumar, A; Jansen, R; Cheung, K H; Sheehan, A; Symoniatis, D; Umansky, L; Heidtman, M; Nelson, F K; Iwasaki, H; Hager, K; Gerstein, M; Miller, P; Roeder, G S; Snyder, M

    1999-11-25

    Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background--a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.

  9. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick [Kerfeld Lab, UC Berkeley and JGI]

    2016-07-12

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  10. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

    PubMed Central

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852

  11. Large genomic deletions inactivate the BRCA2 gene in breast cancer families

    PubMed Central

    Agata, S; Dalla, P; Callegaro, M; Scaini, M; Menin, C; Ghiotto, C; Nicoletto, O; Zavagno, G; Chieco-Bianchi, L; D'Andrea, E; Montagna, M

    2005-01-01

    Background: BRCA1 and BRCA2 are the two major genes responsible for the breast and ovarian cancers that cluster in families with a genetically determined predisposition. However, regardless of the mutation detection method employed, the percentage of families without identifiable alterations of these genes exceeds 50%, even when applying stringent criteria for family selection. A small but significant increase in mutation detection rate has resulted from the discovery of large genomic alterations in BRCA1. A few studies have addressed the question of whether BRCA2 might be inactivated by the same kinds of alteration, but most were either done on a relatively small number of samples or employed cumbersome mutation detection methods of variable sensitivity. Objective: To analyse 121 highly selected families using the recently available BRCA2 multiplex ligation dependent probe amplification (MLPA) technique. Results: Three different large genomic deletions were identified and confirmed by analysis of the mutant transcript and genomic characterisation of the breakpoints. Conclusions: Contrary to initial suggestions, the presence of BRCA2 genomic rearrangements is worth investigating in high risk breast or ovarian cancer families. PMID:16199546

  12. A rare example of germ-line chromothripsis resulting in large genomic imbalance.

    PubMed

    Anderson, Sarah E; Kamath, Arveen; Pilz, Daniela T; Morgan, Sian M

    2016-04-01

    Chromothripsis is a recently described 'chromosome catastrophe' phenomenon in which multiple genomic rearrangements are generated in a single catastrophic event. Chromothripsis has most frequently been associated with cancer, but there have also been rare reports of chromothripsis in patients with developmental disorders and congenital anomalies. In contrast to the massive DNA loss that often accompanies chromothripsis in cancer, only minimal DNA loss has been reported in the majority of cases of chromothripsis that have occurred in the germ line. Presumably, this is because in most instances, large genomic losses would be lethal in utero. We report on a female patient with developmental delay and dysmorphism. G-banded chromosome analysis detected a subtle, interstitial deletion of chromosome 13 and a complex rearrangement of one X chromosome. Subsequent array comparative genomic hybridisation studies indicated nine deletions on the X chromosome ranging from 327 kb to 8 Mb in size. A 4.4 Mb deletion on chromosome 13 was also confirmed, compatible with the patient's clinical phenotype. We propose that this is a rare example of constitutional chromothripsis in association with relatively large genomic imbalances and that these have been tolerated in this case as they have occurred in a female on the X chromosome, which has undergone preferential X inactivation.

  13. Patterns and mechanisms of ancestral histone protein inheritance in budding yeast.

    PubMed

    Radman-Livaja, Marta; Verzijlbergen, Kitty F; Weiner, Assaf; van Welsem, Tibor; Friedman, Nir; Rando, Oliver J; van Leeuwen, Fred

    2011-06-01

    Replicating chromatin involves disruption of histone-DNA contacts and subsequent reassembly of maternal histones on the new daughter genomes. In bulk, maternal histones are randomly segregated to the two daughters, but little is known about the fine details of this process: do maternal histones re-assemble at preferred locations or close to their original loci? Here, we use a recently developed method for swapping epitope tags to measure the disposition of ancestral histone H3 across the yeast genome over six generations. We find that ancestral H3 is preferentially retained at the 5' ends of most genes, with strongest retention at long, poorly transcribed genes. We recapitulate these observations with a quantitative model in which the majority of maternal histones are reincorporated within 400 bp of their pre-replication locus during replication, with replication-independent replacement and transcription-related retrograde nucleosome movement shaping the resulting distributions of ancestral histones. We find a key role for Topoisomerase I in retrograde histone movement during transcription, and we find that loss of Chromatin Assembly Factor-1 affects replication-independent turnover. Together, these results show that specific loci are enriched for histone proteins first synthesized several generations beforehand, and that maternal histones re-associate close to their original locations on daughter genomes after replication. Our findings further suggest that accumulation of ancestral histones could play a role in shaping histone modification patterns.

  14. Ancestral origins of the prion protein gene D178N mutation in the Basque Country.

    PubMed

    Rodríguez-Martínez, Ana B; Barreau, Christian; Coupry, Isabelle; Yagüe, Jordi; Sánchez-Valle, Raquel; Galdós-Alcelay, Luis; Ibáñez, Agustín; Digón, Antón; Fernández-Manchola, Ignacio; Goizet, Cyril; Castro, Azucena; Cuevas, Nerea; Alvarez-Alvarez, Maite; de Pancorbo, Marian M; Arveiler, Benoît; Zarranz, Juan J

    2005-06-01

    Fatal familial insomnia (FFI) and familial Creutzfeldt-Jakob disease (fCJD) are familial prion diseases with autosomal dominant inheritance of the D178N mutation. FFI has been reported in at least 27 pedigrees around the world. Twelve apparently unrelated FFI and fCJD pedigrees with the characteristic D178N mutation have been reported in the Prion Diseases Registry of the Basque Country since 1993. The high incidence of familial prion diseases in this region may reflect a unique ancestral origin of the chromosome carrying this mutation. In order to investigate this putative founder effect, we developed "happy typing", a new approach to the happy mapping method, which consists of the physical isolation of large haploid genomic DNA fragments and their analysis by the Polymerase Chain Reaction in order to perform haplotypic analysis instead of pedigree analysis. Six novel microsatellite markers, located in a 150-kb genomic segment flanking the PRNP gene were characterized for typing haploid DNA fragments of 285 kb in size. A common haplotype was found in patients from the Basque region, strongly suggesting a founder effect. We propose that "happy typing" constitutes an efficient method for determining disease-associated haplotypes, since the analysis of a single affected individual per pedigree should provide sufficient evidence.

  15. Molecular analysis of small grain cereal genomes: Current status and prospects

    SciTech Connect

    Moore, G.; Gale, M.D.; Flavell, R.B.; Kurata, N.

    1993-05-01

    Recent developments in cereal genome analysis include generation of RFLP maps, flow sorting of chromosomes, identification of landmarks for genes and a more advanced model for cereal genome organization. These developments are reviewed together with new prospects for the isolation of defined genes from large cereal genomes and for the production of a composite map of the ancestral grass genome to aid in the genetic analysis of all the Gramineae. The advances that can now come from comparative genome mapping are likely to promote further the new era of plant genetics. 65 refs., 2 figs.

  16. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.

    PubMed

    Warren, René L; Keeling, Christopher I; Yuen, Macaire Man Saint; Raymond, Anthony; Taylor, Greg A; Vandervalk, Benjamin P; Mohamadi, Hamid; Paulino, Daniel; Chiu, Readman; Jackman, Shaun D; Robertson, Gordon; Yang, Chen; Boyle, Brian; Hoffmann, Margarete; Weigel, Detlef; Nelson, David R; Ritland, Carol; Isabel, Nathalie; Jaquish, Barry; Yanchuk, Alvin; Bousquet, Jean; Jones, Steven J M; MacKay, John; Birol, Inanc; Bohlmann, Joerg

    2015-07-01

    White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.

  17. DNA from Dust: Comparative Genomics of Large DNA Viruses in Field Surveillance Samples.

    PubMed

    Pandey, Utsav; Bell, Andrew S; Renner, Daniel W; Kennedy, David A; Shreve, Jacob T; Cairns, Chris L; Jones, Matthew J; Dunn, Patricia A; Read, Andrew F; Szpara, Moriah L

    2016-01-01

    The intensification of the poultry industry over the last 60 years facilitated the evolution of increased virulence and vaccine breaks in Marek's disease virus (MDV-1). Full-genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We applied this to the first characterization of MDV-1 genomes from the field, without prior culture. These viruses were collected from vaccinated hosts that acquired naturally circulating field strains of MDV-1, in the absence of a disease outbreak. This reflects the current issue afflicting the poultry industry, where virulent field strains continue to circulate despite vaccination and can remain undetected due to the lack of overt disease symptoms. We found that viral genomes from adjacent field sites had high levels of overall DNA identity, and despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown. IMPORTANCE Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately reflect virus in

  18. DNA from Dust: Comparative Genomics of Large DNA Viruses in Field Surveillance Samples

    PubMed Central

    Pandey, Utsav; Bell, Andrew S.; Renner, Daniel W.; Kennedy, David A.; Shreve, Jacob T.; Cairns, Chris L.; Jones, Matthew J.; Dunn, Patricia A.; Read, Andrew F.

    2016-01-01

    The intensification of the poultry industry over the last 60 years facilitated the evolution of increased virulence and vaccine breaks in Marek’s disease virus (MDV-1). Full-genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We applied this to the first characterization of MDV-1 genomes from the field, without prior culture. These viruses were collected from vaccinated hosts that acquired naturally circulating field strains of MDV-1, in the absence of a disease outbreak. This reflects the current issue afflicting the poultry industry, where virulent field strains continue to circulate despite vaccination and can remain undetected due to the lack of overt disease symptoms. We found that viral genomes from adjacent field sites had high levels of overall DNA identity, and despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown. IMPORTANCE Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately

  19. Comparative Genomics of Amphibian-like Ranaviruses, Nucleocytoplasmic Large DNA Viruses of Poikilotherms

    PubMed Central

    Price, Stephen J.

    2015-01-01

    Recent research on genome evolution of large DNA viruses has highlighted a number of incredibly dynamic processes that can facilitate rapid adaptation. The genomes of amphibian-like ranaviruses – double-stranded DNA viruses infecting amphibians, reptiles, and fish (family Iridoviridae) – were examined to assess variation in genome content and evolutionary processes. The viruses studied were closely related, but their genome content varied considerably, with 29 genes identified that were not present in all of the major clades. Twenty-one genes had evidence of recombination, while a virus isolated from a captive reptile appeared to be a mosaic of two divergent parents. Positive selection was also found to be acting on more than a quarter of Ranavirus genes and was found most frequently in the Spanish common midwife toad virus, which has had a severe impact on amphibian host communities. Efforts to resolve the root of this group by inclusion of an outgroup were inconclusive, but a set of core genes was identified, which recovered a well-supported species tree. PMID:27812275

  20. A Glimpse of Nucleo-Cytoplasmic Large DNA Virus Biodiversity through the Eukaryotic Genomics Window

    PubMed Central

    Gallot-Lavallée, Lucie; Blanc, Guillaume

    2017-01-01

    The nucleocytoplasmic large DNA viruses (NCLDV) are a group of extremely complex double-stranded DNA viruses, which are major parasites of a variety of eukaryotes. Recent studies showed that certain eukaryotes contain fragments of NCLDV DNA integrated in their genome, although, surprisingly, many of these organisms were not previously shown to be infected by NCLDVs. We performed an updated survey of NCLDV genes hidden in eukaryotic sequences to measure the incidence of this phenomenon in common public sequence databases. A total of 66 eukaryotic genomic or transcriptomic datasets—many of which are from algae and aquatic protists—contained at least one of the five most consistently conserved NCLDV core genes. Phylogenetic study of the eukaryotic NCLDV-like sequences identified putative new members of already recognized viral families, as well as members of as yet unknown viral clades. Genomic evidence suggested that most of these sequences resulted from viral DNA integrations rather than contaminating viruses. Furthermore, the nature of the inserted viral genes helped predict the original functional capacities of the donor viruses. These insights confirm that genomic insertions of NCLDV DNA are common in eukaryotes and can be exploited to delineate the contours of NCLDV biodiversity. PMID:28117696

  1. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

    PubMed

    Gao, Song; Bertrand, Denis; Chia, Burton K H; Nagarajan, Niranjan

    2016-05-11

    The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.

  2. The ancestral gene repertoire of animal stem cells.

    PubMed

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-12-22

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the "germ-line multipotency program" and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells.

  3. The ancestral gene repertoire of animal stem cells

    PubMed Central

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-01-01

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the “germ-line multipotency program” and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  4. In Silico Prediction of Scaffold/Matrix Attachment Regions in Large Genomic Sequences

    PubMed Central

    Frisch, Matthias; Frech, Kornelie; Klingenhoff, Andreas; Cartharius, Kerstin; Liebich, Ines; Werner, Thomas

    2002-01-01

    Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. They are major determinants of locus control of gene expression and can shield gene expression from position effects. Experimental detection of S/MARs requires substantial effort and is not suitable for large-scale screening of genomic sequences. In silico prediction of S/MARs can provide a crucial first selection step to reduce the number of candidates. We used experimentally defined S/MAR sequences as the training set and generated a library of new S/MAR-associated, AT-rich patterns described as weight matrices. A new tool called SMARTest was developed that identifies potential S/MARs by performing a density analysis based on the S/MAR matrix library (http://www.genomatix.de/cgi-bin/smartest_pd/smartest.pl). S/MAR predictions were evaluated by using six genomic sequences from animal and plant for which S/MARs and non-S/MARs were experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%. In contrast to previous algorithms, the SMARTest approach does not depend on the sequence context and is suitable to analyze long genomic sequences up to the size of whole chromosomes. To demonstrate the feasibility of large-scale S/MAR prediction, we analyzed the recently published chromosome 22 sequence and found 1198 S/MAR candidates. PMID:11827955

  5. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  6. Efficient and rapid generation of large genomic variants in rats and mice using CRISMERE

    PubMed Central

    Birling, Marie-Christine; Schaeffer, Laurence; André, Philippe; Lindner, Loic; Maréchal, Damien; Ayadi, Abdel; Sorg, Tania; Pavlovic, Guillaume; Hérault, Yann

    2017-01-01

    Modelling Down syndrome (DS) in mouse has been crucial for the understanding of the disease and the evaluation of therapeutic targets. Nevertheless, the modelling so far has been limited to the mouse and, even in this model, generating duplication of genomic regions has been labour intensive and time consuming. We developed the CRISpr MEdiated REarrangement (CRISMERE) strategy, which takes advantage of the CRISPR/Cas9 system, to generate most of the desired rearrangements from a single experiment at much lower expense and in less than 9 months. Deletions, duplications, and inversions of genomic regions as large as 24.4 Mb in rat and mouse founders were observed, and germ line transmission was confirmed for fragments as large as 3.6 Mb. Interestingly, we have been able to recover duplicated regions from founders in which we only detected deletions. CRISMERE is even more powerful than anticipated: it allows the scientific community to manipulate the rodent and probably other genomes in a fast and efficient manner that was not possible before. PMID:28266534

  7. Genome-scale phylogenetic function annotation of large and diverse protein families.

    PubMed

    Engelhardt, Barbara E; Jordan, Michael I; Srouji, John R; Brenner, Steven E

    2011-11-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu.

  8. Centromere destiny in dicentric chromosomes: New insights from the evolution of human chromosome 2 ancestral centromeric region.

    PubMed

    Chiatante, Giorgia; Giannuzzi, Giuliana; Calabrese, Francesco Maria; Eichler, Evan E; Ventura, Mario

    2017-03-15

    Dicentric chromosomes are products of genomic rearrangements that place two centromeres on the same chromosome. Due to the presence of two primary constrictions, they are inherently unstable and overcome their instability by epigenetically inactivating and/or deleting one of the two centromeres, thus resulting in functionally monocentric chromosomes that segregate normally during cell division. Our understanding to date of dicentric chromosome formation, behavior and fate has been largely inferred from observational studies in plants and humans as well as artificially produced de novo dicentrics in yeast and in human cells. We investigate the most recent product of a chromosome fusion event fixed in the human lineage, human chromosome 2, whose stability was acquired by the suppression of one centromere, resulting in a unique difference in chromosome number between humans (46 chromosomes) and our most closely related ape relatives (48 chromosomes). Using molecular cytogenetics, sequencing and comparative sequence data, we deeply characterize the relicts of the chromosome 2q ancestral centromere and its flanking regions, gaining insight into the ancestral organization that can be easily broadened to all acrocentric chromosome centromeres. Moreover, our analyses offered the opportunity to trace the evolutionary history of rDNA and satellite III sequences among great apes, thus suggesting a new hypothesis for the preferential inactivation of some human centromeres, including IIq. Our results suggest two possible centromere inactivation models to explain the evolutionary stabilization of human chromosome 2 over the last 5-6 million years. Our results strongly favor centromere excision through a one-step process.

  9. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    PubMed Central

    Schiffer, Philipp H.; Gravemeyer, Jan; Rauscher, Martina; Wiehe, Thomas

    2016-01-01

    Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR)-genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic for ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term “run-away evolution”. This process might ultimately lead to the failure of genomic integrity and drive species to extinction. PMID:27509525

  10. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.

    PubMed

    Ghedin, Elodie; Sengamalay, Naomi A; Shumway, Martin; Zaborsky, Jennifer; Feldblyum, Tamara; Subbu, Vik; Spiro, David J; Sitz, Jeff; Koo, Hean; Bolotov, Pavel; Dernovoy, Dmitry; Tatusova, Tatiana; Bao, Yiming; St George, Kirsten; Taylor, Jill; Lipman, David J; Fraser, Claire M; Taubenberger, Jeffery K; Salzberg, Steven L

    2005-10-20

    Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics. The most recent pandemics include the 1918 'Spanish' flu, one of the most deadly outbreaks in recorded history, which killed 30-50 million people worldwide, the 1957 'Asian' flu, and the 1968 'Hong Kong' flu. Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides. In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.

  11. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Xie, Gary; Kuske, Cheryl R; Hauser, Loren; Land, Miriam

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  12. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.
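
    The figures quoted in this abstract (600 Gb of shotgun data, an estimated genome size of ~32 Gb) imply a mean sequencing depth that can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes depth is simply total sequenced bases divided by genome size, ignoring read filtering and uneven coverage.

```python
def sequencing_depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Mean depth of coverage = total sequenced bases / genome size.

    Assumption: both values are in gigabases, and every sequenced
    base counts toward coverage (no filtering, uniform coverage).
    """
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Figures taken from the abstract: 600 Gb of data, ~32 Gb genome.
depth = sequencing_depth(600.0, 32.0)
print(f"approximate mean depth: {depth:.2f}x")  # 18.75x
```

    Under these assumptions the shotgun data correspond to a mean depth of just under 19x.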

  13. Mitochondrial introgression suggests extensive ancestral hybridization events among Saccharomyces species.

    PubMed

    Peris, David; Arias, Armando; Orlić, Sandi; Belloch, Carmela; Pérez-Través, Laura; Querol, Amparo; Barrio, Eladio

    2017-03-01

    Horizontal gene transfer (HGT) in eukaryotic plastids and mitochondrial genomes is common, and plays an important role in organism evolution. In yeasts, recent mitochondrial HGT has been suggested between S. cerevisiae and S. paradoxus. However, few strains have been explored given the lack of accurate mitochondrial genome annotations. Mitochondrial genome sequences are important to understand how frequently these introgressions occur, and their role in cytonuclear incompatibilities and fitness. Indeed, most of the Bateson-Dobzhansky-Muller genetic incompatibilities described in yeasts are driven by cytonuclear incompatibilities. We herein explored the mitochondrial inheritance of several wild Saccharomyces species distributed worldwide and their hybrids isolated from different sources and geographic origins. We demonstrated the existence of several recombination points in mitochondrial region COX2-ORF1, likely mediated by either the activity of the protein encoded by the ORF1 (F-SceIII) gene, a free-standing homing endonuclease, or mostly facilitated by A+T tandem repeats and regions of integration of GC clusters. These introgressions were shown to occur among strains of the same species and among strains of different species, which suggests a complex model of Saccharomyces evolution that involves several ancestral hybridization events in wild environments.

  14. Reverse engineering and analysis of large genome-scale gene networks.

    PubMed

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-07

    Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute intensive, which limits their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
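
    The mutual information (MI) score on which this kind of network inference rests can be illustrated with a small sketch. This is not TINGe's B-spline estimator: it is a plain plug-in (count-based) estimator for already-discretized expression values, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information, in bits, between two equal-length
    sequences of discrete values (e.g. binned expression levels).

    Assumption: this simple count-based estimator stands in for the
    B-spline formulation described in the abstract.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy binned profiles for three hypothetical genes across four samples.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # identical to gene_a
gene_c = [0, 1, 0, 1]  # statistically independent of gene_a
print(mutual_information(gene_a, gene_b))  # 1.0
print(mutual_information(gene_a, gene_c))  # 0.0
```

    In an MI-based network, gene pairs whose score exceeds a significance threshold (obtained by permutation testing, per the abstract) would be connected by an edge.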

  15. Sequence capture and next-generation sequencing of ultraconserved elements in a large-genome salamander.

    PubMed

    Newman, Catherine E; Austin, Christopher C

    2016-12-01

    Amidst the rapid advancement in next-generation sequencing (NGS) technology over the last few years, salamanders have been left behind. Salamanders have enormous genomes (up to 40 times the size of the human genome), and this poses challenges to generating NGS data sets of quality and quantity similar to those of other vertebrates. However, optimization of laboratory protocols is time-consuming and often cost prohibitive, and continued omission of salamanders from novel phylogeographic research is detrimental to species facing decline. Here, we use a salamander endemic to the southeastern United States, Plethodon serratus, to test the utility of an established protocol for sequence capture of ultraconserved elements (UCEs) in resolving intraspecific phylogeographic relationships and delimiting cryptic species. Without modifying the standard laboratory protocol, we generated a data set consisting of over 600 million reads for 85 P. serratus samples. Species delimitation analyses support recognition of seven species within P. serratus sensu lato, and all phylogenetic relationships among the seven species are fully resolved under a coalescent model. Results also corroborate previous data suggesting nonmonophyly of the Ouachita and Louisiana regions. Our results demonstrate that established UCE protocols can successfully be used in phylogeographic studies of salamander species, providing a powerful tool for future research on evolutionary history of amphibians and other organisms with large genomes.

  16. Large-insert genome analysis technology detects structural variation in Pseudomonas aeruginosa clinical strains from cystic fibrosis patients.

    PubMed

    Hayden, Hillary S; Gillett, Will; Saenphimmachak, Channakhone; Lim, Regina; Zhou, Yang; Jacobs, Michael A; Chang, Jean; Rohmer, Laurence; D'Argenio, David A; Palmieri, Anthony; Levy, Ruth; Haugen, Eric; Wong, Gane K S; Brittnacher, Mitch J; Burns, Jane L; Miller, Samuel I; Olson, Maynard V; Kaul, Rajinder

    2008-06-01

    Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.

  17. Mosaic Uniparental Disomies and Aneuploidies as Large Structural Variants of the Human Genome

    PubMed Central

    Rodríguez-Santiago, Benjamín; Malats, Núria; Rothman, Nathaniel; Armengol, Lluís; Garcia-Closas, Montse; Kogevinas, Manolis; Villa, Olaya; Hutchinson, Amy; Earl, Julie; Marenne, Gaëlle; Jacobs, Kevin; Rico, Daniel; Tardón, Adonina; Carrato, Alfredo; Thomas, Gilles; Valencia, Alfonso; Silverman, Debra; Real, Francisco X.; Chanock, Stephen J.; Pérez-Jurado, Luis A.

    2010-01-01

    Mosaicism is defined as the coexistence of cells with different genetic composition within an individual, caused by postzygotic somatic mutation. Although somatic mosaicism for chromosomal abnormalities is a well-established cause of developmental and somatic disorders and has also been detected in different tissues, its frequency and extent in the adult normal population are still unknown. We provide here a genome-wide survey of mosaic genomic variation obtained by analyzing Illumina 1M SNP array data from blood or buccal DNA samples of 1991 adult individuals from the Spanish Bladder Cancer/EPICURO genome-wide association study. We found mosaic abnormalities in autosomes in 1.7% of samples, including 23 segmental uniparental disomies, 8 complete trisomies, and 11 large (1.5–37 Mb) copy-number variants. Alterations were observed across the different autosomes with recurrent events in chromosomes 9 and 20. No case-control differences were found in the frequency of events or the percentage of cells affected, thus indicating that most rearrangements found are not central to the development of bladder cancer. However, five out of six events tested were detected in both blood and bladder tissue from the same individual, indicating an early developmental origin. The high cellular frequency of the anomalies detected and their presence in normal adult individuals suggest that this type of mosaicism is a widespread phenomenon in the human genome. Somatic mosaicism should be considered in the expanding repertoire of inter- and intraindividual genetic variation, some of which may cause somatic human diseases but also contribute to modifying inherited disorders and/or late-onset multifactorial traits. PMID:20598279

  18. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  19. Common 5' beta-globin RFLP haplotypes harbour a surprising level of ancestral sequence mosaicism.

    PubMed

    Webster, Matthew T; Clegg, John B; Harding, Rosalind M

    2003-07-01

    Blocks of linkage disequilibrium (LD) in the human genome represent segments of ancestral chromosomes. To investigate the relationship between LD and genealogy, we analysed diversity associated with restriction fragment length polymorphism (RFLP) haplotypes of the 5' beta-globin gene complex. Genealogical analyses were based on sequence alleles that spanned a 12.2-kb interval, covering 3.1 kb around the psibeta gene and 6.2 kb of the delta-globin gene and its 5' flanking sequence known as the R/T region. Diversity was sampled from a Kenyan Luo population where recent malarial selection has contributed to substantial LD. A single common sequence allele spanning the 12.2-kb interval exclusively identified the ancestral chromosome bearing the "Bantu" beta(s) (sickle-cell) RFLP haplotype. Other common 5' RFLP haplotypes comprised interspersed segments from multiple ancestral chromosomes. Nucleotide diversity was similar between psibeta and R/T-delta-globin but was non-uniformly distributed within the R/T-delta-globin region. High diversity associated with the 5' R/T identified two ancestral lineages that probably date back more than 2 million years. Within this genealogy, variation has been introduced into the 3' R/T by gene conversion from other ancestral chromosomes. Diversity in delta-globin was found to lead through parts of the main genealogy but to coalesce in a more recent ancestor. The well-known recombination hotspot is clearly restricted to the region 3' of delta-globin. Our analyses show that, whereas one common haplotype in a block of high LD represents a long segment from a single ancestral chromosome, others are mosaics of short segments from multiple ancestors related in genealogies of unsuspected complexity.

  20. Inferring Ancestral Recombination Graphs from Bacterial Genomic Data

    PubMed Central

    Vaughan, Timothy G.; Welch, David; Drummond, Alexei J.; Biggs, Patrick J.; George, Tessy; French, Nigel P.

    2017-01-01

    Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter. PMID:28007885

  1. Cross-Platform Assessment of Genomic Imbalance Confirms the Clinical Relevance of Genomic Complexity and Reveals Loci with Potential Pathogenic Roles in Diffuse Large B-Cell Lymphoma

    PubMed Central

    Dias, Lizalynn M.; Thodima, Venkata; Friedman, Julia; Ma, Charles; Guttapalli, Asha; Mendiratta, Geetu; Siddiqi, Imran N.; Syrbu, Sergei; Chaganti, R. S. K.; Houldsworth, Jane

    2016-01-01

    Genomic copy number alterations (CNAs) in diffuse large B-cell lymphoma (DLBCL) have roles in disease pathogenesis but overall clinical relevance remains unclear. Herein, an unbiased algorithm was uniformly applied across three genome profiling datasets comprising 392 newly-diagnosed DLBCL specimens that defined 32 overlapping CNAs, involving 36 minimal common regions (MCRs). Scoring criteria were established for 50 aberrations within the MCRs while considering peak gains/losses. Application of these criteria to independent datasets revealed novel candidate genes with coordinated expression, such as CNOT2, potentially with pathogenic roles. No one single aberration significantly associated with patient outcome across datasets, but genomic complexity, defined by imbalance in more than one MCR, significantly portended adverse outcome in two of three independent datasets. Thus, the standardized scoring of CNAs currently developed can be uniformly applied across platforms, affording robust validation of genomic imbalance and complexity in DLBCL and overall clinical utility as biomarkers of patient outcome. PMID:26294112

  2. In search of ancestral Kilauea volcano

    USGS Publications Warehouse

    Lipman, P.W.; Sisson, T.W.; Ui, T.; Naka, J.

    2000-01-01

    Submersible observations and samples show that the lower south flank of Hawaii, offshore from Kilauea volcano and the active Hilina slump system, consists entirely of compositionally diverse volcaniclastic rocks; pillow lavas are confined to shallow slopes. Submarine-erupted basalt clasts have strongly variable alkalic and transitional basalt compositions (to 41% SiO2, 10.8% alkalies), contrasting with present-day Kilauea tholeiites. The volcaniclastic rocks provide a unique record of ancestral alkalic growth of an archetypal hotspot volcano, including transition to its tholeiitic shield stage, and associated slope-failure events.

  3. Large genomic rearrangement of BRCA1 and BRCA2 genes in familial breast cancer patients in Korea.

    PubMed

    Cho, Ja Young; Cho, Dae-Yeon; Ahn, Sei Hyun; Choi, Su-Youn; Shin, Inkyung; Park, Hyun Gyu; Lee, Jong Won; Kim, Hee Jeong; Yu, Jong Han; Ko, Beom Seok; Ku, Bo Kyung; Son, Byung Ho

    2014-06-01

    We screened for large genomic rearrangements of the BRCA1 and BRCA2 genes in Korean familial breast cancer patients. A multiplex ligation-dependent probe amplification assay was used to identify BRCA1 and BRCA2 genomic rearrangements in 226 Korean familial breast cancer patients with risk factors for BRCA1 and BRCA2 mutations, who had previously tested negative for point mutations in the two genes. We identified only one large deletion (c.4186-1593_4676-1465del) in BRCA1. No large rearrangements were found in BRCA2. Our result indicates that large genomic rearrangement in the BRCA1 and BRCA2 genes does not appear to be a major determinant of breast cancer susceptibility in the Korean population. A large-scale study is needed to validate our result in Korea.

  4. Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate.

    PubMed

    Denoeud, France; Henriet, Simon; Mungpakdee, Sutada; Aury, Jean-Marc; Da Silva, Corinne; Brinkmann, Henner; Mikhaleva, Jana; Olsen, Lisbeth Charlotte; Jubin, Claire; Cañestro, Cristian; Bouquet, Jean-Marie; Danks, Gemma; Poulain, Julie; Campsteijn, Coen; Adamski, Marcin; Cross, Ismael; Yadetie, Fekadu; Muffato, Matthieu; Louis, Alexandra; Butcher, Stephen; Tsagkogeorga, Georgia; Konrad, Anke; Singh, Sarabdeep; Jensen, Marit Flo; Huynh Cong, Evelyne; Eikeseth-Otteraa, Helen; Noel, Benjamin; Anthouard, Véronique; Porcel, Betina M; Kachouri-Lafond, Rym; Nishino, Atsuo; Ugolini, Matteo; Chourrout, Pascal; Nishida, Hiroki; Aasland, Rein; Huzurbazar, Snehalata; Westhof, Eric; Delsuc, Frédéric; Lehrach, Hans; Reinhardt, Richard; Weissenbach, Jean; Roy, Scott W; Artiguenave, François; Postlethwait, John H; Manak, J Robert; Thompson, Eric M; Jaillon, Olivier; Du Pasquier, Louis; Boudinot, Pierre; Liberles, David A; Volff, Jean-Nicolas; Philippe, Hervé; Lenhard, Boris; Roest Crollius, Hugues; Wincker, Patrick; Chourrout, Daniel

    2010-12-03

    Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.

  5. Plasticity of Animal Genome Architecture Unmasked by Rapid Evolution of a Pelagic Tunicate

    PubMed Central

    Denoeud, France; Henriet, Simon; Mungpakdee, Sutada; Aury, Jean-Marc; Da Silva, Corinne; Brinkmann, Henner; Mikhaleva, Jana; Olsen, Lisbeth Charlotte; Jubin, Claire; Cañestro, Cristian; Bouquet, Jean-Marie; Danks, Gemma; Poulain, Julie; Campsteijn, Coen; Adamski, Marcin; Cross, Ismael; Yadetie, Fekadu; Muffato, Matthieu; Louis, Alexandra; Butcher, Stephen; Tsagkogeorga, Georgia; Konrad, Anke; Singh, Sarabdeep; Jensen, Marit Flo; Cong, Evelyne Huynh; Eikeseth-Otteraa, Helen; Noel, Benjamin; Anthouard, Véronique; Porcel, Betina M.; Kachouri-Lafond, Rym; Nishino, Atsuo; Ugolini, Matteo; Chourrout, Pascal; Nishida, Hiroki; Aasland, Rein; Huzurbazar, Snehalata; Westhof, Eric; Delsuc, Frédéric; Lehrach, Hans; Reinhardt, Richard; Weissenbach, Jean; Roy, Scott W.; Artiguenave, François; Postlethwait, John H.; Manak, J. Robert; Thompson, Eric M.; Jaillon, Olivier; Pasquier, Louis Du; Boudinot, Pierre; Liberles, David A.; Volff, Jean-Nicolas; Philippe, Hervé; Lenhard, Boris; Crollius, Hugues Roest; Wincker, Patrick; Chourrout, Daniel

    2012-01-01

    Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain. PMID:21097902

  6. First Insights into the Large Genome of Epimedium sagittatum (Sieb. et Zucc) Maxim, a Chinese Traditional Medicinal Plant

    PubMed Central

    Liu, Di; Zeng, Shao-Hua; Chen, Jian-Jun; Zhang, Yan-Jun; Xiao, Gong; Zhu, Lin-Yao; Wang, Ying

    2013-01-01

    Epimedium sagittatum (Sieb. et Zucc) Maxim is a member of the Berberidaceae family of basal eudicot plants, which is widely distributed and has a long history of use as a traditional medicinal plant in China for its therapeutic effects on many diseases. Recent data show that E. sagittatum has a relatively large genome, with a haploid genome size of ~4496 Mbp, divided among only 12 chromosomes in the diploid complement (2n = 2x = 12). However, little is known about Epimedium genome structure and composition. Here we present the analysis of 691 kb of high-quality genomic sequence derived from 672 randomly selected plasmid clones of E. sagittatum genomic DNA, representing ~0.0154% of the genome. The sampled sequences comprised at least 78.41% repetitive DNA elements and 2.51% confirmed annotated gene sequences, with a total GC content of 39%. Retrotransposons represented the major class of transposable element (TE) repeats identified (65.37% of all TE repeats), particularly LTR (Long Terminal Repeat) retrotransposons (52.27% of all TE repeats). Chromosome analysis and Fluorescence in situ Hybridization of Gypsy-Ty3 retrotransposons were performed to survey the E. sagittatum genome at the cytological level. Our data provide the first insights into the composition and structure of the E. sagittatum genome, and will facilitate the functional genomic analysis of this valuable medicinal plant. PMID:23807511
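    The sampled fraction quoted in this abstract follows directly from the two sizes it reports. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
# Sizes taken from the abstract, converted to base pairs.
sampled_bp = 691e3          # 691 kb of sampled genomic sequence
haploid_genome_bp = 4496e6  # ~4496 Mbp haploid genome size

# Fraction of the haploid genome covered by the sample, as a percentage.
sampled_percent = sampled_bp / haploid_genome_bp * 100
print(f"{sampled_percent:.4f}%")
```

    The result agrees with the ~0.0154% stated in the abstract.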

  7. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    PubMed

    Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings: strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria, comprising on average 89% conserved core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.

  8. Bacterial delivery of large intact genomic-DNA-containing BACs into mammalian cells.

    PubMed

    Cheung, Wing; Kotzamanis, George; Abdulrazzak, Hassan; Goussard, Sylvie; Kaname, Tadashi; Kotsinas, Athanassios; Gorgoulis, Vassilis G; Grillot-Courvalin, Catherine; Huxley, Clare

    2012-01-01

    Efficient delivery of large intact vectors into mammalian cells remains problematical. Here we evaluate delivery by bacterial invasion of two large BACs of more than 150 kb in size into various cells. First, we determined the effect of several drugs on bacterial delivery of a small plasmid into different cell lines. Most drugs tested resulted in a marginal increase of the overall efficiency of delivery in only some cell lines, except the lysosomotropic drug chloroquine, which was found to increase the efficiency of delivery by 6-fold in B16F10 cells. Bacterial invasion was found to be significantly advantageous compared with lipofection in delivering large intact BACs into mouse cells, resulting in 100% of clones containing intact DNA. Furthermore, evaluation of expression of the human hypoxanthine phosphoribosyltransferase (HPRT) gene from its genomic locus, which was present in one of the BACs, showed that single copy integrations of the HPRT-containing BAC had occurred in mouse B16F10 cells and that expression of HPRT from each human copy was 0.33 times as much as from each endogenous mouse copy. These data provide new evidence that bacterial delivery is a convenient and efficient method to transfer large intact therapeutic genes into mammalian cells.

  9. Direct selection: a method for the isolation of cDNAs encoded by large genomic regions.

    PubMed Central

    Lovett, M; Kere, J; Hinton, L M

    1991-01-01

    We have developed a strategy for the rapid enrichment and identification of cDNAs encoded by large genomic regions. The basis of this "direct selection" scheme is the hybridization of an entire library of cDNAs to an immobilized genomic clone. Nonspecific hybrids are eliminated and selected cDNAs are eluted. These molecules are then amplified and are either cloned or subjected to further selection/amplification cycles. This scheme was tested using a 550-kilobase yeast artificial chromosome clone that contains the EPO gene. Using this clone and a fetal kidney cDNA library, we have achieved a 1000-fold enrichment of EPO cDNAs in one cycle of enrichment. More significantly, we have further investigated one of the "anonymous" cDNAs that was selectively enriched. We confirmed that this cDNA was encoded by the yeast artificial chromosome. Its frequency in the starting library was 1 in 1 × 10^5 cDNAs, and after selection it comprised 2% of the selected library. DNA sequence analysis of this cDNA and of the yeast artificial chromosome clone revealed that this gene encodes the beta 2 subunit of the human guanine nucleotide-binding regulatory proteins. Restriction mapping and hybridization data position this gene (GNB2) to within 30-70 kilobases of the EPO gene. The selective isolation and mapping of GNB2 confirms the feasibility of this direct selection strategy and suggests that it will be useful for the rapid isolation of cDNAs, including disease-related genes, across extensive portions of the human genome. PMID:1946378

  10. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    SciTech Connect

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-01-01

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
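    The input parallelization step described in this abstract (nearly equal fragments whose boundaries fall at the beginning of reads) can be sketched as follows. This is not the SWAP2 code, which is written for a distributed-memory machine; it is a minimal single-process illustration that assumes FASTA-style input held in memory, where every read record starts with `>` at the beginning of a line. The function name `split_at_read_starts` is hypothetical.

```python
def split_at_read_starts(data: bytes, n_fragments: int) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of n_fragments nearly equal pieces,
    moving every interior boundary forward to the next read record."""
    size = len(data)
    boundaries = [0]
    for i in range(1, n_fragments):
        target = i * size // n_fragments      # ideal, equal-size cut point
        # A record starts at a '>' that directly follows a newline.
        pos = data.find(b"\n>", max(target - 1, 0))
        cut = size if pos == -1 else pos + 1  # no later record: cut at the end
        boundaries.append(max(cut, boundaries[-1]))
    boundaries.append(size)
    return list(zip(boundaries[:-1], boundaries[1:]))


# Toy input: four records of 9 bytes each (assumed FASTA layout).
reads = b">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n>r4\nCCGG\n"
fragments = split_at_read_starts(reads, 3)
```

    Each worker would then parse only its own byte range, so no read is split between two fragments. Fragments can be empty when there are more workers than records.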

  11. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    PubMed Central

    2010-01-01

    Background: The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. Findings: GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions: The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind. PMID:20178631

  12. BAL31-NGS approach for identification of telomeres de novo in large genomes.

    PubMed

    Peška, Vratislav; Sitová, Zdeňka; Fajkus, Petr; Fajkus, Jiří

    2017-02-01

    This article describes a novel method to identify as yet undiscovered telomere sequences, which combines next generation sequencing (NGS) with BAL31 digestion of high molecular weight DNA. The method was applied to two groups of plants: i) dicots, genus Cestrum, and ii) monocots, Allium species (e.g. A. ursinum and A. cepa). Both groups consist of species with large genomes (tens of Gb) and a low number of chromosomes (2n=14-16), full of repeat elements. Both genera lack typical telomeric repeats and multiple studies have attempted to characterize alternative telomeric sequences. However, despite interesting hypotheses and suggestions of alternative candidate telomeres (retrotransposons, rDNA, satellite repeats) these studies have not resolved the question. In a novel approach based on the two most general features of eukaryotic telomeres, their repetitive character and sensitivity to BAL31 nuclease digestion, we have taken advantage of the capacity and current affordability of NGS in combination with the robustness of classical BAL31 nuclease digestion of chromosomal termini. While representative samples of most repeat elements were ensured by low-coverage (less than 5%) genomic shot-gun NGS, candidate telomeres were identified as under-represented sequences in BAL31-treated samples.
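    The final step described in this abstract, finding sequences that are under-represented after BAL31 treatment, can be illustrated with a simple k-mer comparison. This is a minimal sketch and not the authors' pipeline, which the abstract does not specify in detail. The k-mer approach, the function names, the thresholds `min_count` and `max_ratio`, and the toy repeat motif `GATTCA` are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depleted_kmers(untreated, treated, k=6, min_count=5, max_ratio=0.5):
    """Return (k-mer, ratio) pairs whose relative frequency in the treated
    sample is at most max_ratio times that in the untreated sample."""
    u = kmer_counts(untreated, k)
    t = kmer_counts(treated, k)
    u_total = sum(u.values()) or 1
    t_total = sum(t.values()) or 1
    hits = []
    for kmer, n in u.items():
        if n < min_count:
            continue  # too rare in the untreated sample to judge
        ratio = (t[kmer] / t_total) / (n / u_total)
        if ratio <= max_ratio:
            hits.append((kmer, ratio))
    return sorted(hits, key=lambda item: item[1])


# Toy data: a hypothetical terminal repeat is lost from most treated reads.
terminal = "GATTCA" * 4
internal = "ACGT" * 6
untreated_reads = [terminal] * 5 + [internal] * 5
treated_reads = [terminal] * 1 + [internal] * 5
candidates = depleted_kmers(untreated_reads, treated_reads)
```

    In this toy case the only candidates are the six rotations of the terminal motif, while k-mers from the internal sequence keep their relative frequency and are not reported.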

  13. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma.

    PubMed

    Cerhan, James R; Berndt, Sonja I; Vijai, Joseph; Ghesquières, Hervé; McKay, James; Wang, Sophia S; Wang, Zhaoming; Yeager, Meredith; Conde, Lucia; de Bakker, Paul I W; Nieters, Alexandra; Cox, David; Burdett, Laurie; Monnereau, Alain; Flowers, Christopher R; De Roos, Anneclaire J; Brooks-Wilson, Angela R; Lan, Qing; Severi, Gianluca; Melbye, Mads; Gu, Jian; Jackson, Rebecca D; Kane, Eleanor; Teras, Lauren R; Purdue, Mark P; Vajdic, Claire M; Spinelli, John J; Giles, Graham G; Albanes, Demetrius; Kelly, Rachel S; Zucca, Mariagrazia; Bertrand, Kimberly A; Zeleniuch-Jacquotte, Anne; Lawrence, Charles; Hutchinson, Amy; Zhi, Degui; Habermann, Thomas M; Link, Brian K; Novak, Anne J; Dogan, Ahmet; Asmann, Yan W; Liebow, Mark; Thompson, Carrie A; Ansell, Stephen M; Witzig, Thomas E; Weiner, George J; Veron, Amelie S; Zelenika, Diana; Tilly, Hervé; Haioun, Corinne; Molina, Thierry Jo; Hjalgrim, Henrik; Glimelius, Bengt; Adami, Hans-Olov; Bracci, Paige M; Riby, Jacques; Smith, Martyn T; Holly, Elizabeth A; Cozen, Wendy; Hartge, Patricia; Morton, Lindsay M; Severson, Richard K; Tinker, Lesley F; North, Kari E; Becker, Nikolaus; Benavente, Yolanda; Boffetta, Paolo; Brennan, Paul; Foretova, Lenka; Maynadie, Marc; Staines, Anthony; Lightfoot, Tracy; Crouch, Simon; Smith, Alex; Roman, Eve; Diver, W Ryan; Offit, Kenneth; Zelenetz, Andrew; Klein, Robert J; Villano, Danylo J; Zheng, Tongzhang; Zhang, Yawei; Holford, Theodore R; Kricker, Anne; Turner, Jenny; Southey, Melissa C; Clavel, Jacqueline; Virtamo, Jarmo; Weinstein, Stephanie; Riboli, Elio; Vineis, Paolo; Kaaks, Rudolph; Trichopoulos, Dimitrios; Vermeulen, Roel C H; Boeing, Heiner; Tjonneland, Anne; Angelucci, Emanuele; Di Lollo, Simonetta; Rais, Marco; Birmann, Brenda M; Laden, Francine; Giovannucci, Edward; Kraft, Peter; Huang, Jinyan; Ma, Baoshan; Ye, Yuanqing; Chiu, Brian C H; Sampson, Joshua; Liang, Liming; Park, Ju-Hyun; Chung, Charles C; Weisenburger, Dennis D; Chatterjee, Nilanjan; Fraumeni, Joseph F; Slager, Susan L; Wu, Xifeng; de Sanjose, Silvia; Smedby, Karin E; Salles, Gilles; Skibola, Christine F; Rothman, Nathaniel; Chanock, Stephen J

    2014-11-01

    Diffuse large B cell lymphoma (DLBCL) is the most common lymphoma subtype and is clinically aggressive. To identify genetic susceptibility loci for DLBCL, we conducted a meta-analysis of 3 new genome-wide association studies (GWAS) and 1 previous scan, totaling 3,857 cases and 7,666 controls of European ancestry, with additional genotyping of 9 promising SNPs in 1,359 cases and 4,557 controls. In our multi-stage analysis, five independent SNPs in four loci achieved genome-wide significance marked by rs116446171 at 6p25.3 (EXOC2; P = 2.33 × 10^-21), rs2523607 at 6p21.33 (HLA-B; P = 2.40 × 10^-10), rs79480871 at 2p23.3 (NCOA1; P = 4.23 × 10^-8) and two independent SNPs, rs13255292 and rs4733601, at 8q24.21 (PVT1; P = 9.98 × 10^-13 and 3.63 × 10^-11, respectively). These data provide substantial new evidence for genetic susceptibility to this B cell malignancy and point to pathways involved in immune recognition and immune function in the pathogenesis of DLBCL.

  14. Merkel Cell Polyomavirus Large T Antigen Disrupts Host Genomic Integrity and Inhibits Cellular Proliferation

    PubMed Central

    Li, Jing; Wang, Xin; Diaz, Jason; Tsang, Sabrina H.; Buck, Christopher B.

    2013-01-01

    Clonal integration of Merkel cell polyomavirus (MCV) DNA into the host genome has been observed in at least 80% of Merkel cell carcinoma (MCC). The integrated viral genome typically carries mutations that truncate the C-terminal DNA binding and helicase domains of the MCV large T antigen (LT), suggesting a selective pressure to remove this MCV LT region during tumor development. In this study, we show that MCV infection leads to the activation of host DNA damage responses (DDR). This activity was mapped to the C-terminal helicase-containing region of the MCV LT. The MCV LT-activated DNA damage kinases, in turn, led to enhanced p53 phosphorylation, upregulation of p53 downstream target genes, and cell cycle arrest. Compared to the N-terminal MCV LT fragment that is usually preserved in mutants isolated from MCC tumors, full-length MCV LT shows a decreased potential to support cellular proliferation, focus formation, and anchorage-independent cell growth. These apparently antitumorigenic effects can be reversed by a dominant-negative p53 inhibitor. Our results demonstrate that MCV LT-induced DDR activates p53 pathway, leading to the inhibition of cellular proliferation. This study reveals a key difference between MCV LT and simian vacuolating virus 40 LT, which activates a DDR but inhibits p53 function. This study also explains, in part, why truncation mutations that remove the MCV LT C-terminal region are necessary for the oncogenic progression of MCV-associated cancers. PMID:23760247

  15. Searching for large genomic rearrangements of the BRCA1 gene in a Nigerian population.

    PubMed

    Zhang, Jing; Fackenthal, James D; Huo, Dezheng; Zheng, Yonglan; Olopade, Olufunmilayo I

    2010-11-01

    BRCA1/2 germline mutations predispose to breast and ovarian cancer. Large genomic rearrangements (LGRs) have widened the mutational spectrum of the BRCA1 gene, but the frequencies vary in different populations. In this study, we sought to determine the spectrum of LGRs in the BRCA1 gene in Nigerian breast cancer patients. The multiplex ligation-dependent probe amplification (MLPA) assay was used to screen for BRCA1 rearrangements in 352 patients who had previously tested negative for BRCA1 and BRCA2 point mutations and small insertions/deletions. A positive MLPA result was confirmed and localized by long-range PCR. The breakpoints of the candidate rearrangement were characterized by sequencing. A novel deletion of BRCA1 exon 21 (c.5277+480_5332+672del) was detected in 1 out of 352 Nigerian breast cancer patients (0.3% occurrence frequency). Further analysis of the breakpoints revealed that the deletion involves two Alu elements: an AluSg in intron 20 and an AluY in intron 21. These data suggest that while BRCA1 genomic rearrangements exist, they do not contribute significantly to BRCA1-associated risk in the Nigerian population.

  16. Histology of “placoderm” dermal skeletons: Implications for the nature of the ancestral gnathostome

    PubMed Central

    Giles, Sam; Rücklin, Martin

    2013-01-01

    The vertebrate dermal skeleton has long been interpreted to have evolved from a primitive condition exemplified by chondrichthyans. However, chondrichthyans and osteichthyans evolved from an ancestral gnathostome stem‐lineage in which the dermal skeleton was more extensively developed. To elucidate the histology and skeletal structure of the gnathostome crown‐ancestor we conducted a histological survey of the diversity of the dermal skeleton among the placoderms, a diverse clade or grade of early jawed vertebrates. The dermal skeleton of all placoderms is composed largely of a cancellar architecture of cellular dermal bone, surmounted by dermal tubercles in the most ancestral clades, including antiarchs. Acanthothoracids retain an ancestral condition for the dermal skeleton, and we record its secondary reduction in antiarchs. We also find that mechanisms for remodeling bone and facilitating different growth rates between adjoining plates are widespread throughout the placoderms. J. Morphol., 2013. © 2013 Wiley Periodicals, Inc. PMID:23378262

  17. The search for ancestral nervous systems: an integrative and comparative approach.

    PubMed

    Satterlie, Richard A

    2015-02-15

    Even the most basal multicellular nervous systems are capable of producing complex behavioral acts that involve the integration and combination of simple responses, and decision-making when presented with conflicting stimuli. This requires an understanding beyond that available from genomic investigations, and calls for an integrative and comparative approach, where the power of genomic/transcriptomic techniques is coupled with morphological, physiological and developmental experimentation to identify common and species-specific nervous system properties for the development and elaboration of phylogenomic reconstructions. With careful selection of genes and gene products, we can continue to make significant progress in our search for ancestral nervous system organizations.

  18. Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires.

    PubMed

    Burstein, David; Amaro, Francisco; Zusman, Tal; Lifshitz, Ziv; Cohen, Ofir; Gilbert, Jack A; Pupko, Tal; Shuman, Howard A; Segal, Gil

    2016-02-01

    Infection by the human pathogen Legionella pneumophila relies on the translocation of ∼ 300 virulence proteins, termed effectors, which manipulate host cell processes. However, almost no information exists regarding effectors in other Legionella pathogens. Here we sequenced, assembled and characterized the genomes of 38 Legionella species and predicted their effector repertoires using a previously validated machine learning approach. This analysis identified 5,885 predicted effectors. The effector repertoires of different Legionella species were found to be largely non-overlapping, and only seven core effectors were shared by all species studied. Species-specific effectors had atypically low GC content, suggesting exogenous acquisition, possibly from the natural protozoan hosts of these species. Furthermore, we detected numerous new conserved effector domains and discovered new domain combinations, which allowed the inference of as yet undescribed effector functions. The effector collection and network of domain architectures described here can serve as a roadmap for future studies of effector function and evolution.

  19. The Exceptionally Large Chloroplast Genome of the Green Alga Floydiella terrestris Illuminates the Evolutionary History of the Chlorophyceae

    PubMed Central

    Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2010-01-01

    The Chlorophyceae, an advanced class of chlorophyte green algae, comprises five lineages that form two major clades (Chlamydomonadales + Sphaeropleales and Oedogoniales + Chaetopeltidales + Chaetophorales). The four complete chloroplast DNA (cpDNA) sequences currently available for chlorophyceans uncovered an extraordinarily fluid genome architecture as well as many structural features distinguishing this group from other green algae. We report here the 521,168-bp cpDNA sequence from a member of the Chaetopeltidales (Floydiella terrestris), the sole chlorophycean lineage not previously sampled for chloroplast genome analysis. This genome, which contains 97 conserved genes and 26 introns (19 group I and 7 group II introns), is the largest chloroplast genome ever sequenced. Intergenic regions account for 77.8% of the genome size and are populated by short repeats. Numerous genomic features are shared with the cpDNA of the chaetophoralean Stigeoclonium helveticum, notably the absence of a large inverted repeat and the presence of unique gene clusters and trans-spliced group II introns. Although only one of the Floydiella group I introns encodes a homing endonuclease gene, our finding of five free-standing reading frames having similarity with such genes suggests that chloroplast group I introns endowed with mobility were once more abundant in the Floydiella lineage. Parsimony analysis of structural genomic features and phylogenetic analysis of chloroplast sequence data unambiguously resolved the Oedogoniales as sister to the Chaetopeltidales and Chaetophorales. An evolutionary scenario of the molecular events that shaped the chloroplast genome in the Chlorophyceae is presented. PMID:20624729

  20. BK Polyomavirus Genomic Integration and Large T Antigen Expression: Evolving Paradigms in Human Oncogenesis.

    PubMed

    Kenan, D J; Mieczkowski, P A; Latulippe, E; Côté, I; Singh, H K; Nickeleit, V

    2016-12-31

    Human polyomaviruses are ubiquitous, with primary infections that typically occur during childhood and subsequent latency that may last a lifetime. Polyomavirus-mediated disease has been described in immunocompromised patients; its relationship to oncogenesis is poorly understood. We present deep sequencing data from a high-grade BK virus-associated tumor expressing large T antigen. The carcinoma arose in a kidney allograft 6 years after transplantation. We identified a novel genotype 1a BK polyomavirus, called Chapel Hill BK polyomavirus 2 (CH-2), that was integrated into the BRE gene in chromosome 2 of tumor cells. At the chromosomal integration site, viral break points were found, disrupting late BK gene sequences encoding capsid proteins VP1 and VP2/3. Immunohistochemistry and in situ hybridization studies demonstrated that the integrated BK virus was replication incompetent. We propose that the BK virus CH-2 was integrated into the human genome as a concatemer, resulting in alterations of feedback loops and overexpression of large T antigen. Collectively, these findings support the emerging understanding that viral integration is a nearly ubiquitous feature in polyomavirus-associated malignancy and that unregulated large T antigen expression drives a proliferative state that is conducive to oncogenesis. Based on the current observations, we present an updated model of polyomavirus-mediated oncogenesis.

  1. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

  2. An Ancestral Recombination Graph for Diploid Populations with Skewed Offspring Distribution

    PubMed Central

    Birkner, Matthias; Blath, Jochen; Eldon, Bjarki

    2013-01-01

    A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between

  3. Large-scale detection of recombination in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Chan, Cheong Xin; Beiko, Robert G.; Ragan, Mark A.

    2008-01-01

    Genetic recombination following a genetic transfer event can produce heterogeneous phylogenetic histories within sets of genes that share a common ancestral origin. Delineating recombination events will enhance our understanding of genome evolution. However, the task of detecting recombination is not trivial due to the effect of more recent evolutionary changes that can obscure such events from detection. In this paper, we demonstrate the use of a two-phase strategy for detecting recombination events on a large-scale dataset.

  4. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  5. The genome of woodland strawberry (Fragaria vesca).

    PubMed

    Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Lopez Girona, Elena; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M

    2011-02-01

    The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

  6. A Draft Sequence of the Neandertal Genome

    PubMed Central

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  7. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  8. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction.

    PubMed

    De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan

    2016-05-01

    Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.

  9. Genomic imbalances during transformation from follicular lymphoma to diffuse large B-cell lymphoma.

    PubMed

    Berglund, Mattias; Enblad, Gunilla; Thunberg, Ulf; Amini, Rose-Marie; Sundström, Christer; Roos, Göran; Erlanson, Martin; Rosenquist, Richard; Larsson, Catharina; Lagercrantz, Svetlana

    2007-01-01

    Follicular lymphoma is commonly transformed to a more aggressive diffuse large B-cell lymphoma (DLBCL). In order to provide molecular characterization of this histological and clinical transformation, comparative genomic hybridization was applied to 23 follicular lymphoma and 35 transformed DLBCL tumors from a total of 30 patients. The results were also compared with our published findings in de novo DLBCL. Copy number changes were detected in 70% of follicular lymphoma and in 97% of transformed DLBCL. In follicular lymphoma, the most common alterations were +18q21 (33%), +Xq25-26 (28%), +1q31-32 (23%), and -17p (23%), whereas transformed DLBCL most frequently exhibited +Xq25-26 (36%), +12q15 (29%), +7pter-q22 (25%), +8q21 (21%), and -6q16-21 (25%). Transformed DLBCL showed significantly more alterations as compared to follicular lymphoma (P=0.0001), and the alterations -6q16-21 and +7pter-q22 were only found in transformed DLBCL but not in follicular lymphoma (P=0.02). Alterations involving +13q22 were significantly less frequent, whereas -4q13-21 was more common in transformed as compared to de novo DLBCL (P=0.01 and P=0.02, respectively). Clinical progression from follicular lymphoma to transformed DLBCL is, on the genetic level, associated with acquisition of an increasing number of genomic copy number changes, with non-random involvement of specific target regions. The findings support diverse genetic background between transformed and de novo DLBCL.

  10. Large Genomic Fragment Deletions and Insertions in Mouse Using CRISPR/Cas9

    PubMed Central

    Satheka, Achim Cchitvsanzwhoh; Togo, Jacques; An, Yao; Humphrey, Mabwi; Ban, Luying; Ji, Yan; Jin, Honghong; Feng, Xuechao; Zheng, Yaowu

    2015-01-01

    ZFNs, TALENs and the CRISPR/Cas9 system have been used to generate point mutations and large fragment deletions and insertions in genomic modifications. The CRISPR/Cas9 system is the most flexible and fastest-developing technology and has been extensively used to make mutations in all kinds of organisms. However, most mutations reported to date are small insertions and deletions. In this report, the CRISPR/Cas9 system was used to make large DNA fragment deletions and insertions in mouse, including deletion of the entire Dip2a gene, about 65 kb in size, and insertion of a β-galactosidase (lacZ) reporter gene larger than 5 kb. About 11.8% (11/93) of transfected and diluted ES clones were positive for the 65 kb deletion. High targeting efficiencies in ES cells were also achieved with G418 selection, 46.2% (12/26) and 73.1% (19/26) for left and right arms respectively. Targeted large fragment deletion efficiency is about 21.4% of live pups or 6.0% of injected embryos. Targeted insertion of the lacZ reporter with a NEO cassette showed a targeting rate of 27.1% (13/48) by ES cell transfection and 11.1% (2/18) by direct zygote injection. The procedures bypass in vitro transcription by direct co-injection of zygotes or co-transfection of embryonic stem cells with circular plasmid DNA. The methods are technically easy, time saving, and cost effective in generating mouse models and will certainly facilitate gene function studies. PMID:25803037

  11. Large-scale contamination of microbial isolate genomes by Illumina PhiX control.

    PubMed

    Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos C; Pati, Amrita

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges such as searching for alternative energy sources and exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources and are usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.

  12. Large-scale contamination of microbial isolate genomes by Illumina PhiX control

    PubMed Central

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world’s biggest challenges such as searching for alternative energy sources and exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources and are usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community. PMID:26203331

  13. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes.

    PubMed

    Voss, Stephen R; Kump, D Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J; Basa, Saritha; Walker, John A; Smith, Jeramiah J

    2011-08-01

    Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species that presents relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarking the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12-17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse, amphibian genomes in studies of vertebrate genome evolution.

  14. Reconstruction of oomycete genome evolution identifies differences in evolutionary trajectories leading to present-day large gene families.

    PubMed

    Seidl, Michael F; Van den Ackerveken, Guido; Govers, Francine; Snel, Berend

    2012-01-01

    The taxonomic class of oomycetes contains numerous pathogens of plants and animals but is related to nonpathogenic diatoms and brown algae. Oomycetes have flexible genomes comprising large gene families that play roles in pathogenicity. The evolutionary processes that shaped the gene content have not yet been studied by applying systematic tree reconciliation of the phylome of these species. We analyzed evolutionary dynamics of ten Stramenopiles. Gene gains, duplications, and losses were inferred by tree reconciliation of 18,459 gene trees constituting the phylome with a highly supported species phylogeny. We reconstructed a strikingly large last common ancestor of the Stramenopiles that contained ~10,000 genes. Throughout evolution, the genomes of pathogenic oomycetes have constantly gained and lost genes, though gene gains through duplications outnumber the losses. The branch leading to the plant pathogenic Phytophthora genus was identified as a major transition point characterized by an increased frequency of duplication events that has likely driven the speciation within this genus. Large gene families encoding different classes of enzymes associated with pathogenicity, such as glycoside hydrolases, are formed by complex and distinct patterns of duplications and losses leading to their expansion in extant oomycetes. This study unveils the large-scale evolutionary dynamics that shaped the genomes of pathogenic oomycetes. By the application of phylogeny-based analysis methods, it provides additional insights that shed light on the complex history of oomycete genome evolution and the emergence of large gene families characteristic of this important class of pathogens.

  15. Assessing the Accuracy of Ancestral Protein Reconstruction Methods

    PubMed Central

    Williams, Paul D; Pollock, David D; Blackburne, Benjamin P; Goldstein, Richard A

    2006-01-01

    The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated. PMID:16789817

  16. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest in understanding somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer.

  17. Evo-Devo: Variations on Ancestral Themes

    PubMed Central

    De Robertis, E.M.

    2008-01-01

    Most animals evolved from a common ancestor, Urbilateria, which already had in place the developmental genetic networks for shaping body plans. Comparative genomics has revealed rather unexpectedly that many of the genes present in bilaterian animal ancestors were lost by individual phyla during evolution. Reconstruction of the archetypal developmental genomic tool-kit present in Urbilateria will help to elucidate the contribution of gene loss and developmental constraints to the evolution of animal body plans. PMID:18243095

  18. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    PubMed Central

    de Vries, Paul S.; Sabater-Lleal, Maria; Chasman, Daniel I.; Trompet, Stella; Kleber, Marcus E.; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R.; Marioni, Riccardo E.; Weng, Lu-Chen; Grossmann, Vera; Brody, Jennifer A.; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M.; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J.; Rumley, Ann; Mulas, Antonella; de Craen, Anton J. M.; Grotevendt, Anne; Taylor, Kent D.; Delgado, Graciela E.; Kifley, Annette; Lopez, Lorna M.; Berentzen, Tina L.; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C.; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P. M.; Draisma, Harmen H. M.; Lowe, Gordon D.; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J.; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G.; McEvoy, Mark A.; Starr, John M.; Hysi, Pirro G.; Hernandez, Dena G.; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L.; Slagboom, P. Eline; Zeller, Tanja; Psaty, Bruce M.; Uitterlinden, André G.; de Geus, Eco J. C.; Stott, David J.; Binder, Harald; Hofman, Albert; Franco, Oscar H.; Rotter, Jerome I.; Ferrucci, Luigi; Spector, Tim D.; Deary, Ian J.; März, Winfried; Greinacher, Andreas; Wild, Philipp S.; Cucca, Francesco; Boomsma, Dorret I.; Watkins, Hugh; Tang, Weihong; Ridker, Paul M.; Jukema, Jan W.; Scott, Rodney J.; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J.; Smith, Nicholas L.; Strachan, David P.

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10⁻⁸ is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10⁻⁸), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development. PMID:28107422
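
    The arithmetic behind the figures quoted in this abstract can be checked with a short sketch. It is only an illustration, not the authors' analysis: the family-wise error rate of 0.05 is an assumption (the abstract does not state it), the variable names are hypothetical, and reading "20% more loci" as six additional signals relative to the 30 HapMap loci is one interpretation that happens to match the quoted percentage.

```python
import math

ALPHA = 0.05  # assumed family-wise error rate; not stated in the abstract


def implied_tests(alpha, threshold):
    """Number of independent tests n for which alpha / n equals the threshold."""
    return alpha / threshold


# Locus counts quoted in the abstract at the conventional threshold.
shared = 29       # significant with both HapMap and 1000G imputation
hapmap_only = 1   # significant with HapMap imputation only
kg_only = 6       # additional signals found with 1000G imputation only

hapmap_total = shared + hapmap_only      # all loci found with HapMap
kg_total = shared + kg_only              # all loci found with 1000G
relative_gain = kg_only / hapmap_total   # additional signals per HapMap locus

print(f"tests implied by 5e-8:   {implied_tests(ALPHA, 5e-8):.3g}")
print(f"tests implied by 2.5e-8: {implied_tests(ALPHA, 2.5e-8):.3g}")
print(f"HapMap loci: {hapmap_total}, 1000G loci: {kg_total}")
print(f"additional 1000G signals relative to HapMap: {relative_gain:.0%}")
```

    Under the assumed error rate, halving the threshold corresponds to doubling the number of independent tests being corrected for, which is the sense in which the stricter correction accounts for the denser 1000G panel.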

  19. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset

    PubMed Central

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R. Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-01-01

    Human populations are often dichotomized into “isolated” and “open” categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features. PMID:28145502

  20. Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset.

    PubMed

    Anagnostou, Paolo; Dominici, Valentina; Battaggia, Cinzia; Pagani, Luca; Vilar, Miguel; Wells, R Spencer; Pettener, Davide; Sarno, Stefania; Boattini, Alessio; Francalacci, Paolo; Colonna, Vincenza; Vona, Giuseppe; Calò, Carla; Destro Bisol, Giovanni; Tofanelli, Sergio

    2017-02-01

    Human populations are often dichotomized into "isolated" and "open" categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra and inter-population variation measures combining novel and literature data relative to 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features.

  1. Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

    PubMed

    Matsumoto, Tomotaka; Akashi, Hiroshi; Yang, Ziheng

    2015-07-01

    Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
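
    The bias that arises from using only the single best reconstruction can be illustrated with a toy calculation. The sketch below is not the authors' implementation and does not reproduce their EMC method: the posterior probabilities are invented, the function names are hypothetical, and it compares base composition rather than substitution counts. It only shows why taking the most probable state at each site (SBR) can give a different answer from averaging over states weighted by their posterior probabilities (AWP).

```python
import math

BASES = "ACGT"

# Invented posterior probabilities for the ancestral base at four sites.
posteriors = [
    {"A": 0.55, "G": 0.45},
    {"T": 0.60, "C": 0.40},
    {"A": 0.70, "G": 0.30},
    {"G": 0.90, "A": 0.10},
]


def sbr_counts(sites):
    """Count one base per site: the state with the highest posterior."""
    counts = dict.fromkeys(BASES, 0.0)
    for site in sites:
        best = max(site, key=site.get)
        counts[best] += 1.0
    return counts


def awp_counts(sites):
    """Add every state at every site, weighted by its posterior probability."""
    counts = dict.fromkeys(BASES, 0.0)
    for site in sites:
        for base, prob in site.items():
            counts[base] += prob
    return counts


def gc_fraction(counts):
    """Fraction of the total count that is G or C."""
    return (counts["G"] + counts["C"]) / sum(counts.values())


print("SBR G+C fraction:", round(gc_fraction(sbr_counts(posteriors)), 4))
print("AWP G+C fraction:", round(gc_fraction(awp_counts(posteriors)), 4))
```

    In this toy data the less probable state is G or C at three of the four sites, so discarding suboptimal states shifts the estimated composition toward A+T. This is the kind of systematic effect the abstract attributes to SBR.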

  2. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  3. Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of wheat genome, Aegilops tauschii, is used as a resource for wheat...

  4. Physical mapping of a large plant genome using global high-information content fingerprinting: a distal region of wheat chromosome 3DS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of wheat. We report the use of the Ae. tauschii, the diploid ancestor of the wheat D genome, for the construction of t...

  5. Matrilocal residence is ancestral in Austronesian societies

    PubMed Central

    Jordan, Fiona M.; Gray, Russell D.; Greenhill, Simon J.; Mace, Ruth

    2009-01-01

    The nature of social life in human prehistory is elusive, yet knowing how kinship systems evolve is critical for understanding population history and cultural diversity. Post-marital residence rules specify sex-specific dispersal and kin association, influencing the pattern of genetic markers across populations. Cultural phylogenetics allows us to practise ‘virtual archaeology’ on these aspects of social life that leave no trace in the archaeological record. Here we show that early Austronesian societies practised matrilocal post-marital residence. Using a Markov-chain Monte Carlo comparative method implemented in a Bayesian phylogenetic framework, we estimated the type of residence at each ancestral node in a sample of Austronesian language trees spanning 135 Pacific societies. Matrilocal residence has been hypothesized for proto-Oceanic society (ca 3500 BP), but we find strong evidence that matrilocality was predominant in earlier Austronesian societies ca 5000–4500 BP, at the root of the language family and its early branches. Our results illuminate the divergent patterns of mtDNA and Y-chromosome markers seen in the Pacific. The analysis of present-day cross-cultural data in this way allows us to directly address cultural evolutionary and life-history processes in prehistory. PMID:19324748

  6. Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines.

    PubMed

    Eppinger, Mark; Baar, Claudia; Linz, Bodo; Raddatz, Günter; Lanz, Christa; Keller, Heike; Morelli, Giovanna; Gressmann, Helga; Achtman, Mark; Schuster, Stephan C

    2006-07-01

    Helicobacter pylori infection of humans is so old that its population genetic structure reflects that of ancient human migrations. A closely related species, Helicobacter acinonychis, is specific for large felines, including cheetahs, lions, and tigers, whereas hosts more closely related to humans harbor more distantly related Helicobacter species. This observation suggests a jump between host species. But who ate whom and when did it happen? In order to resolve this question, we determined the genomic sequence of H. acinonychis strain Sheeba and compared it to genomes from H. pylori. The conserved core genes between the genomes are so similar that the host jump probably occurred within the last 200,000 (range 50,000-400,000) years. However, the Sheeba genome also possesses unique features that indicate the direction of the host jump, namely from early humans to cats. Sheeba possesses an unusually large number of highly fragmented genes, many encoding outer membrane proteins, which may have been destroyed in order to bypass deleterious responses from the feline host immune system. In addition, the few Sheeba-specific genes that were found include a cluster of genes encoding sialylation of the bacterial cell surface carbohydrates, which were imported by horizontal genetic exchange and might also help to evade host immune defenses. These results provide a genomic basis for elucidating molecular events that allow bacteria to adapt to novel animal hosts.

  7. Testing the large genome constraint hypothesis: plant traits, habitat and climate seasonality in Liliaceae.

    PubMed

    Carta, Angelino; Peruzzi, Lorenzo

    2016-04-01

    The factors driving genome size evolution in Liliaceae were examined. In particular, we investigated whether species with larger genomes are confined to less stressful environments with a longer vegetative season. We tested our hypotheses by correlating the genome size with other plant traits and environmental variables. To determine the adaptive nature of the genome size, we also compared the performances of Brownian motion (BM) processes with those inferred by Ornstein-Uhlenbeck (OU) models of trait evolution. A positive correlation of genome size with plant size, mean temperature and habitat moisture and a negative correlation with altitude and precipitation seasonality were found. Models of trait evolution revealed a deviation from a drift process or BM. Instead, changes in genome size were significantly associated with precipitation regimes according to an OU process. Specifically, the evolutionary optima towards which the genome size evolves were higher for humid climates and lower for drier ones. Taken together, our results indicate that the genome size increase in Liliaceae is constrained by climate seasonality.

  8. Large-scale computational and statistical analyses of high transcription potentialities in 32 prokaryotic genomes

    PubMed Central

    Sinoquet, Christine; Demey, Sylvain; Braun, Frédérique

    2008-01-01

    This article compares 32 bacterial genomes with respect to their high transcription potentialities. The σ70 promoter has been widely studied for Escherichia coli model and a consensus is known. Since transcriptional regulations are known to compensate for promoter weakness (i.e. when the promoter similarity with regard to the consensus is rather low), predicting functional promoters is a hard task. Instead, the research work presented here comes within the scope of investigating potentially high ORF expression, in relation with three criteria: (i) high similarity to the σ70 consensus (namely, the consensus variant appropriate for each genome), (ii) transcription strength reinforcement through a supplementary binding site—the upstream promoter (UP) element—and (iii) enhancement through an optimal Shine-Dalgarno (SD) sequence. We show that in the AT-rich Firmicutes’ genomes, frequencies of potentially strong σ70-like promoters are exceptionally high. Besides, though they contain a low number of strong promoters (SPs), some genomes may show a high proportion of promoters harbouring an UP element. Putative SPs of lesser quality are more frequently associated with an UP element than putative strong promoters of better quality. A meaningful difference is statistically ascertained when comparing bacterial genomes with similarly AT-rich genomes generated at random; the difference is the highest for Firmicutes. Comparing some Firmicutes genomes with similarly AT-rich Proteobacteria genomes, we confirm the Firmicutes specificity. We show that this specificity is neither explained by AT-bias nor genome size bias; neither does it originate in the abundance of optimal SD sequences, a typical and significant feature of Firmicutes more thoroughly analysed in our study. PMID:18440978

  9. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  10. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    SciTech Connect

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  11. Continuing Evolution of Burkholderia mallei Through Genome Reduction and Large-Scale Rearrangements

    DTIC Science & Technology

    2010-01-22

    in Materials and Methods. b NRPS, nonribosomal peptide synthase; PKS, polyketide synthase; RND, resistance nodulation-division like pump. Losada et al...genomics, genome erosion, bacterial virulence. © The Author(s) 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology...creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original

  12. Large-scale evaluation of experimentally determined DNA G+C contents with whole genome sequences of prokaryotes.

    PubMed

    Kim, Mincheol; Park, Sang-Cheol; Baek, Inwoo; Chun, Jongsik

    2015-03-01

    Historically, DNA G+C content has played a critical role in the description of bacterial and archaeal species. Despite its importance in prokaryote taxonomy, its accuracy has been questioned due to methodological heterogeneity and measurement errors of conventional methods. Here we investigated the accuracy of experimentally determined DNA G+C contents by comparing them with reference values calculated from whole genome sequences. The large-scale comparison revealed that G+C contents determined by high-performance liquid chromatography and buoyant density centrifugation methods were more similar to the genome-derived reference values than those generated by the thermal denaturation method. However, there was a substantial degree of discrepancy in DNA G+C contents between values obtained by conventional methods and genome-derived reference values. The majority of the differences between them fell outside the acceptable range (i.e. 1 mol% G+C content difference) for species delimitation of prokaryotes. In contrast, when average nucleotide identity (ANI) was correlated to G+C difference among genomes, most G+C differences were confined to less than 1% within species. Therefore, erroneous conventional methods are not meaningful in the description of bacterial and archaeal species. For taxonomic purposes, DNA G+C content should be determined by calculating it directly from high-quality genome sequences with at least 16× sequencing depth of coverage.
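
    Calculating G+C content directly from a sequence, as the abstract recommends, is simple to sketch. The function names below are hypothetical and this is not the authors' pipeline. Two choices are assumptions made for illustration: ambiguous bases such as N are ignored, and the 1 mol% range quoted in the abstract is treated as an inclusive tolerance.

```python
def gc_content_percent(sequence):
    """Return G+C content in mol%, counting only unambiguous A, C, G and T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def within_species_range(gc_a, gc_b, tolerance=1.0):
    """Check whether two G+C values differ by at most `tolerance` mol%."""
    return abs(gc_a - gc_b) <= tolerance


genome_a = "ATGCGCGCTA"   # toy sequence, 60% G+C
genome_b = "ATGCGNNCTA"   # toy sequence with ambiguous bases, 50% G+C

gc_a = gc_content_percent(genome_a)
gc_b = gc_content_percent(genome_b)
print(gc_a, gc_b, within_species_range(gc_a, gc_b))
```

    For real genomes the same calculation would be applied to every contig of an assembly and summed before dividing, so that short contigs do not distort the result.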

  13. Enhancing the pharmaceutical properties of protein drugs by ancestral sequence reconstruction

    PubMed Central

    Zakas, Philip M.; Brown, Harrison C.; Knight, Kristopher; Meeks, Shannon L.; Spencer, H. Trent; Gaucher, Eric A.; Doering, Christopher B.

    2016-01-01

    Optimization of a protein’s pharmaceutical properties is usually carried out by rational design and/or directed evolution. Here we test an alternative approach based on ancestral sequence reconstruction. Using available genomic sequence data on coagulation factor VIII and predictive models of molecular evolution, we engineer protein variants with improved activity, stability, biosynthesis potential, and reduced inhibition by clinical anti-drug antibodies. In principle, this approach can be applied to any protein drug based on a conserved gene sequence. PMID:27669166

  14. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection for accelerated breeding. We report the establishment of a large SNP array for maize and i...

  15. A large set of 26 new reference transcriptomes dedicated to comparative population genomics in crops and wild relatives.

    PubMed

    Sarah, Gautier; Homa, Felix; Pointet, Stéphanie; Contreras, Sandy; Sabot, François; Nabholz, Benoit; Santoni, Sylvain; Sauné, Laure; Ardisson, Morgane; Chantret, Nathalie; Sauvage, Christopher; Tregear, James; Jourda, Cyril; Pot, David; Vigouroux, Yves; Chair, Hana; Scarcelli, Nora; Billot, Claire; Yahiaoui, Nabila; Bacilieri, Roberto; Khadari, Bouchaib; Boccara, Michel; Barnaud, Adéline; Péros, Jean-Pierre; Labouisse, Jean-Pierre; Pham, Jean-Louis; David, Jacques; Glémin, Sylvain; Ruiz, Manuel

    2016-08-04

    We produced a unique large data set of reference transcriptomes to obtain new knowledge about the evolution of plant genomes and crop domestication. For this purpose, we validated an RNA-Seq data assembly protocol to perform comparative population genomics. For the validation, we assessed and compared the quality of de novo Illumina short-read assemblies using data from two crops for which an annotated reference genome was available, namely grapevine and sorghum. We used the same protocol for the release of 26 new transcriptomes of crop plants and wild relatives, including still understudied crops such as yam, pearl millet and fonio. The species list has a wide taxonomic representation with the inclusion of 15 monocots and 11 eudicots. All contigs were annotated using BLAST, prot4EST and Blast2GO. A distinctive feature of the data set is that each crop is associated with closely related species, which will permit whole-genome comparative evolutionary studies between crops and their wild relatives. This large resource will thus serve research communities working on both crops and model organisms. All the data are available at http://arcad-bioinformatics.southgreen.fr/.

  16. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life

    SciTech Connect

    Elkins, James G.; Kunin, Victor; Anderson, Iain; Barry, Kerrie; Goltsman, Eugene; Lapidus, Alla; Hedlund, Brian; Hugenholtz, Phil; Kyrpides, Nikos; Graham, David; Keller, Martin; Wanner, Gerhard; Richardson, Paul; Stetter, Karl O.

    2007-05-01

    Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal RNA (SSU rRNA) sequences, the Korarchaeota had been originally suggested to have an ancestral relationship to all known Archaea although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota, highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through a FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light onto the evolution of extant life.

  17. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  18. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats

    PubMed Central

    Corman, Victor Max; Baldwin, Heather J.; Tateno, Adriana Fumie; Zerbinati, Rodrigo Melim; Annan, Augustina; Owusu, Michael; Nkrumah, Evans Ewald; Maganga, Gael Darren; Oppong, Samuel; Adu-Sarkodie, Yaw; Vallo, Peter; da Silva Filho, Luiz Vicente Ribeiro Ferreira; Leroy, Eric M.; Thiel, Volker; van der Hoek, Lia; Poon, Leo L. M.; Tschapka, Marco

    2015-01-01

    ABSTRACT We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E. IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in

  19. Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach.

    PubMed

    Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Clayton, Ellen Wright; Kantarcioglu, Murat; Malin, Bradley

    2017-02-02

    Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals, the Sequence and Phenotype Integration Exchange (SPHINX), and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.

  20. Comparative genomics uncovers large tandem chromosomal duplications in Mycobacterium bovis BCG Pasteur.

    PubMed

    Brosch, R; Gordon, S V; Buchrieser, C; Pym, A S; Garnier, T; Cole, S T

    2000-06-30

    On direct comparison of minimal sets of ordered clones from bacterial artificial chromosome (BAC) libraries representing the complete genomes of Mycobacterium tuberculosis H37Rv and the vaccine strain, Mycobacterium bovis BCG Pasteur, two major rearrangements were identified in the genome of M. bovis BCG Pasteur. These were shown to correspond to two tandem duplications, DU1 and DU2, of 29 668 bp and 36 161 bp, respectively. While DU1 resulted from a single duplication event, DU2 apparently arose from duplication of a 100 kb genomic segment that subsequently incurred an internal deletion of 64 kb. Several lines of evidence suggest that DU2 may continue to expand, since two copies were detected in a subpopulation of BCG Pasteur cells. BCG strains harbouring DU1 and DU2 are diploid for at least 58 genes and contain two copies of oriC, the chromosomal origin of replication. These findings indicate that these genomic regions of the BCG genome are still dynamic. Although the role of DU1 and DU2 in the attenuation and/or altered immunogenicity of BCG is yet unknown, knowledge of their existence will facilitate quality control of BCG vaccine lots and may help in monitoring the efficacy of the world's most widely used vaccine.

  1. Genomic shotgun array: a procedure linking large-scale DNA sequencing with regional transcript mapping.

    PubMed

    Li, Ling-Hui; Li, Jian-Chiuan; Lin, Yung-Feng; Lin, Chung-Yen; Chen, Chung-Yung; Tsai, Shih-Feng

    2004-02-11

    To facilitate transcript mapping and to investigate alterations in genomic structure and gene expression in a defined genomic target, we developed a novel microarray-based method to detect transcriptional activity of the human chromosome 4q22-24 region. Loss of heterozygosity of human 4q22-24 is frequently observed in hepatocellular carcinoma (HCC). One hundred and eighteen well-characterized genes have been identified from this region. We took previously sequenced shotgun subclones as templates to amplify overlapping sequences for the genomic segment and constructed a chromosome-region-specific microarray. Using genomic DNA fragments as probes, we detected transcriptional activity from within this region among five different tissues. The hybridization results indicate that there are new transcripts that have not yet been identified by other methods. The existence of new transcripts encoded by genes in this region was confirmed by PCR cloning or cDNA library screening. The procedure reported here allows coupling of shotgun sequencing with transcript mapping and, potentially, detailed analysis of gene expression and chromosomal copy of the genomic sequence for the putative HCC tumor suppressor gene(s) in the 4q candidate region.

  2. An ancestral miR-1304 allele present in Neanderthals regulates genes involved in enamel formation and could explain dental differences with modern humans.

    PubMed

    Lopez-Valenzuela, Maria; Ramírez, Oscar; Rosas, Antonio; García-Vargas, Samuel; de la Rasilla, Marco; Lalueza-Fox, Carles; Espinosa-Parrilla, Yolanda

    2012-07-01

    Genetic changes in regulatory elements are likely to result in phenotypic effects that might explain population-specific as well as species-specific traits. MicroRNAs (miRNAs) are posttranscriptional repressors involved in the control of almost every biological process. These small noncoding RNAs are present in various phylogenetic groups, and a large number of them remain highly conserved at the sequence level. MicroRNA-mediated regulation depends on perfect matching between the seven nucleotides of its seed region and the target sequence usually located at the 3' untranslated region of the regulated gene. Hence, even single changes in seed regions are predicted to be deleterious as they may affect miRNA target specificity. In accordance with this, purifying selection has strongly acted on these regions. Comparison between the genomes of present-day humans from various populations, Neanderthal, and other nonhuman primates showed an miRNA, miR-1304, that carries a polymorphism on its seed region. The ancestral allele is found in Neanderthal, nonhuman primates, at low frequency (~5%) in modern Asian populations and rarely in Africans. Using miRNA target site prediction algorithms, we found that the derived allele increases the number of putative target genes for the derived miRNA more than ten-fold, indicating an important functional evolution for miR-1304. Analysis of the predicted targets for derived miR-1304 indicates an association with behavior and nervous system development and function. Two of the predicted target genes for the ancestral miR-1304 allele are important genes for teeth formation, enamelin, and amelotin. MicroRNA overexpression experiments using a luciferase-based assay showed that the ancestral version of miR-1304 reduces the enamelin- and amelotin-associated reporter gene expression by 50%, whereas the derived miR-1304 does not have any effect. Deletion of the corresponding target sites for miR-1304 in these dental genes avoided their repression

  3. Large-scale genomics unveil polygenic architecture of human cortical surface area.

    PubMed

    Chen, Chi-Hua; Peng, Qian; Schork, Andrew J; Lo, Min-Tzu; Fan, Chun-Chieh; Wang, Yunpeng; Desikan, Rahul S; Bettella, Francesco; Hagler, Donald J; Westlye, Lars T; Kremen, William S; Jernigan, Terry L; Le Hellard, Stephanie; Steen, Vidar M; Espeseth, Thomas; Huentelman, Matt; Håberg, Asta K; Agartz, Ingrid; Djurovic, Srdjan; Andreassen, Ole A; Schork, Nicholas; Dale, Anders M

    2015-07-20

    Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphism (SNP) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h² ∼ 0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N=466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices.

  4. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals.

    PubMed

    Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R

    2009-01-01

    Using a milk-cell cDNA sequencing approach we characterised milk-protein sequences from two monotreme species, platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus) and found a full set of caseins and casein variants. The genomic organisation of the platypus casein locus is compared with other mammalian genomes, including the marsupial opossum and several eutherians. Physical linkage of casein genes has been seen in the casein loci of all mammalian genomes examined and we confirm that this is also observed in platypus. However, we show that a recent duplication of beta-casein occurred in the monotreme lineage, as opposed to more ancient duplications of alpha-casein in the eutherian lineage, while marsupials possess only single copies of alpha- and beta-caseins. Despite this variability, the close proximity of the main alpha- and beta-casein genes in an inverted tail-tail orientation and the relative orientation of the more distant kappa-casein genes are similar in all mammalian genome sequences so far available. Overall, the conservation of the genomic organisation of the caseins indicates the early, pre-monotreme development of the fundamental role of caseins during lactation. In contrast, the lineage-specific gene duplications that have occurred within the casein locus of monotremes and eutherians but not marsupials, which may have lost part of the ancestral casein locus, emphasises the independent selection on milk provision strategies to the young, most likely linked to different developmental strategies. The monotremes therefore provide insight into the ancestral drivers for lactation and how these have adapted in different lineages.

  5. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    SciTech Connect

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi with 23 orders, 110 families, 1300 genera and over 19,000 known species. We present a comparative analysis of 70 Dothideomycete genomes including over 50 that we sequenced and are as yet unpublished. This extensive sampling has almost quadrupled that of the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families including proteases, transporters and small secreted proteins and show that major differences in gene content are influenced by speciation.

  6. Twenty years of artificial directional selection have shaped the genome of the Italian Large White pig breed.

    PubMed

    Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L

    2016-04-01

    In this study, we investigated at the genome-wide level if 20 years of artificial directional selection based on boar genetic evaluation obtained with a classical BLUP animal model shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a P_Bonferroni < 0.10. However, there was an increasing number of SNPs with a decreasing level of allele frequency changes over time, representing a continuous profile across the genome. The largest proportion of the 493 SNPs was on porcine chromosome (SSC) 7, SSC2, SSC8 and SSC18 for a total of 204 haploblocks. Functional annotations of genomic regions, including the 493 shifted SNPs, reported a few Gene Ontology terms that might underlie the biological processes that contributed to the increased performance of the pigs over the 20 years of the selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program derived by the application of methodologies assuming the infinitesimal model that captured a continuous trend of allele frequency changes in the boar population.

  7. Discovery of novel phosphonate natural products and their biosynthetic pathways by large-scale genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome mining has revolutionized the field of natural products, providing hope that new antibiotics can be discovered in time before all remainders are rendered useless against multidrug resistant pathogens. While this approach has been successful in academic settings focused on small collections or...

  8. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.

    PubMed

    Lowery, Robert K; Uribe, Gabriel; Jimenez, Eric B; Weiss, Mark A; Herrera, Kristian J; Regueiro, Maria; Herrera, Rene J

    2013-11-01

    Analyses of the genetic relationships among modern humans, Neanderthals and Denisovans have suggested that 1-4% of the non-Sub-Saharan African gene pool may be Neanderthal derived, while 6-8% of the Melanesian gene pool may be the product of admixture between the Denisovans and the direct ancestors of Melanesians. In the present study, we analyzed single nucleotide polymorphism (SNP) diversity among a worldwide collection of contemporary human populations with respect to the genetic constitution of these two archaic hominins and Pan troglodytes (chimpanzee). We partitioned SNPs into subsets, including those that are derived in both archaic lineages, those that are ancestral in both archaic lineages and those that are only derived in one archaic lineage. By doing this, we have conducted separate examinations of subsets of mutations with higher probabilities of divergent phylogenetic origins. While previous investigations have excluded SNPs from common ancestors in principal component analyses, we included common ancestral SNPs in our analyses to visualize the relative placement of the Neanderthal and Denisova among human populations. To assess the genetic similarities among the various hominin lineages, we performed genetic structure analyses to provide a comparison of genetic patterns found within contemporary human genomes that may have archaic or common ancestral roots. Our results indicate that 3.6% of the Neanderthal genome is shared with roughly 65.4% of the average European gene pool, which clinally diminishes with distance from Europe. Our results suggest that Neanderthal genetic associations with contemporary non-Sub-Saharan African populations, as well as the genetic affinities observed between Denisovans and Melanesians most likely result from the retention of ancient mutations in these populations.

  9. Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species.

    PubMed

    Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G; O'Brien, Stephen J; Johnson, Warren E

    2006-02-01

    Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2-telomeric region vs. D2-centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species.

  10. Large-scale appearance of ultraconserved elements in tetrapod genomes and slowdown of the molecular clock.

    PubMed

    Stephen, Stuart; Pheasant, Michael; Makunin, Igor V; Mattick, John S

    2008-02-01

    Mammalian genomes contain millions of highly conserved noncoding sequences, many of which are regulatory. The most extreme examples are the 481 ultraconserved elements (UCEs) that are identical over at least 200 bp in human, mouse, and rat and show 96% identity with chicken, which diverged approximately 310 MYA. If the substitution rate in UCEs remained constant, these elements should also be present with a high level of identity in fish (approximately 450 Myr), but this is not the case, suggesting that many appeared in the amniotes or tetrapods or that the molecular clock has slowed down in these lineages, or both. Taking advantage of the availability of multiple genomes, we identified 13,736 UCEs in the human genome that are identical over at least 100 bp in at least 3 of 5 placental mammals, including 2,189 sequences over at least 200 bp, thereby greatly expanding the repertoire of known UCEs, and investigated the evolution of these sequences in opossum, chicken, frog, and fish. We conclude that there was a massive genome-wide acquisition and expansion of UCEs during tetrapod and then amniote evolution, accompanied by a slowdown of the molecular clock, particularly in the amniotes, a process consistent with their functional exaptation in these lineages. The majority of tetrapod-specific UCEs are noncoding and associated with genes involved in regulation of transcription and development. In contrast, fish genomes contain relatively few UCEs, the majority of which are common to all bony vertebrates. These elements are different from other conserved noncoding elements and appear to be important regulatory innovations that became fixed following the emergence of vertebrates from the sea to the land.

  11. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  12. Ancestral-derived effects on the mutational landscape of laryngeal cancer

    PubMed Central

    Ramakodi, Meganathan P.; Kulathinal, Rob J.; Chung, Yujin; Serebriiskii, Ilya; Liu, Jeffrey C.; Ragin, Camille C.

    2016-01-01

    Laryngeal cancer disproportionately affects more African-Americans than European-Americans. Here, we analyze the genome-wide somatic point mutations from the tumors of 13 African-Americans and 57 European-Americans from TCGA to differentiate between environmental and ancestrally-inherited factors. The mean number of mutations differed between African-Americans (151.31) and European-Americans (277.63). Other differences in the overall mutational landscape between African-Americans and European-Americans were also found. The frequencies of C>A and C>G mutations were significantly different between the two populations (p-value<0.05). Context nucleotide signatures for some mutation types significantly differ between these two populations. Thus, the context nucleotide signatures along with other factors could be related to the observed differences in mutational landscapes between the two races. Finally, we show that mutated genes associated with these mutational differences differ between the two populations. Thus, at the molecular level, race appears to be a factor in the progression of laryngeal cancer with ancestral genomic signatures best explaining these differences. PMID:26721311

  13. GRAIN: a computer program to calculate ancestral and partial inbreeding coefficients using a gene dropping approach.

    PubMed

    Baumung, R; Farkas, J; Boichard, D; Mészáros, G; Sölkner, J; Curik, I

    2015-04-01

    GRain is freely available software intended to enable and promote testing of hypotheses with respect to purging and heterogeneity of inbreeding depression. The program is based on a stochastic approach, the gene dropping method, and calculates various coefficients from large and complex pedigrees. GRain calculates, together with the 'classical' inbreeding coefficient, ancestral inbreeding coefficients proposed by Ballou (1997) J. Hered., 88, 169 and Kalinowski et al. (2000) Conserv. Biol., 14, 1375 as well as an ancestral history coefficient (AHC), defined here for the first time. AHC is defined as the number that tells how many times during pedigree segregation (gene dropping) a randomly taken allele has been in IBD status. Furthermore, GRain enables testing of heterogeneity and/or purging of inbreeding depression with respect to different founders/ancestors by calculating partial coefficients for all previously obtained coefficients.
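
    The gene dropping idea named in this abstract can be illustrated with a minimal sketch. It is not GRain's implementation: the function name, the pedigree layout (tuples of individual, sire, dam, with parents listed before offspring) and the restriction to individuals with either both parents known or neither are assumptions made for illustration. Each founder receives two unique alleles, alleles are dropped through the pedigree by Mendelian sampling, and the 'classical' inbreeding coefficient is estimated as the fraction of replicates in which the target carries two alleles that are identical by descent (IBD).

```python
import random


def gene_drop_inbreeding(pedigree, target, replicates=20000, seed=0):
    """Estimate the classical inbreeding coefficient of `target` by gene dropping.

    pedigree: list of (individual, sire, dam) tuples ordered so that parents
    appear before their offspring; founders have sire and dam set to None.
    (Hypothetical layout; individuals with one unknown parent are not handled.)
    """
    rng = random.Random(seed)
    ibd_hits = 0
    for _ in range(replicates):
        genotype = {}
        next_allele = 0
        for ind, sire, dam in pedigree:
            if sire is None and dam is None:
                # Founder: assign two alleles that are unique in the pedigree.
                genotype[ind] = (next_allele, next_allele + 1)
                next_allele += 2
            else:
                # Mendelian sampling: one random allele from each parent.
                genotype[ind] = (rng.choice(genotype[sire]),
                                 rng.choice(genotype[dam]))
        a, b = genotype[target]
        # Two copies of the same founder allele are identical by descent.
        ibd_hits += (a == b)
    return ibd_hits / replicates


# Usage: E is the offspring of a full-sib mating (C x D).
full_sib_pedigree = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),
]
estimate = gene_drop_inbreeding(full_sib_pedigree, "E")
print(f"Estimated F for E: {estimate:.3f}")
```

    For a full-sib mating the theoretical inbreeding coefficient is 0.25, so the estimate should fall close to that value, with sampling error shrinking as the number of replicates grows.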

  14. Reconstructed Ancestral Myo-Inositol-3-Phosphate Synthases Indicate That Ancestors of the Thermococcales and Thermotoga Species Were More Thermophilic than Their Descendants

    PubMed Central

    Butzin, Nicholas C.; Lapierre, Pascal; Green, Anna G.; Swithers, Kristen S.; Gogarten, J. Peter; Noll, Kenneth M.

    2013-01-01

    The bacterial genomes of Thermotoga species show evidence of significant interdomain horizontal gene transfer from the Archaea. Members of this genus acquired many genes from the Thermococcales, which grow at higher temperatures than Thermotoga species. In order to study the functional history of an interdomain horizontally acquired gene we used ancestral sequence reconstruction to examine the thermal characteristics of reconstructed ancestral proteins of the Thermotoga lineage and its archaeal donors. Several ancestral sequence reconstruction methods were used to determine the possible sequences of the ancestral Thermotoga and Archaea myo-inositol-3-phosphate synthase (MIPS). These sequences were predicted to be more thermostable than the extant proteins using an established sequence composition method. We verified these computational predictions by measuring the activities and thermostabilities of purified proteins from the Thermotoga and the Thermococcales species, and eight ancestral reconstructed proteins. We found that the ancestral proteins from both the archaeal donor and the Thermotoga most recent common ancestor recipient were more thermostable than their descendants. We show that there is a correlation between the thermostability of MIPS protein and the optimal growth temperature (OGT) of its host, which suggests that the OGT of the ancestors of these species of Archaea and the Thermotoga grew at higher OGTs than their descendants. PMID:24391933

  15. Reconstructed ancestral Myo-inositol-3-phosphate synthases indicate that ancestors of the Thermococcales and Thermotoga species were more thermophilic than their descendants.

    PubMed

    Butzin, Nicholas C; Lapierre, Pascal; Green, Anna G; Swithers, Kristen S; Gogarten, J Peter; Noll, Kenneth M

    2013-01-01

    The bacterial genomes of Thermotoga species show evidence of significant interdomain horizontal gene transfer from the Archaea. Members of this genus acquired many genes from the Thermococcales, which grow at higher temperatures than Thermotoga species. In order to study the functional history of an interdomain horizontally acquired gene we used ancestral sequence reconstruction to examine the thermal characteristics of reconstructed ancestral proteins of the Thermotoga lineage and its archaeal donors. Several ancestral sequence reconstruction methods were used to determine the possible sequences of the ancestral Thermotoga and Archaea myo-inositol-3-phosphate synthase (MIPS). These sequences were predicted to be more thermostable than the extant proteins using an established sequence composition method. We verified these computational predictions by measuring the activities and thermostabilities of purified proteins from the Thermotoga and the Thermococcales species, and eight ancestral reconstructed proteins. We found that the ancestral proteins from both the archaeal donor and the Thermotoga most recent common ancestor recipient were more thermostable than their descendants. We show that there is a correlation between the thermostability of MIPS protein and the optimal growth temperature (OGT) of its host, which suggests that the OGT of the ancestors of these species of Archaea and the Thermotoga grew at higher OGTs than their descendants.

  16. Retroviral envelope syncytin capture in an ancestrally diverged mammalian clade for placentation in the primitive Afrotherian tenrecs

    PubMed Central

    Cornelis, Guillaume; Vernochet, Cécile; Malicorne, Sébastien; Souquere, Sylvie; Tzika, Athanasia C.; Goodman, Steven M.; Catzeflis, François; Robinson, Terence J.; Milinkovitch, Michel C.; Pierron, Gérard; Heidmann, Odile; Dupressoir, Anne; Heidmann, Thierry

    2014-01-01

    Syncytins are fusogenic envelope (env) genes of retroviral origin that have been captured for a function in placentation. Syncytins have been identified in Euarchontoglires (primates, rodents, Leporidae) and Laurasiatheria (Carnivora, ruminants) placental mammals. Here, we searched for similar genes in species that retained characteristic features of primitive mammals, namely the Malagasy and mainland African Tenrecidae. They belong to the superorder Afrotheria, an early lineage that diverged from Euarchontoglires and Laurasiatheria 100 Mya, during the Cretaceous terrestrial revolution. An in silico search for env genes with full coding capacity within a Tenrecidae genome identified several candidates, with one displaying placenta-specific expression as revealed by RT-PCR analysis of a large panel of Setifer setosus tissues. Cloning of this endogenous retroviral env gene demonstrated fusogenicity in an ex vivo cell–cell fusion assay on a panel of mammalian cells. Refined analysis of placental architecture and ultrastructure combined with in situ hybridization demonstrated specific expression of the gene in multinucleate cellular masses and layers at the materno–fetal interface, consistent with a role in syncytium formation. This gene, which we named “syncytin-Ten1,” is conserved among Tenrecidae, with evidence of purifying selection and conservation of fusogenic activity. To our knowledge, it is the first syncytin identified to date within the ancestrally diverged Afrotheria superorder. PMID:25267646

  17. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots

    PubMed Central

    Wang, Yupeng; Ficklin, Stephen P.; Wang, Xiyin; Feltus, F. Alex; Paterson, Andrew H.

    2016-01-01

    Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots. PMID:27195960

  18. Linkage-disequilibrium mapping of disease genes by reconstruction of ancestral haplotypes in founder populations.

    PubMed Central

    Service, S K; Lang, D W; Freimer, N B; Sandkuijl, L A

    1999-01-01

    Linkage disequilibrium (LD) mapping may be a powerful means for genome screening to identify susceptibility loci for common diseases. A new statistical approach for detection of LD around a disease gene is presented here. This method compares the distribution of haplotypes in affected individuals versus that expected for individuals descended from a common ancestor who carried a mutation of the disease gene. Simulations demonstrate that this method, which we term "ancestral haplotype reconstruction" (AHR), should be powerful for genome screening of phenotypes characterized by a high degree of etiologic heterogeneity, even with currently available marker maps. AHR is best suited to application in isolated populations where affected individuals are relatively recently descended (< approximately 25 generations) from a common disease mutation-bearing founder. PMID:10330361

  19. Hierarchically aligning 10 legume genomes establishes a family-level genomics platform.

    PubMed

    Wang, Jinpeng; Sun, Pengchuan; Li, Yuxian; Liu, Yinzhe; Yu, Jigao; Ma, Xuelian; Sun, Sangrong; Yang, Nanshan; Xia, Ruiyan; Lei, Tianyu; Liu, Xiaojian; Jiao, Beibei; Xing, Yue; Ge, Weina; Wang, Li; Wang, Zhenyi; Song, Xiaoming; Yuan, Min; Guo, Di; Zhang, Lan; Zhang, Jiaqi; Jin, Dianchuan; Chen, Wei; Pan, Yuxin; Liu, Tao; Jin, Ling; Sun, Jinshuai; Yu, Jiaxiang; Cheng, Rui; Duan, Xueqian; Shen, Shaoqi; Qin, Jun; Zhang, Mengchen; Paterson, Andrew H; Wang, Xiyin

    2017-03-21

    Mainly due to their economic importance, genomes of 10 legumes, including soybean, wild peanuts, and barrel medic, have been sequenced. However, a family-level comparative genomics analysis has been unavailable. With grape and selected legume genomes as outgroups, we managed to perform a hierarchical and event-related alignment of these genomes and deconvoluted layers of homologous regions produced by ancestral polyploidizations or speciations. Consequently, we illustrated genomic fractionation characterized by widespread gene losses after the polyploidizations. Notably, high similarity in gene retention between recently duplicated chromosomes in soybean supported a likely autopolyploid nature of its tetraploid ancestor. Moreover, although gene losses were mostly near-random, largely but not fully described by a geometric distribution, we showed that polyploidization contributed divergently to copy number variation of important gene families. Besides, we showed significantly divergent evolutionary levels among legumes, and by performing Ks correction, re-dated major evolutionary events during their expansion. The present effort laid a solid foundation for further genomics exploration in the legume research community and beyond. We described only a tiny fraction of the legume comparative genomics analysis that we performed, and more information was stored in the newly constructed Legume Comparative Genomics Research Platform (www.legumegrp.org).

  20. Widespread genomic signatures of natural selection in hominid evolution.

    PubMed

    McVicker, Graham; Gordon, David; Davis, Colleen; Green, Phil

    2009-05-01

    Selection acting on genomic functional elements can be detected by its indirect effects on population diversity at linked neutral sites. To illuminate the selective forces that shaped hominid evolution, we analyzed the genomic distributions of human polymorphisms and sequence differences among five primate species relative to the locations of conserved sequence features. Neutral sequence diversity in human and ancestral hominid populations is substantially reduced near such features, resulting in a surprisingly large genome average diversity reduction due to selection of 19-26% on the autosomes and 12-40% on the X chromosome. The overall trends are broadly consistent with "background selection" or hitchhiking in ancestral populations acting to remove deleterious variants. Average selection is much stronger on exonic (both protein-coding and untranslated) conserved features than non-exonic features. Long term selection, rather than complex speciation scenarios, explains the large intragenomic variation in human/chimpanzee divergence. Our analyses reveal a dominant role for selection in shaping genomic diversity and divergence patterns, clarify hominid evolution, and provide a baseline for investigating specific selective events.

  1. Complete Genome Sequence of the Multiresistant Acinetobacter baumannii Strain AbH12O-A2, Isolated during a Large Outbreak in Spain

    PubMed Central

    Merino, M.; Alvarez-Fraga, L.; Gómez, M. J.; Aransay, A. M.; Lavín, J. L.; Chaves, F.

    2014-01-01

    We report the complete genome sequence of Acinetobacter baumannii strain AbH12O-A2, isolated during a large outbreak in Spain. The genome has 3,875,775 bp and 3,526 coding sequences, with 39.4% G+C content. The availability of this genome will facilitate the study of the pathogenicity of the Acinetobacter species. PMID:25395646

  2. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome

    PubMed Central

    van Opijnen, Tim; Bento, José

    2016-01-01

    The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target, rather antibiotic induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, employing genetic interaction mapping as well as temporal gene-expression experiments we reveal genome-wide conditionally important/essential genes, we discover roles for genes with unknown function, and uncover parts of the antibiotic’s mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and at least for species with a large pan-genome less predictable. PMID:27607357

  3. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome.

    PubMed

    van Opijnen, Tim; Dedrick, Sandra; Bento, José

    2016-09-01

    The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target, rather antibiotic induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, employing genetic interaction mapping as well as temporal gene-expression experiments we reveal genome-wide conditionally important/essential genes, we discover roles for genes with unknown function, and uncover parts of the antibiotic's mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and at least for species with a large pan-genome less predictable.

  4. Mechanisms for the Evolution of a Derived Function in the Ancestral Glucocorticoid Receptor

    SciTech Connect

    Carroll, Sean Michael; Ortlund, Eric A; Thornton, Joseph W.

    2012-03-16

    Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs - reduced sensitivity to all hormones and increased selectivity for glucocorticoids - are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR-MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations. Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and

  5. Large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference.

    PubMed

    Shen, Sam H; Stauft, Charles B; Gorbatsevych, Oleksandr; Song, Yutong; Ward, Charles B; Yurovsky, Alisa; Mueller, Steffen; Futcher, Bruce; Wimmer, Eckard

    2015-04-14

    The protein synthesis machineries of two distinct phyla of the Animal kingdom, insects of Arthropoda and mammals of Chordata, have different preferences for how to best encode proteins. Nevertheless, arboviruses (arthropod-borne viruses) are capable of infecting both mammals and insects just like arboviruses that use insect vectors to infect plants. These organisms have evolved carefully balanced genomes that can efficiently use the translational machineries of different phyla, even if the phyla belong to different kingdoms. Using dengue virus as an example, we have undone the genome encoding balance and specifically shifted the encoding preference away from mammals. These mammalian-attenuated viruses grow to high titers in insect cells but low titers in mammalian cells, have dramatically increased LD50s in newborn mice, and induce high levels of protective antibodies. Recoded arboviruses with a bias toward phylum-specific expression could form the basis of a new generation of live attenuated vaccine candidates.

  6. Large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference

    PubMed Central

    Shen, Sam H.; Stauft, Charles B.; Gorbatsevych, Oleksandr; Song, Yutong; Ward, Charles B.; Yurovsky, Alisa; Mueller, Steffen; Futcher, Bruce; Wimmer, Eckard

    2015-01-01

    The protein synthesis machineries of two distinct phyla of the Animal kingdom, insects of Arthropoda and mammals of Chordata, have different preferences for how to best encode proteins. Nevertheless, arboviruses (arthropod-borne viruses) are capable of infecting both mammals and insects just like arboviruses that use insect vectors to infect plants. These organisms have evolved carefully balanced genomes that can efficiently use the translational machineries of different phyla, even if the phyla belong to different kingdoms. Using dengue virus as an example, we have undone the genome encoding balance and specifically shifted the encoding preference away from mammals. These mammalian-attenuated viruses grow to high titers in insect cells but low titers in mammalian cells, have dramatically increased LD50s in newborn mice, and induce high levels of protective antibodies. Recoded arboviruses with a bias toward phylum-specific expression could form the basis of a new generation of live attenuated vaccine candidates. PMID:25825721

  7. Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

    PubMed

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.

  8. Writ large: Genomic Dissection of the Effect of Cellular Environment on Immune Response

    PubMed Central

    Yosef, Nir; Regev, Aviv

    2016-01-01

    Cells of the immune system routinely respond to cues from their local environment and feedback to their surrounding through transient responses, choice of differentiation trajectories, plastic changes in cell state, and malleable adaptation to their tissue of residence. Genomic approaches have opened the way for comprehensive interrogation of such orchestrated responses. Focusing on genomic profiling of transcriptional and epigenetic cell state, we discuss how they are applied to investigate immune cells faced with various environmental cues. We highlight some of the emerging principles, on the role of dense regulatory circuitry, epigenetic memory, cell type fluidity, and reuse of regulatory modules, in achieving and maintaining appropriate responses to a changing environment. These provide a first step toward a systematic understanding of molecular circuits in complex tissues. PMID:27846493

  9. HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes.

    PubMed

    Xiong, Wenwei; He, Limei; Lai, Jinsheng; Dooner, Hugo K; Du, Chunguang

    2014-07-15

    Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5' and 3' termini of the predicted Helitrons provide a primary confidence level and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where Helitrons had not been reported previously to our knowledge, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.

  10. Large-Scale Mutagenesis of the Yeast Genome Using a Tn7-Derived Multipurpose Transposon

    PubMed Central

    Kumar, Anuj; Seringhaus, Michael; Biery, Matthew C.; Sarnovsky, Robert J.; Umansky, Lara; Piccirillo, Stacy; Heidtman, Matthew; Cheung, Kei-Hoi; Dobry, Craig J.; Gerstein, Mark B.; Craig, Nancy L.; Snyder, Michael

    2004-01-01

    We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of ∼300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis. PMID:15466296

  11. Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon.

    PubMed

    Kumar, Anuj; Seringhaus, Michael; Biery, Matthew C; Sarnovsky, Robert J; Umansky, Lara; Piccirillo, Stacy; Heidtman, Matthew; Cheung, Kei-Hoi; Dobry, Craig J; Gerstein, Mark B; Craig, Nancy L; Snyder, Michael

    2004-10-01

    We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of approximately 300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis.

  12. Large scale DNA sequencing: new challenges emerge--the 2007 Human Genome Variation Society scientific meeting.

    PubMed

    Oetting, William S

    2008-05-01

    The annual scientific meeting of the Human Genome Variation Society (HGVS) was held on 23 October 2007, in San Diego, CA. The major theme of this meeting was "New DNA Sequencing Technologies & Human Genome Variation." A series of speakers provided information on several new technologies that produce DNA sequence data on a scale far beyond what was possible even a few years ago. These new technologies produce up to gigabases of nucleotides on a single run. Already, two individuals have had their entire genome sequenced, resulting in the identification of many novel DNA variants. Several new questions now need to be answered. What impact do these novel variants have on the phenotypes? How are we to associate private variants in a single individual with disease, especially when current association studies require genotyping thousands of individuals? Further work will be required to create methodologies to analyze these variants to determine if they are potentially disease-producing or are phenotypically silent. For the technology to be useful in a medical setting it will be crucial to answer these questions.

  13. A high-density SNP genome-wide linkage scan in a large autism extended pedigree.

    PubMed

    Allen-Brady, K; Miller, J; Matsunami, N; Stevens, J; Block, H; Farley, M; Krasny, L; Pingree, C; Lainhart, J; Leppert, M; McMahon, W M; Coon, H

    2009-06-01

    We performed a high-density, single nucleotide polymorphism (SNP), genome-wide scan on a six-generation pedigree from Utah with seven affected males, diagnosed with autism spectrum disorder. Using a two-stage linkage design, we first performed a nonparametric analysis on the entire genome using a 10K SNP chip to identify potential regions of interest. To confirm potentially interesting regions, we eliminated SNPs in high linkage disequilibrium (LD) using a principal components analysis (PCA) method and repeated the linkage results. Three regions met genome-wide significance criteria after controlling for LD: 3q13.2-q13.31 (nonparametric linkage (NPL), 5.58), 3q26.31-q27.3 (NPL, 4.85) and 20q11.21-q13.12 (NPL, 5.56). Two regions met suggestive criteria for significance 7p14.1-p11.22 (NPL, 3.18) and 9p24.3 (NPL, 3.44). All five chromosomal regions are consistent with other published findings. Haplotype sharing results showed that five of the affected subjects shared more than a single chromosomal region of interest with other affected subjects. Although no common autism susceptibility genes were found for all seven autism cases, these results suggest that multiple genetic loci within these regions may contribute to the autism phenotype in this family, and further follow-up of these chromosomal regions is warranted.

  14. Genomic diversity of large-plaque-forming podoviruses infecting the phytopathogen Ralstonia solanacearum.

    PubMed

    Kawasaki, Takeru; Narulita, Erlia; Matsunami, Minaho; Ishikawa, Hiroki; Shimizu, Mio; Fujie, Makoto; Bhunchoth, Anjana; Phironrit, Namthip; Chatchawankanphanich, Orawan; Yamada, Takashi

    2016-05-01

    The genome organization, gene structure, and host range of five podoviruses that infect Ralstonia solanacearum, the causative agent of bacterial wilt disease, were characterized. The phages fell into two distinct groups based on the genome position of the RNA polymerase gene (i.e., T7-type and ϕKMV-type). One-step growth experiments revealed that ϕRSB2 (a T7-like phage) lysed host cells more efficiently with a shorter infection cycle (ca. 60 min corresponding to half the doubling time of the host) than ϕKMV-like phages such as ϕRSB1 (with an infection cycle of ca. 180 min). Co-infection experiments with ϕRSB1 and ϕRSB2 showed that ϕRSB2 always predominated in the phage progeny independent of host strains. Most phages had wide host-ranges and the phage particles usually did not attach to the resistant strains; when occasionally some did, the phage genome was injected into the resistant strain's cytoplasm, as revealed by fluorescence microscopy with SYBR Gold-labeled phage particles.

  15. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  16. Genome-Wide Survey of Large Rare Copy Number Variants in Alzheimer’s Disease Among Caribbean Hispanics

    PubMed Central

    Ghani, Mahdi; Pinto, Dalila; Lee, Joseph H.; Grinberg, Yakov; Sato, Christine; Moreno, Danielle; Scherer, Stephen W.; Mayeux, Richard; St. George-Hyslop, Peter; Rogaeva, Ekaterina

    2012-01-01

    Recently genome-wide association studies have identified significant association between Alzheimer’s disease (AD) and variations in CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7. However, the pathogenic variants in these loci have not yet been found. We conducted a genome-wide scan for large copy number variation (CNV) in a dataset of Caribbean Hispanic origin (554 controls and 559 AD cases that were previously investigated in a SNP-based genome-wide association study using Illumina HumanHap 650Y platform). We ran four CNV calling algorithms to obtain high-confidence calls for large CNVs (>100 kb) that were detected by at least two algorithms. Global burden analyses did not reveal significant differences between cases and controls in CNV rate, distribution of deletions or duplications, total or average CNV size; or number of genes affected by CNVs. However, we observed a nominal association between AD and a ∼470 kb duplication on chromosome 15q11.2 (P = 0.037). This duplication, encompassing up to five genes (TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1) was present in 10 cases (2.6%) and 3 controls (0.8%). The dosage increase of CYFIP1 and NIPA1 genes was further confirmed by quantitative PCR. The current study did not detect CNVs that affect novel AD loci identified by recent genome-wide association studies. However, because the array technology used in our study has limitations in detecting small CNVs, future studies must carefully assess novel AD genes for the presence of disease-related CNVs. PMID:22384383

  17. Re-annotation, improved large-scale assembly and establishment of a catalogue of noncoding loci for the genome of the model brown alga Ectocarpus.

    PubMed

    Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark

    2017-04-01

    The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles.

  18. Enzyme functional evolution through improved catalysis of ancestrally nonpreferred substrates.

    PubMed

    Huang, Ruiqi; Hippauf, Frank; Rohrbeck, Diana; Haustein, Maria; Wenke, Katrin; Feike, Janie; Sorrelle, Noah; Piechulla, Birgit; Barkman, Todd J

    2012-02-21

    In this study, we investigated the role for ancestral functional variation that may be selected upon to generate protein functional shifts using ancestral protein resurrection, statistical tests for positive selection, forward and reverse evolutionary genetics, and enzyme functional assays. Data are presented for three instances of protein functional change in the salicylic acid/benzoic acid/theobromine (SABATH) lineage of plant secondary metabolite-producing enzymes. In each case, we demonstrate that ancestral nonpreferred activities were improved upon in a daughter enzyme after gene duplication, and that these functional shifts were likely coincident with positive selection. Both forward and reverse mutagenesis studies validate the impact of one or a few sites toward increasing activity with ancestrally nonpreferred substrates. In one case, we document the occurrence of an evolutionary reversal of an active site residue that reversed enzyme properties. Furthermore, these studies show that functionally important amino acid replacements result in substrate discrimination as reflected in evolutionary changes in the specificity constant (k(cat)/K(M)) for competing substrates, even though adaptive substitutions may affect K(M) and k(cat) separately. In total, these results indicate that nonpreferred, or even latent, ancestral protein activities may be coopted at later times to become the primary or preferred protein activities.

  19. An experimental phylogeny to benchmark ancestral sequence reconstruction

    PubMed Central

    Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

    2016-01-01

    Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687

  20. Male androphilia in the ancestral environment. An ethnological analysis.

    PubMed

    VanderLaan, Doug P; Ren, Zhiyuan; Vasey, Paul L

    2013-12-01

    The kin selection hypothesis posits that male androphilia (male sexual attraction to adult males) evolved because androphilic males invest more in kin, thereby enhancing inclusive fitness. Increased kin-directed altruism has been repeatedly documented among a population of transgendered androphilic males, but never among androphilic males in other cultures who adopt gender identities as men. Thus, the kin selection hypothesis may be viable if male androphilia was expressed in the transgendered form in the ancestral past. Using the Standard Cross-Cultural Sample (SCCS), we examined 46 societies in which male androphilia was expressed in the transgendered form (transgendered societies) and 146 comparison societies (non-transgendered societies). We analyzed SCCS variables pertaining to ancestral sociocultural conditions, access to kin, and societal reactions to homosexuality. Our results show that ancestral sociocultural conditions and bilateral and double descent systems were more common in transgendered than in non-transgendered societies. Across the entire sample, descent systems and residence patterns that would presumably facilitate increased access to kin were associated with the presence of ancestral sociocultural conditions. Among transgendered societies, negative societal attitudes toward homosexuality were unlikely. We conclude that the ancestral human sociocultural environment was likely conducive to the expression of the transgendered form of male androphilia. Descent systems, residence patterns, and societal reactions to homosexuality likely facilitated investments in kin by transgendered males. Given that contemporary transgendered male androphiles appear to exhibit elevated kin-directed altruism, these findings further indicate the viability of the kin selection hypothesis.

  1. Enzyme functional evolution through improved catalysis of ancestrally nonpreferred substrates

    PubMed Central

    Huang, Ruiqi; Hippauf, Frank; Rohrbeck, Diana; Haustein, Maria; Wenke, Katrin; Feike, Janie; Sorrelle, Noah; Piechulla, Birgit; Barkman, Todd J.

    2012-01-01

    In this study, we investigated the role for ancestral functional variation that may be selected upon to generate protein functional shifts using ancestral protein resurrection, statistical tests for positive selection, forward and reverse evolutionary genetics, and enzyme functional assays. Data are presented for three instances of protein functional change in the salicylic acid/benzoic acid/theobromine (SABATH) lineage of plant secondary metabolite-producing enzymes. In each case, we demonstrate that ancestral nonpreferred activities were improved upon in a daughter enzyme after gene duplication, and that these functional shifts were likely coincident with positive selection. Both forward and reverse mutagenesis studies validate the impact of one or a few sites toward increasing activity with ancestrally nonpreferred substrates. In one case, we document the occurrence of an evolutionary reversal of an active site residue that reversed enzyme properties. Furthermore, these studies show that functionally important amino acid replacements result in substrate discrimination as reflected in evolutionary changes in the specificity constant (kcat/KM) for competing substrates, even though adaptive substitutions may affect KM and kcat separately. In total, these results indicate that nonpreferred, or even latent, ancestral protein activities may be coopted at later times to become the primary or preferred protein activities. PMID:22315396

  2. Eleven ancestral gene families lost in mammals and vertebrates while otherwise universally conserved in animals

    PubMed Central

    Danchin, Etienne GJ; Gouret, Philippe; Pontarotti, Pierre

    2006-01-01

    Background: Gene losses may have played as important a role as gene and genome duplications and rearrangements in shaping today's species' genomes from a common ancestral set of genes. The set and diversity of protein-coding genes in a species has a direct effect at the functional level. While gene losses have been reported in all the major lineages of the metazoan tree of life, no study has focused on specific losses in the vertebrate and mammalian lineages. In contrast, genes lost in protostomes (i.e. arthropods and nematodes) but still present in vertebrates have been reported and extensively detailed. This probably over-anthropocentric way of comparing genomes does not treat gene losses in species that are usually described as "higher" as an important phenomenon. However, reporting universally conserved genes throughout evolution that have recently been lost in vertebrates and mammals could reveal interesting features about the evolution of our genome, particularly if these losses can be related to losses of capability. Results: We report 11 gene families conserved throughout eukaryotes from yeasts (such as Saccharomyces cerevisiae) to bilaterian animals (such as Drosophila melanogaster or Caenorhabditis elegans). This evolutionarily wide conservation suggests they were present in the last common ancestors of fungi and metazoan animals. None of these 11 gene families is found in the human or mouse genomes, and their absence generally extends to all vertebrates. A total of 8 out of these 11 gene families have orthologs in plants, suggesting they were present in the Last Eukaryotic Common Ancestor (LECA). We investigated known functional information for these 11 gene families. This allowed us to correlate some of the lost gene families to loss of capabilities. Conclusion: Mammalian and vertebrate genomes lost evolutionarily conserved ancestral genes that are probably otherwise not dispensable in eukaryotes. Hence, the human genome, which is generally

  3. The integration of recombination and physical maps in a large-genome monocot using haploid genome analysis in a trihybrid allium population.

    PubMed

    Khrustaleva, L I; de Melo, P E; van Heusden, A W; Kik, C

    2005-03-01

    Integrated mapping in large-genome monocots has been carried out on a limited number of species. Furthermore, integrated maps are difficult to construct for these species due to, among other reasons, the specific plant populations needed. To fill these gaps, Alliums were chosen as target species and a new strategy for constructing suitable populations was developed. This strategy involves the use of trihybrid genotypes in which only one homeolog of a chromosome pair is recombinant due to interspecific recombination. We used genotypes from a trihybrid Allium cepa x (A. roylei x A. fistulosum) population. Recombinant chromosomes 5 and 8 from the interspecific parent were analyzed using genomic in situ hybridization visualization of recombination points and the physical positions of recombination were integrated into AFLP linkage maps of both chromosomes. The integrated maps showed that in Alliums recombination predominantly occurs in the proximal half of chromosome arms and that 57.9% of PstI/MseI markers are located in close proximity to the centromeric region, suggesting the presence of genes in this region. These findings are different from data obtained on cereals, where recombination rate and gene density tend to be higher in distal regions.

  4. The Integration of Recombination and Physical Maps in a Large-Genome Monocot Using Haploid Genome Analysis in a Trihybrid Allium Population

    PubMed Central

    Khrustaleva, L. I.; de Melo, P. E.; van Heusden, A. W.; Kik, C.

    2005-01-01

    Integrated mapping in large-genome monocots has been carried out on a limited number of species. Furthermore, integrated maps are difficult to construct for these species due to, among other reasons, the specific plant populations needed. To fill these gaps, Alliums were chosen as target species and a new strategy for constructing suitable populations was developed. This strategy involves the use of trihybrid genotypes in which only one homeolog of a chromosome pair is recombinant due to interspecific recombination. We used genotypes from a trihybrid Allium cepa × (A. roylei × A. fistulosum) population. Recombinant chromosomes 5 and 8 from the interspecific parent were analyzed using genomic in situ hybridization visualization of recombination points and the physical positions of recombination were integrated into AFLP linkage maps of both chromosomes. The integrated maps showed that in Alliums recombination predominantly occurs in the proximal half of chromosome arms and that 57.9% of PstI/MseI markers are located in close proximity to the centromeric region, suggesting the presence of genes in this region. These findings are different from data obtained on cereals, where recombination rate and gene density tend to be higher in distal regions. PMID:15654085

  5. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations

    PubMed Central

    dos Santos, Gilberto; Schroeder, Andrew J.; Goodman, Joshua L.; Strelets, Victor B.; Crosby, Madeline A.; Thurmond, Jim; Emmert, David B.; Gelbart, William M.

    2015-01-01

    Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates. PMID:25398896

  6. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations.

    PubMed

    dos Santos, Gilberto; Schroeder, Andrew J; Goodman, Joshua L; Strelets, Victor B; Crosby, Madeline A; Thurmond, Jim; Emmert, David B; Gelbart, William M

    2015-01-01

    Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates.

  7. Magmatism and Epithermal Gold-Silver Deposits of the Southern Ancestral Cascade Arc, Western Nevada and Eastern California

    USGS Publications Warehouse

    John, David A.; du Bray, Edward A.; Henry, Christopher D.; Vikre, Peter

    2015-01-01

    Many epithermal gold-silver deposits are temporally and spatially associated with late Oligocene to Pliocene magmatism of the southern ancestral Cascade arc in western Nevada and eastern California. These deposits, which include both quartz-adularia (low- and intermediate-sulfidation; Comstock Lode, Tonopah, Bodie) and quartz-alunite (high-sulfidation; Goldfield, Paradise Peak) types, were major producers of gold and silver. Ancestral Cascade arc magmatism preceded that of the modern High Cascades arc and reflects subduction of the Farallon plate beneath North America. Ancestral arc magmatism began about 45 Ma, continued until about 3 Ma, and extended from near the Canada-United States border in Washington southward to about 250 km southeast of Reno, Nevada. The ancestral arc was split into northern and southern segments across an inferred tear in the subducting slab between Mount Shasta and Lassen Peak in northern California. The southern segment extends between 42°N in northern California and 37°N in western Nevada and was active from about 30 to 3 Ma. It is bounded on the east by the northeast edge of the Walker Lane. Ancestral arc volcanism represents an abrupt change in composition and style of magmatism relative to that in central Nevada. Large volume, caldera-forming, silicic ignimbrites associated with the 37 to 19 Ma ignimbrite flareup are dominant in central Nevada, whereas volcanic centers of the ancestral arc in western Nevada consist of andesitic stratovolcanoes and dacitic to rhyolitic lava domes that mostly formed between 25 and 4 Ma. Both ancestral arc and ignimbrite flareup magmatism resulted from rollback of the shallowly dipping slab that began about 45 Ma in northeast Nevada and migrated south-southwest with time. Most southern segment ancestral arc rocks have oxidized, high potassium, calc-alkaline compositions with silica contents ranging continuously from about 55 to 77 wt%. Most lavas are porphyritic and contain coarse plagioclase

  8. Chætognath transcriptome reveals ancestral and unique features among bilaterians

    PubMed Central

    Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick

    2008-01-01

    Background: The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results: Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through splice-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion: These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022

  9. Ancestral state reconstruction, rate heterogeneity, and the evolution of reptile viviparity.

    PubMed

    King, Benedict; Lee, Michael S Y

    2015-05-01

    Virtually all models for reconstructing ancestral states for discrete characters make the crucial assumption that the trait of interest evolves at a uniform rate across the entire tree. However, this assumption is unlikely to hold in many situations, particularly as ancestral state reconstructions are being performed on increasingly large phylogenies. Here, we show how failure to account for such variable evolutionary rates can cause highly anomalous (and likely incorrect) results, while three methods that accommodate rate variability yield the opposite, more plausible, and more robust reconstructions. The random local clock method, implemented in BEAST, estimates the position and magnitude of rate changes on the tree; split BiSSE estimates separate rate parameters for pre-specified clades; and the hidden rates model partitions each character state into a number of rate categories. Simulations show the inadequacy of traditional models when characters evolve with both asymmetry (different rates of change between states within a character) and heterotachy (different rates of character evolution across different clades). The importance of accounting for rate heterogeneity in ancestral state reconstruction is highlighted empirically with a new analysis of the evolution of viviparity in squamate reptiles, which reveal a predominance of forward (oviparous-viviparous) transitions and very few reversals.
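
    The asymmetry mentioned in the abstract above (different rates of change between the two states of a character) can be illustrated with the standard two-state continuous-time Markov model. The sketch below is not taken from the paper and is not the code of BEAST, BiSSE, or the hidden rates model; the function name and the example rates are assumptions chosen only for illustration, and a single pair of rates is applied to one branch.

```python
from math import exp

def two_state_transition_probs(q01, q10, t):
    """Transition probabilities of a two-state continuous-time Markov chain.

    q01 is the rate of change from state 0 to state 1 (for example
    oviparous -> viviparous) and q10 the rate of the reverse change.
    Returns a 2x2 list P where P[i][j] is the probability of being in
    state j after branch length t when starting in state i.
    """
    total = q01 + q10
    decay = exp(-total * t)
    pi0 = q10 / total  # stationary frequency of state 0
    pi1 = q01 / total  # stationary frequency of state 1
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]

# Hypothetical rates: forward change three times as fast as reversal.
P = two_state_transition_probs(q01=0.3, q10=0.1, t=2.0)
print(P)
```

    Setting q01 equal to q10 recovers the symmetric model; models that accommodate rate heterogeneity across the tree would let these rates differ between clades or hidden rate categories instead of fixing one pair for the whole phylogeny.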

  10. A novel common large genomic deletion and two new missense mutations identified in the Romanian phenylketonuria population.

    PubMed

    Gemperle-Britschgi, Corinne; Iorgulescu, Daniela; Mager, Monica Alina; Anton-Paduraru, Dana; Vulturar, Romana; Thöny, Beat

    2016-01-15

    The mutation spectrum for the phenylalanine hydroxylase (PAH) gene was investigated in a cohort of 84 hyperphenylalaninemia (HPA) patients from Romania identified through newborn screening or neurometabolic investigations. Differential diagnosis identified 81 patients with classic PAH deficiency while 3 had tetrahydropterin-cofactor deficiency and/or remained uncertain due to insufficient specimen. PAH-genetic analysis included a combination of Sanger sequencing of exons and exon–intron boundaries, MLPA and NGS with genomic DNA, and cDNA analysis from immortalized lymphoblasts. A diagnostic efficiency of 99.4% was achieved, as for one allele (out of a total of 162 alleles) no mutation could be identified. The most prevalent mutation was p.Arg408Trp, which was found in ~38% of all PKU alleles. Three novel mutations were identified, including the two missense mutations p.Gln226Lys and p.Tyr268Cys, both predicted to be disease-causing by prediction algorithms, and the large genomic deletion EX6del7831 (c.509+4140_706+510del7831) that resulted in skipping of exon 6 based on PAH-cDNA analysis in immortalized lymphocytes. The genomic deletion was present in a heterozygous state in 12 patients, i.e. in ~8% of all the analyzed PKU alleles, and might have originated from a Romanian founder.
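
    The diagnostic efficiency quoted above can be reproduced with a line of arithmetic. This is a minimal sketch that assumes the 162 alleles correspond to two alleles for each of the 81 patients with classic PAH deficiency; the variable names are illustrative and not from the paper.

```python
# Assumption: two PAH alleles per patient with classic PAH deficiency.
patients_with_pah_deficiency = 81
total_alleles = 2 * patients_with_pah_deficiency

# One allele remained without an identified mutation (from the abstract).
unresolved_alleles = 1

diagnostic_efficiency = 100.0 * (total_alleles - unresolved_alleles) / total_alleles
print(f"{total_alleles} alleles, diagnostic efficiency {diagnostic_efficiency:.1f}%")
```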

  11. Mapping of chimpanzee full-length cDNAs onto the human genome unveils large potential divergence of the transcriptome.

    PubMed

    Sakate, Ryuichi; Suto, Yumiko; Imanishi, Tadashi; Tanoue, Tetsuya; Hida, Munetomo; Hayasaka, Ikuo; Kusuda, Jun; Gojobori, Takashi; Hashimoto, Katsuyuki; Hirai, Momoki

    2007-09-01

    The genetic basis of the phenotypic difference between human and chimpanzee is one of the most actively pursued issues in current genomics. Although the genomic divergence between the two species has been described, the transcriptomic divergence has not been well documented. Thus, we newly sequenced and analyzed chimpanzee full-length cDNAs (FLcDNAs) representing 87 protein-coding genes. The number of nucleotide substitutions and sites of insertions/deletions (indels) was counted as a measure of sequence divergence between the chimpanzee FLcDNAs and the human genome onto which the FLcDNAs were mapped. Difference in transcription start/termination sites (TSSs/TTSs) and alternative splicing (AS) exons was also counted as a measure of structural divergence between the chimpanzee FLcDNAs and their orthologous human transcripts (NCBI RefSeq). As a result, we found that transposons (Alu) and repetitive segments caused large indels, which strikingly increased the average amount of sequence divergence up to more than 2% in the 3'-UTRs. Moreover, 20 out of the 87 transcripts contained more than 10% structural divergence in length. In particular, two-thirds of the structural divergence was found in the 3'-UTRs, and variable transcription start sites were conspicuous in the 5'-UTRs. As both transcriptional and translational efficiency were supposed to be related to 5'- and 3'-UTR sequences, these results lead to the idea that the difference in gene regulation can be a major cause of the difference in phenotype between human and chimpanzee.

  12. Large-Scale Screening for Targeted Knockouts in the Caenorhabditis elegans Genome

    PubMed Central

    2012-01-01

    The nematode Caenorhabditis elegans is a powerful model system to study contemporary biological problems. This system would be even more useful if we had mutations in all the genes of this multicellular metazoan. The combined efforts of the C. elegans Deletion Mutant Consortium and individuals within the worm community are moving us ever closer to this goal. At present, of the 20,377 protein-coding genes in this organism, 6764 genes with associated molecular lesions are either deletions or null mutations (WormBase WS220). Our three laboratories have contributed the majority of mutated genes, 6841 mutations in 6013 genes. The principal method we used to detect deletion mutations in the nematode utilizes polymerase chain reaction (PCR). More recently, we have used array comparative genome hybridization (aCGH) to detect deletions across the entire coding part of the genome and massively parallel short-read sequencing to identify nonsense, splicing, and missense defects in open reading frames. As deletion strains can be frozen and then thawed when needed, these strains will be an enduring community resource. Our combined molecular screening strategies have improved the overall throughput of our gene-knockout facilities and have broadened the types of mutations that we and others can identify. These multiple strategies should enable us to eventually identify a mutation in every gene in this multicellular organism. This knowledge will usher in a new age of metazoan genetics in which the contribution to any biological process can be assessed for all genes. PMID:23173093

  13. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

    PubMed Central

    Croucher, Nicholas J.; Page, Andrew J.; Connor, Thomas R.; Delaney, Aidan J.; Keane, Jacqueline A.; Bentley, Stephen D.; Parkhill, Julian; Harris, Simon R.

    2015-01-01

    The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates’ recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X. PMID:25414349
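
    The idea of scanning an alignment for loci with elevated densities of base substitutions can be sketched in a few lines. The code below is not the Gubbins implementation: it is a deliberately simplified fixed-window scan with a binomial tail test and a Bonferroni correction, and the function names, window size, and significance level are assumptions made for illustration only. It also omits the iterative tree rebuilding that the abstract describes.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    below = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)

def dense_windows(snp_positions, genome_length, window=1000, step=500, alpha=0.05):
    """Return (start, end, count) for windows with unusually many substitutions.

    The background rate is the genome-wide substitution density; a window is
    reported when its count is improbably high under that rate after a
    Bonferroni correction for the number of windows tested.
    """
    background = len(snp_positions) / genome_length
    starts = list(range(0, max(genome_length - window, 0) + 1, step))
    threshold = alpha / len(starts)
    hits = []
    for start in starts:
        end = start + window
        count = sum(1 for pos in snp_positions if start <= pos < end)
        if count and binomial_tail(count, window, background) < threshold:
            hits.append((start, end, count))
    return hits

# Toy data: four scattered substitutions plus a dense cluster of ten.
snps = [100, 2500, 7000, 9500] + list(range(4000, 4100, 10))
print(dense_windows(snps, genome_length=10_000))
```

    In this toy example only the two overlapping windows that contain the cluster are flagged; substitutions outside such windows would be the ones treated as putative point mutations when building the phylogeny.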

  14. Perspectives on clinical informatics: integrating large-scale clinical, genomic, and health information for clinical care.

    PubMed

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K; Chung, Yeun-Jun

    2013-12-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

  15. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    PubMed Central

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K.

    2013-01-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population. PMID:24465229

  16. Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes.

    PubMed

    Kallberg, Yvonne; Oppermann, Udo; Jörnvall, Hans; Persson, Bengt

    2002-03-01

    The progress in genome characterizations has opened new routes for studying enzyme families. The availability of the human genome enabled us to delineate the large family of short-chain dehydrogenase/reductase (SDR) members. Although the human genome releases are not yet final, we have already found 63 members. We have also compared these SDR forms with those of three model organisms: Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana. We detect eight SDR ortholog clusters in a cross-genome comparison. Four of these clusters represent extended SDR forms, a subgroup found in all life forms. The other four are classical SDRs with activities involved in cellular differentiation and signalling. We also find 18 SDR genes that are present only in the human genome of the four genomes studied, reflecting enzyme forms specific to mammals. Close to half of these gene products represent steroid dehydrogenases, emphasizing the regulatory importance of these enzymes.

  17. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates

    PubMed Central

    Peng, Jiguang; He, Bin Z.; Li, Yumei; Liu, Chu-Jun; Luan, Xuke; Ding, Wanqiu; Li, Shuxian; Chen, Chunyan; Tan, Bertrand Chin-Ming; Zhang, Yong E.; He, Aibin; Li, Chuan-Yun

    2015-01-01

    While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have a longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly-originated proteins are already functional in humans. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts. PMID:26177073

  18. Large-Scale Release of Campylobacter Draft Genomes: Resources for Food Safety and Public Health from the 100K Pathogen Genome Project

    PubMed Central

    Huang, Bihua C.; Storey, Dylan B.; Kong, Nguyet; Chen, Poyin; Arabyan, Narine; Gilpin, Brent; Mason, Carl; Townsend, Andrea K.; Smith, Woutrina A.; Byrne, Barbara A.; Taff, Conor C.

    2017-01-01

    Campylobacter is a food-associated bacterium and a leading cause of foodborne illness worldwide, being associated with poultry in the food supply. This is the initial public release of 202 Campylobacter genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Campylobacter genus. PMID:28057746

  19. Close Split of Sorghum and Maize Genome Progenitors

    PubMed Central

    Swigoňová, Zuzana; Lai, Jinsheng; Ma, Jianxin; Ramakrishna, Wusirika; Llaca, Victor; Bennetzen, Jeffrey L.; Messing, Joachim

    2004-01-01

    It is generally believed that maize (Zea mays L. ssp. mays) arose as a tetraploid; however, the two progenitor genomes cannot be unequivocally traced within the genome of modern maize. We have taken a new approach to investigate the origin of the maize genome. We isolated and sequenced large genomic fragments from the regions surrounding five duplicated loci from the maize genome and their orthologous loci in sorghum, and then we compared these sequences with the orthologous regions in the rice genome. Within the studied segments, we identified 11 genes that were conserved in location, order, and orientation. We performed phylogenetic and distance analyses and examined the patterns of estimated times of divergence for sorghum and maize gene orthologs and also the time of divergence for maize orthologs. Our results support a tetraploid origin of maize. This analysis also indicates contemporaneous divergence of the ancestral sorghum genome and the two maize progenitor genomes about 11.9 million years ago (Mya). On the basis of a putative conversion event detected for one of the genes, tetraploidization must have occurred before 4.8 Mya, and therefore, preceded the major maize genome expansion by gene amplification and retrotransposition. PMID:15466289

  20. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

    PubMed Central

    2012-01-01

    Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of the lily and 39% of the tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs, lily and tulip (12,017 groups), and among the three monocot species, lily, tulip, and rice (6,900 groups), were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets

  1. Comparison of Eleven Methods for Genomic DNA Extraction Suitable for Large-Scale Whole-Genome Genotyping and Long-Term DNA Banking Using Blood Samples

    PubMed Central

    Psifidi, Androniki; Dovas, Chrysostomos I.; Bramis, Georgios; Lazou, Thomai; Russel, Claire L.; Arsenos, Georgios; Banos, Georgios

    2015-01-01

    In recent years, next generation sequencing and microarray technologies have revolutionized scientific research with their applications to high-throughput analysis of biological systems. Isolation of high quantities of pure, intact, double stranded, highly concentrated, uncontaminated genomic DNA is a prerequisite for successful and reliable large scale genotyping analysis. High quantities of pure DNA are also required for the creation of DNA-banks. In the present study, eleven different DNA extraction procedures, including phenol-chloroform, silica and magnetic beads based extractions, were examined to ascertain their relative effectiveness for extracting DNA from ovine blood samples. The quality and quantity of the differentially extracted DNA were subsequently assessed by spectrophotometric measurements, Qubit measurements, real-time PCR amplifications and gel electrophoresis. Processing time, intensity of labor and cost for each method were also evaluated. Results revealed significant differences among the eleven procedures and only four of the methods yielded satisfactory outputs. These four methods, comprising three modified silica based commercial kits (Modified Blood, Modified Tissue, Modified Dx kits) and an in-house developed magnetic beads based protocol, were most appropriate for extracting high quality and quantity DNA suitable for large-scale microarray genotyping and also for long-term DNA storage as demonstrated by their successful application to 600 individuals. PMID:25635817

  2. A draft physical map of a D-genome cotton species (Gossypium raimondii)

    PubMed Central

    2010-01-01

    Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427

  3. Reaching Children through Their Ancestral Language and Authentic Literature

    ERIC Educational Resources Information Center

    Bannon, Kay Thorpe

    2004-01-01

    In this article, the author describes a program of Eastern Cherokee ancestral language restoration in Cherokee, North Carolina. One of the primary goals of the program is to enhance the self-concept of the children and motivate the students to experience academic excitement and success. The use of authentic legends and stories is one method…

  4. The Effect of Recombination on the Reconstruction of Ancestral Sequences

    PubMed Central

    Arenas, Miguel; Posada, David

    2010-01-01

    While a variety of methods exist to reconstruct ancestral sequences, all of them assume that a single phylogeny underlies all the positions in the alignment and therefore that recombination has not taken place. Using computer simulations we show that recombination can severely bias ancestral sequence reconstruction (ASR), and quantify this effect. If recombination is ignored, the ancestral sequences recovered can be quite distinct from the grand most recent common ancestor (GMRCA) of the sample and better resemble the concatenate of partial most recent common ancestors (MRCAs) at each recombination fragment. When independent phylogenetic trees are assumed for the different recombinant segments, the estimation of the fragment MRCAs improves significantly. Importantly, we show that recombination can change the biological predictions derived from ASRs carried out with real data. Given that recombination is widespread on nuclear genes and in particular in RNA viruses and some bacteria, the reconstruction of ancestral sequences in these cases should consider the potential impact of recombination and ideally be carried out using approaches that accommodate recombination. PMID:20124027

  5. Are survival processing memory advantages based on ancestral priorities?

    PubMed

    Soderstrom, Nicholas C; McCabe, David P

    2011-06-01

    Recent research has suggested that our memory systems are especially tuned to process information according to its survival relevance, and that inducing problems of "ancestral priorities" faced by our ancestors should lead to optimal recall performance (Nairne & Pandeirada, Cognitive Psychology, 2010). The present study investigated the specificity of this idea by comparing an ancestor-consistent scenario and a modern survival scenario that involved threats that were encountered by human ancestors (e.g., predators) or threats from fictitious creatures (i.e., zombies). Participants read one of four survival scenarios in which the environment and the explicit threat were either consistent or inconsistent with ancestrally based problems (i.e., grasslands-predators, grasslands-zombies, city-attackers, city-zombies), or they rated words for pleasantness. After rating words based on their survival relevance (or pleasantness), the participants performed a free recall task. All survival scenarios led to better recall than did pleasantness ratings, but recall was greater when zombies were the threat, as compared to predators or attackers. Recall did not differ for the modern (i.e., city) and ancestral (i.e., grasslands) scenarios. These recall differences persisted when valence and arousal ratings for the scenarios were statistically controlled as well. These data challenge the specificity of ancestral priorities in survival-processing advantages in memory.

  6. Musculature in sipunculan worms: ontogeny and ancestral states.

    PubMed

    Schulze, Anja; Rice, Mary E

    2009-01-01

    Molecular phylogenetics suggests that the Sipuncula fall into the Annelida, although they are morphologically very distinct and lack segmentation. To understand the evolutionary transformations from the annelid to the sipunculan body plan, it is important to reconstruct the ancestral states within the respective clades at all life history stages. Here we reconstruct the ancestral states for the head/introvert retractor muscles and the body wall musculature in the Sipuncula using Bayesian statistics. In addition, we describe the ontogenetic transformations of the two muscle systems in four sipunculan species with different developmental modes, using F-actin staining with fluorescent-labeled phalloidin in conjunction with confocal laser scanning microscopy. All four species, which have smooth body wall musculature and less than the full set of four introvert retractor muscles as adults, go through developmental stages with four retractor muscles that are eventually reduced to a lower number in the adult. The circular and sometimes the longitudinal body wall musculature are split into bands that later transform into a smooth sheath. Our ancestral state reconstructions suggest with nearly 100% probability that the ancestral sipunculan had four introvert retractor muscles, longitudinal body wall musculature in bands and circular body wall musculature arranged as a smooth sheath. Species with crawling larvae have more strongly developed body wall musculature than those with swimming larvae. To interpret our findings in the context of annelid evolution, a more solid phylogenetic framework is needed for the entire group and more data on ontogenetic transformations of annelid musculature are desirable.

  7. Advanced Intestinal Cancers often Maintain a Multi-Ancestral Architecture

    PubMed Central

    Zahm, Christopher D.; Szulczewski, Joseph M.; Leystra, Alyssa A.; Paul Olson, Terrah J.; Clipson, Linda; Albrecht, Dawn M.; Middlebrooks, Malisa; Thliveris, Andrew T.; Matkowskyj, Kristina A.; Washington, Mary Kay; Newton, Michael A.; Eliceiri, Kevin W.; Halberg, Richard B.

    2016-01-01

    A widely accepted paradigm in the field of cancer biology is that solid tumors are uni-ancestral, being derived from a single founder and its descendants. However, data have been steadily accruing that indicate early tumors in mice and humans can have a multi-ancestral origin in which an initiated primogenitor facilitates the transformation of neighboring co-genitors. We developed a new mouse model that permits the determination of clonal architecture of intestinal tumors in vivo and ex vivo, have validated this model, and then used it to assess the clonal architecture of adenomas, intramucosal carcinomas, and invasive adenocarcinomas of the intestine. The percentage of multi-ancestral tumors did not significantly change as tumors progressed from adenomas with low-grade dysplasia [40/65 (62%)], to adenomas with high-grade dysplasia [21/37 (57%)], to intramucosal carcinomas [10/23 (43%)], to invasive adenocarcinomas [13/19 (68%)], indicating that the clone arising from the primogenitor continues to coexist with clones arising from co-genitors. Moreover, neoplastic cells from distinct clones within a multi-ancestral adenocarcinoma have even been observed to simultaneously invade into the underlying musculature [2/15 (13%)]. Thus, intratumoral heterogeneity arising early in tumor formation persists throughout tumorigenesis. PMID:26919712
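
    The bracketed percentages in this abstract follow directly from the reported counts. As a minimal sketch (the dictionary keys and variable names are illustrative and do not come from the study), the arithmetic can be reproduced as follows:

```python
# Multi-ancestral tumors / tumors examined at each stage, as reported in the abstract.
# The stage labels used as keys are shorthand chosen for this sketch.
counts = {
    "adenoma, low-grade dysplasia": (40, 65),
    "adenoma, high-grade dysplasia": (21, 37),
    "intramucosal carcinoma": (10, 23),
    "invasive adenocarcinoma": (13, 19),
}

# Percentage of multi-ancestral tumors per stage, rounded to a whole percent.
percentages = {
    stage: round(100 * multi / total)
    for stage, (multi, total) in counts.items()
}

for stage, pct in percentages.items():
    print(f"{stage}: {pct}%")
```

    This only checks the rounding of the reported fractions; it does not reproduce whatever statistical test the authors used to conclude that the differences were not significant.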

  8. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.

    PubMed

    Olalde, Iñigo; Allentoft, Morten E; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W K; DeGiorgio, Michael; Prado-Martinez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G; Novembre, John; Sturm, Richard A; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-03-13

    Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.

  9. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  10. Prediction of human miRNA target genes using computationally reconstructed ancestral mammalian sequences

    PubMed Central

    Leclercq, Mickael; Diallo, Abdoulaye Baniré; Blanchette, Mathieu

    2017-01-01

    MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data are available at http://cs.mcgill.ca/~blanchem/mirancestar. PMID:27899600

  11. Sexually Dimorphic Effects of Ancestral Exposure to Vinclozolin on Stress Reactivity in Rats

    PubMed Central

    Gillette, Ross; Miller-Crews, Isaac; Nilsson, Eric E.; Skinner, Michael K.; Gore, Andrea C.

    2014-01-01

    How an individual responds to the environment depends upon both personal life history as well as inherited genetic and epigenetic factors from ancestors. Using a 2-hit, 3 generations apart model, we tested how F3 descendants of rats given in utero exposure to the environmental endocrine-disrupting chemical (EDC) vinclozolin reacted to stress during adolescence in their own lives, focusing on sexually dimorphic phenotypic outcomes. In adulthood, male and female F3 vinclozolin- or vehicle-lineage rats, stressed or nonstressed, were behaviorally characterized on a battery of tests and then euthanized. Serum was used for hormone assays, and brains were used for quantitative PCR and transcriptome analyses. Results showed that the effects of ancestral exposure to vinclozolin converged with stress experienced during adolescence in a sexually dimorphic manner. Debilitating effects were seen at all levels of the phenotype, including physiology, behavior, brain metabolism, gene expression, and genome-wide transcriptome modifications in specific brain nuclei. Additionally, females were significantly more vulnerable than males to transgenerational effects of vinclozolin on anxiety but not sociality tests. This fundamental transformation occurs in a manner not predicted by the ancestral exposure or the proximate effects of stress during adolescence, an interaction we refer to as synchronicity. PMID:25051444

  12. The genome of the brown alga Ectocarpus siliculosus contains a series of viral DNA pieces, suggesting an ancient association with large dsDNA viruses

    PubMed Central

    2008-01-01

    Background Ectocarpus siliculosus virus-1 (EsV-1) is a lysogenic dsDNA virus belonging to the super family of nucleocytoplasmic large DNA viruses (NCLDV) that infect Ectocarpus siliculosus, a marine filamentous brown alga. Previous studies indicated that the viral genome is integrated into the host DNA. In order to find the integration sites of the viral genome, a genomic library from EsV-1-infected algae was screened using labelled EsV-1 DNA. Several fragments were isolated and some of them were sequenced and analyzed in detail. Results Analysis revealed that the algal genome is split by a copy of viral sequences that have a high identity to EsV-1 DNA sequences. These fragments are interspersed with DNA repeats, pseudogenes and genes coding for products involved in DNA replication, integration and transposition. Some of these gene products are not encoded by EsV-1 but are present in the genome of other members of the NCLDV family. Further analysis suggests that the Ectocarpus algal genome contains traces of the integration of a large dsDNA viral genome; this genome could be the ancestor of the extant NCLDV genomes. Furthermore, several lines of evidence indicate that the EsV-1 genome might have originated in these viral DNA pieces, implying the existence of a complex integration and recombination system. A protein similar to a new class of tyrosine recombinases might be a key enzyme of this system. Conclusion Our results support the hypothesis that some dsDNA viruses are monophyletic and evolved principally through genome reduction. Moreover, we hypothesize that phaeoviruses have probably developed an original replication system. PMID:18405387

  13. Creation of Functional Viruses from Non-Functional cDNA Clones Obtained from an RNA Virus Population by the Use of Ancestral Reconstruction.

    PubMed

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Dräger, Carolin; Orton, Richard J; Blome, Sandra; Höper, Dirk; Beer, Martin; Rasmussen, Thomas Bruun

    2015-01-01

    RNA viruses have the highest known mutation rates. Consequently it is likely that a high proportion of individual RNA virus genomes, isolated from an infected host, will contain lethal mutations and be non-functional. This is problematic if the aim is to clone and investigate high-fitness, functional cDNAs and may also pose problems for sequence-based analysis of viral evolution. To address these challenges we have performed a study of the evolution of classical swine fever virus (CSFV) using deep sequencing and analysis of 84 full-length cDNA clones, each representing individual genomes from a moderately virulent isolate. In addition to here being used as a model for RNA viruses generally, CSFV has high socioeconomic importance and remains a threat to animal welfare and pig production. We find that the majority of the investigated genomes are non-functional and only 12% produced infectious RNA transcripts. Full length sequencing of cDNA clones and deep sequencing of the parental population identified substitutions important for the observed phenotypes. The investigated cDNA clones were furthermore used as the basis for inferring the sequence of functional viruses. Since each unique clone must necessarily be the descendant of a functional ancestor, we hypothesized that it should be possible to produce functional clones by reconstructing ancestral sequences. To test this we used phylogenetic methods to infer two ancestral sequences, which were then reconstructed as cDNA clones. Viruses rescued from the reconstructed cDNAs were tested in cell culture and pigs. Both reconstructed ancestral genomes proved functional, and displayed distinct phenotypes in vitro and in vivo. We suggest that reconstruction of ancestral viruses is a useful tool for experimental and computational investigations of virulence and viral evolution. Importantly, ancestral reconstruction can be done even on the basis of a set of sequences that all correspond to non-functional variants.

  14. The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes.

    PubMed

    Sahl, Jason W; Caporaso, J Gregory; Rasko, David A; Keim, Paul

    2014-01-01

    Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27-57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated into clinical
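
    The matrix described in this abstract can be illustrated with a small sketch. A BLAST score ratio is commonly defined as the bit score of a CDS aligned against a genome divided by the bit score of that CDS aligned against itself, giving a value near 1 for a close match and 0 when the CDS is absent. The function name, input layout, and bit scores below are hypothetical and are not taken from the LS-BSR code base:

```python
def bsr_matrix(self_scores, genome_scores):
    """Build {cds: {genome: BSR}} from bit scores.

    self_scores   -- {cds: bit score of the CDS aligned against itself}
    genome_scores -- {genome: {cds: best bit score of the CDS in that genome}}
    Both layouts are assumptions made for this sketch.
    """
    matrix = {}
    for cds, self_score in self_scores.items():
        if self_score <= 0:
            raise ValueError(f"self bit score for {cds} must be positive")
        row = {}
        for genome, scores in genome_scores.items():
            # A CDS with no hit in a genome gets a bit score of 0, hence BSR 0.
            query_score = scores.get(cds, 0.0)
            row[genome] = round(query_score / self_score, 3)
        matrix[cds] = row
    return matrix


# Made-up bit scores for two CDSs across two genomes.
self_scores = {"cds1": 500.0, "cds2": 200.0}
genome_scores = {
    "genomeA": {"cds1": 500.0, "cds2": 50.0},
    "genomeB": {"cds1": 450.0},  # cds2 has no hit in genomeB
}

matrix = bsr_matrix(self_scores, genome_scores)
for cds, row in matrix.items():
    print(cds, row)
```

    Rows with high values in one user-defined group of genomes and low values in another would be candidates for the group-specific markers the abstract mentions.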

  15. Even modest prediction accuracy of genomic models can have large clinical utility

    PubMed Central

    Dhurandhar, Emily J.; Vazquez, Ana I.; Argyropoulos, George A.; Allison, David B.

    2014-01-01

    Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R² = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality. PMID:25506355
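
    The selection effect described in this abstract can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' simulation: outcomes are standardized and normally distributed, the treated fraction (20%) and cohort size are hypothetical, and only weight loss is modeled, not T2DM risk. Only the value R² = 0.07 comes from the abstract.

```python
import math
import random
import statistics

R2 = 0.07              # prediction accuracy reported in the abstract
TREAT_FRACTION = 0.20  # hypothetical share of patients that resources allow treating
N_PATIENTS = 10_000    # hypothetical cohort size

rng = random.Random(0)  # fixed seed so the sketch is reproducible
r = math.sqrt(R2)       # correlation between predicted and actual outcome

patients = []
for _ in range(N_PATIENTS):
    predicted = rng.gauss(0.0, 1.0)
    # Actual standardized weight loss shares a fraction R2 of its variance
    # with the prediction; the rest is independent noise.
    actual = r * predicted + math.sqrt(1.0 - R2) * rng.gauss(0.0, 1.0)
    patients.append((predicted, actual))

# Treat only the patients with the highest predicted weight loss.
patients.sort(key=lambda p: p[0], reverse=True)
n_treated = round(TREAT_FRACTION * N_PATIENTS)
treated = patients[:n_treated]

mean_all = statistics.fmean(actual for _, actual in patients)
mean_treated = statistics.fmean(actual for _, actual in treated)

print(f"mean actual outcome, all patients:     {mean_all:.3f}")
print(f"mean actual outcome, treated patients: {mean_treated:.3f}")
```

    Even with a weak predictor, the treated group's average outcome sits above the cohort average, which is the intuition behind the abstract's claim that modest accuracy can still be useful when resources are limited.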

  16. The complete mitochondrial genome of Solemya velum (Mollusca: Bivalvia) and its relationships with Conchifera

    PubMed Central

    2013-01-01

    Background Bivalve mitochondrial genomes exhibit a wide array of uncommon features, like extensive gene rearrangements, large sizes, and unusual ways of inheritance. Species pertaining to the order Solemyida (subclass Opponobranchia) show many peculiar evolutionary adaptations, for instance extensive symbiosis with chemoautotrophic bacteria. Although Opponobranchia are central in bivalve phylogeny, being considered the sister group of all Autobranchia, a complete mitochondrial genome has not been sequenced yet. Results In this paper, we characterized the complete mitochondrial genome of the Atlantic awning clam Solemya velum: A-T content, gene arrangement and other features are more similar to putative ancestral mollusks than to other bivalves. Two supernumerary open reading frames are present in a large, otherwise unassigned, region, while the origin of replication could be located in a region upstream to the cox3 gene. Conclusions We show that the S. velum mitogenome retains most of the ancestral conchiferan features, which is unusual among bivalve mollusks, and we discuss the main peculiarities of this first example of an organellar genome coming from the subclass Opponobranchia. Mitochondrial genomes of Solemya (for bivalves) and Haliotis (for gastropods) seem to retain the original condition of mollusks, as most probably exemplified by Katharina. PMID:23777315

  17. Accuracy of genomic selection models in a large population of open-pollinated families in white spruce

    PubMed Central

    Beaulieu, J; Doerksen, T; Clément, S; MacKay, J; Bousquet, J

    2014-01-01

    Genomic selection (GS) is of interest in breeding because of its potential for predicting the genetic value of individuals and increasing genetic gains per unit of time. To date, very few studies have reported empirical results of GS potential in the context of large population sizes and long breeding cycles such as for boreal trees. In this study, we assessed the effectiveness of marker-aided selection in an undomesticated white spruce (Picea glauca (Moench) Voss) population of large effective size using a GS approach. A discovery population of 1694 trees representative of 214 open-pollinated families from 43 natural populations was phenotyped for 12 wood and growth traits and genotyped for 6385 single-nucleotide polymorphisms (SNPs) mined in 2660 gene sequences. GS models were built to predict estimated breeding values using all the available SNPs or SNP subsets of the largest absolute effects, and they were validated using various cross-validation schemes. The accuracy of genomic estimated breeding values (GEBVs) varied from 0.327 to 0.435 when the training and the validation data sets shared half-sibs, which was on average 90% of the accuracy achieved through traditionally estimated breeding values. The trend was the same for validation across sites. As expected, the accuracy of GEBVs obtained after cross-validation with individuals of unknown relatedness was lower, at about half of the accuracy achieved when half-sibs were present. We showed that with the marker densities used in the current study, predictions with low to moderate accuracy could be obtained within a large undomesticated population of related individuals, potentially resulting in larger gains per unit of time with GS than with the traditional approach. PMID:24781808

  18. Excavating the Genome: Large Scale Mutagenesis Screening for the Discovery of New Mouse Models

    PubMed Central

    Sundberg, John P.; Dadras, Soheil S.; Silva, Kathleen A.; Kennedy, Victoria E.; Murray, Stephen A.; Denegre, James; Schofield, Paul N.; King, Lloyd E.; Wiles, Michael; Pratt, C. Herbert

    2016-01-01

    Technology now exists for rapid screening of mutated laboratory mice to identify phenotypes associated with specific genetic mutations. Large repositories exist for spontaneous mutants and those induced by chemical mutagenesis, many of which have never been studied or comprehensively evaluated. To supplement these resources, a variety of techniques have been consolidated in an international effort to create mutations in all known protein coding genes in the mouse. With targeted embryonic stem cell lines now available for almost all protein coding genes and more recently CRISPR/Cas9 technology, large-scale efforts are underway to create novel mutant mouse strains and to characterize their phenotypes. However, accurate diagnosis of skin, hair, and nail diseases still relies on careful gross and histological analysis. While not automated to the level of the physiological phenotyping, histopathology provides the most direct and accurate diagnosis and correlation with human diseases. As a result of these efforts, many new mouse dermatological disease models are being developed. PMID:26551941

  19. A new way to protect privacy in large-scale genome-wide association studies

    PubMed Central

    Kamm, Liina; Bogdanov, Dan; Laur, Sven; Vilo, Jaak

    2013-01-01

    Motivation: Increased availability of various genotyping techniques has initiated a race for finding genetic markers that can be used in diagnostics and personalized medicine. Although many genetic risk factors are known, key causes of common diseases with complex heritage patterns are still unknown. Identification of such complex traits requires a targeted study over a large collection of data. Ideally, such studies bring together data from many biobanks. However, data aggregation on such a large scale raises many privacy issues. Results: We show how to conduct such studies without violating privacy of individual donors and without leaking the data to third parties. The presented solution has provable security guarantees. Contact: jaak.vilo@ut.ee Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23413435

  20. Integrating large-scale functional genomics data to dissect metabolic networks for hydrogen production

    SciTech Connect

    Harwood, Caroline S

    2012-12-17

    The goal of this project is to identify gene networks that are critical for efficient biohydrogen production by leveraging variation in gene content and gene expression in independently isolated Rhodopseudomonas palustris strains. Coexpression methods were applied to large data sets that we have collected to define probabilistic causal gene networks. To our knowledge, this is a first systems-level approach that takes advantage of strain-to-strain variability to computationally define networks critical for a particular bacterial phenotypic trait.

  1. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare

    PubMed Central

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have likely been favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome. PMID:26361137

  2. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome

    PubMed Central

    Ravasi, Timothy; Suzuki, Harukazu; Pang, Ken C.; Katayama, Shintaro; Furuno, Masaaki; Okunishi, Rie; Fukuda, Shiro; Ru, Kelin; Frith, Martin C.; Gongora, M. Milena; Grimmond, Sean M.; Hume, David A.; Hayashizaki, Yoshihide; Mattick, John S.

    2006-01-01

    Recent large-scale analyses of mainly full-length cDNA libraries generated from a variety of mouse tissues indicated that almost half of all representative cloned sequences did not contain an apparent protein-coding sequence, and were putatively derived from non-protein-coding RNA (ncRNA) genes. However, many of these clones were singletons and the majority were unspliced, raising the possibility that they may be derived from genomic DNA or unprocessed pre-mRNA contamination during library construction, or alternatively represent nonspecific “transcriptional noise.” Here we show, using reverse transcriptase-dependent PCR, microarray, and Northern blot analyses, that many of these clones were derived from genuine transcripts of unknown function whose expression appears to be regulated. The ncRNA transcripts have larger exons and fewer introns than protein-coding transcripts. Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but internal priming sites within longer transcripts, only a minority of which is encompassed by known genes. A significant proportion of these transcripts exhibit tissue-specific expression patterns, as well as dynamic changes in their expression in macrophages following lipopolysaccharide stimulation. Taken together, the data provide strong support for the conclusion that ncRNAs are an important, regulated component of the mammalian transcriptome. PMID:16344565

  3. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare.

    PubMed

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have likely been favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome.

  4. Genomic characterization of a large panel of patient-derived hepatocellular carcinoma xenograft tumor models for preclinical development.

    PubMed

    Gu, Qingyang; Zhang, Bin; Sun, Hongye; Xu, Qiang; Tan, Yexiong; Wang, Guan; Luo, Qin; Xu, Weiguo; Yang, Shuqun; Li, Jian; Fu, Jing; Chen, Lei; Yuan, Shengxian; Liang, Guibai; Ji, Qunsheng; Chen, Shu-Hui; Chan, Chi-Chung; Zhou, Weiping; Xu, Xiaowei; Wang, Hongyang; Fang, Douglas D

    2015-08-21

    Lack of clinically relevant tumor models dramatically hampers development of effective therapies for hepatocellular carcinoma (HCC). Establishment of patient-derived xenograft (PDX) models that faithfully recapitulate the genetic and phenotypic features of HCC becomes important. In this study, we first established a cohort of 65 stable PDX models of HCC from corresponding Chinese patients. Then we showed that the histology and gene expression patterns of PDX models were highly consistent between xenografts and case-matched original tumors. Genetic alterations, including mutations and DNA copy number alterations (CNAs), of the xenografts correlated well with the published data of HCC patient specimens. Furthermore, differential responses to sorafenib, the standard-of-care agent, in randomly chosen xenografts were unveiled. Finally, in the models expressing high levels of FGFR1 gene according to the genomic data, FGFR1 inhibitor lenvatinib showed greater efficacy than sorafenib. Taken together, our data indicate that PDX models resemble histopathological and genomic characteristics of clinical HCC tumors, as well as recapitulate the differential responses of HCC patients to the standard-of-care treatment. Overall, this large collection of PDX models becomes a clinically relevant platform for drug screening, biomarker discovery and translational research in preclinical setting.

  5. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

    PubMed Central

    Ye, Chengxi; Hill, Christopher M.; Wu, Shigang; Ruan, Jue; Ma, Zhanshan (Sam)

    2016-01-01

    The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. PMID:27573208
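
    The de Bruijn graph mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DBG2OLC implementation; the function name `de_bruijn_graph` and the toy reads are hypothetical, and the sketch assumes error-free reads. Each k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration only; real assemblers also handle
    sequencing errors, reverse complements, and k-mer multiplicities.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    # Return plain, deterministic structures for easy inspection.
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two overlapping toy reads share the k-mer "CGT".
    print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
```

    An overlap graph, by contrast, uses whole reads as nodes and read-to-read overlaps as edges; the abstract describes converting from the first representation to the second.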

  6. Inferring the Early Evolution of Translation: Ancestral Reconstruction, Compositional Analysis, and Functional Specificity

    NASA Astrophysics Data System (ADS)

    Fournier, G. P.; Gogarten, J. P.

    2010-04-01

    Using ancestral sequence reconstruction and compositional analysis, it is possible to reconstruct the ancestral functions of many enzymes involved in protein synthesis, elucidating the early functional evolution of the translation machinery and genetic code.

  7. Genome-wide association study for the level of serum electrolytes in Italian Large White pigs.

    PubMed

    Bovo, S; Schiavo, G; Mazzoni, G; Dall'Olio, S; Galimberti, G; Calò, D G; Scotti, E; Bertolini, F; Buttazzoni, L; Samorè, A B; Fontanesi, L

    2016-10-01

    Calcium, magnesium and phosphorus are essential electrolytes involved in a large number of biological processes. Imbalance of these minerals in blood may indicate clinically relevant conditions and is important in inferring acute or chronic pathologies in humans and animals. In this work, we carried out a genome-wide association study (GWAS) for the level of these three electrolytes in the serum of 843 performance-tested Italian Large White pigs. All pigs were genotyped with the Illumina PorcineSNP60 BeadChip, and GWAS was carried out using genome-wide efficient mixed-model association. For the level of Ca(2+), eight single nucleotide polymorphisms (SNPs) were significant, considering a false discovery rate (FDR) < 0.05, and another eight were above the moderate association threshold (nominal P value < 5.00E-05). These SNPs are distributed in four porcine chromosomes (SSC): SSC8, SSC11, SSC12 and SSC13. In particular, a few putative different signals of association detected on SSC13 and one on SSC12 were in genes or close to genes involved in calcium metabolism (P2RY1, RAP2B, SLC9A9, C3orf58, TSC22D2, PLCH1 and CACNB1). Only one SNP (on SSC7) and six SNPs (on SSC2 and SSC7) showed moderate association with the level of magnesium and phosphorus respectively. The association signals for these two latter minerals might identify genes not known thus far for playing a role in their biological functions and regulations. In conclusion, our GWAS contributed to increased knowledge on the role that calcium, magnesium and phosphorus may play in the genetically determined physiological mechanisms affecting the natural variability of mineral levels in mammalian blood.

  8. Extensive Capsule Locus Variation and Large-Scale Genomic Recombination within the Klebsiella pneumoniae Clonal Group 258.

    PubMed

    Wyres, Kelly L; Gorrie, Claire; Edwards, David J; Wertheim, Heiman F L; Hsu, Li Yang; Van Kinh, Nguyen; Zadoks, Ruth; Baker, Stephen; Holt, Kathryn E

    2015-04-10

    Klebsiella pneumoniae clonal group (CG) 258, comprising sequence types (STs) 258, 11, and closely related variants, is associated with dissemination of the K. pneumoniae carbapenemase (KPC). Hospital outbreaks of KPC CG258 infections have been observed globally and are very difficult to treat. As a consequence, there is renewed interest in alternative infection control measures such as vaccines and phage or depolymerase treatments targeting the K. pneumoniae polysaccharide capsule. To date, 78 immunologically distinct capsule variants have been described in K. pneumoniae. Previous investigations of ST258 and a small number of closely related strains suggested that capsular variation was limited within this clone; only two distinct ST258 capsule polysaccharide synthesis (cps) loci have been identified, both acquired through large-scale recombination events (>50 kb). In contrast to previous studies, we report a comparative genomic analysis of the broader K. pneumoniae CG258 (n = 39). We identified 11 different cps loci within CG258, indicating that capsular switching is actually common within the complex. We observed several insertion sequences (IS) within the cps loci, and show further intraclone diversification of two cps loci through IS activity. Our data also indicate that several large-scale recombination events have shaped the genomes of CG258, and that definition of the complex should be broadened to include ST395 (also reported to harbor KPC). As only the second report of extensive intraclonal cps variation among Gram-negative bacterial species, our findings alter our understanding of the evolution of these organisms and have key implications for the design of control measures targeting K. pneumoniae capsules.

  9. Organization of the large mitochondrial genome in the isopod Armadillidium vulgare.

    PubMed Central

    Raimond, R; Marcadé, I; Bouchon, D; Rigaud, T; Bossy, J P; Souty-Grosset, C

    1999-01-01

    The mitochondrial DNA (mtDNA) in animals is generally a circular molecule of approximately 15 kb, but there are many exceptions such as linear molecules and larger ones. RFLP studies indicated that the mtDNA in the terrestrial isopod Armadillidium vulgare varied from 20 to 42 kb. This variation depended on the restriction enzyme used, and on the restriction profile generated by a given enzyme. The DNA fragments had characteristic electrophoretic behaviors. Digestions with two endonucleases always generated fewer fragments than expected; denaturation of restriction profiles reduced the size of two bands by half; densitometry indicated that a number of small fragments were present in stoichiometry, which has approximately twice the expected concentration. Finally, hybridization to a 550-bp 16S rDNA probe often revealed two copies of this gene. These results cannot be due to the genetic rearrangements generally invoked to explain large mtDNA. We propose that the large A. vulgare mtDNA is produced by the tripling of a 14-kb monomer with a singular rearrangement: one monomer is linear and the other two form a circular dimer. Densitometry suggested that these two molecular structures were present in different proportions within a single individual. The absence of mutations within the dimers also suggests that replication occurs during the monomer phase. PMID:9872960

  10. Ontology-based annotations and semantic relations in large-scale (epi)genomics data.

    PubMed

    Galeota, Eugenia; Pelizzola, Mattia

    2016-05-03

    Public repositories of large-scale biological data currently contain hundreds of thousands of experiments, including high-throughput sequencing and microarray data. The potential of using these resources to assemble data sets combining samples previously not associated is vastly unexplored. This requires the ability to associate samples with clear annotations and to relate experiments matched with different annotation terms. In this study, we illustrate the semantic annotation of Gene Expression Omnibus samples metadata using concepts from biomedical ontologies, focusing on the association of thousands of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) samples with a given target, tissue and disease state. Next, we demonstrate the feasibility of quantitatively measuring the semantic similarity between different samples, with the aim of combining experiments associated with the same or similar semantic annotations, thus allowing the generation of large data sets without the need of additional experiments. We compared tools based on Unified Medical Language System with tools that use topic-specific ontologies, showing that the second approach outperforms the first both in the annotation process and in the computation of semantic similarity measures. Finally, we demonstrated the potential of this approach by identifying semantically homogeneous groups of ChIP-seq samples targeting the Myc transcription factor, and expanding this data set with semantically coherent epigenetic samples. The semantic information of these data sets proved to be coherent with the ChIP-seq signal and with the current knowledge about this transcription factor.

  11. Genome-wide scans to detect positive selection in Large White and Tongcheng pigs.

    PubMed

    Li, Xiuling; Yang, Songbai; Tang, Zhonglin; Li, Kui; Rothschild, Max F; Liu, Bang; Fan, Bin

    2014-06-01

    Due to the direction, intensity, duration and consistency of genetic selection, especially recent artificial selection, the production performance of domestic pigs has been greatly changed. Therefore, we reasoned that there must be footprints or selection signatures that had been left during domestication. In this study, with porcine 60K BeadChip genotyping data from both commercial Large White and local Chinese Tongcheng pigs, we calculated the extended haplotype homozygosity values of the two breeds using the long-range haplotype method to detect selection signatures. We found 34 candidate regions, including 61 known genes, from Large White pigs and 25 regions comprising 57 known genes from Tongcheng pigs. Many selection signatures were found on SSC1, SSC4, SSC7 and SSC14 regions in both populations. According to quantitative trait loci and network pathway analyses, most of the regions and genes were linked to growth, reproduction and immune responses. In addition, the average genetic differentiation coefficient FST was 0.254, which means that there had already been a significant differentiation between the breeds. The findings from this study can contribute to further research on molecular mechanisms of pig evolution and domestication and also provide valuable references for improvement of their breeding and cultivation.
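
    The genetic differentiation coefficient FST reported above can be illustrated for a single biallelic SNP. The abstract does not say which estimator the authors used; the sketch below assumes the simple heterozygosity-based form FST = (HT - HS) / HT with two equally weighted populations, and the function name `fst_biallelic` and the allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """FST for one biallelic SNP from allele frequencies in two populations.

    Assumes equal population weights and the simple form (H_T - H_S) / H_T;
    this is an illustrative choice, not the estimator used in the study.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity in the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        # The SNP is monomorphic overall, so there is no differentiation.
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies of the same allele in two breeds.
    print(fst_biallelic(0.8, 0.2))
```

    Identical frequencies give 0 and fixed alternative alleles give 1; a genome-wide value such as the 0.254 quoted in the abstract would come from combining many SNPs.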

  12. Bioinformatic evidence and characterization of novel putative large conjugative transposons residing in genomes of genera Bacteroides and Prevotella.

    PubMed

    Gorenc, Katja; Accetto, Tomaž; Avguštin, Gorazd

    2012-07-01

    Bioinformatic evidence of the presence of a large conjugative transposon in ruminal bacterium Prevotella bryantii B(1)4(T) is presented. The described transposon appears to be related to another large conjugative transposon CTnBST, described in Bacteroides uniformis WH207 and to the conjugative transposon CTn3-Bf, which was observed in the genome of Bacteroides fragilis strain YCH46. All three transposons share tra gene regions with high amino acid identity and clearly conserved gene order. Additionally, a second conserved region consisting of hypothetical genes was discovered in all three transposons and named the GG region. This region served as a specific sequence signature and made possible the discovery of several other apparently related hypothetical conjugative transposons in bacteria from the genus Bacteroides. A cluster of genes involved in sugar utilization and metabolism was discovered within the hypothetical CTnB(1)4, to a certain extent resembling the polysaccharide utilization loci which were described recently in some Bacteroides strains. This is the first firm report on the presence of a large mobile genetic element in any strain from the genus Prevotella.

  13. Genome-Wide Association Study of Event-Free Survival in Diffuse Large B-Cell Lymphoma Treated With Immunochemotherapy

    PubMed Central

    Ghesquieres, Hervé; Slager, Susan L.; Jardin, Fabrice; Veron, Amelie S.; Asmann, Yan W.; Maurer, Matthew J.; Fest, Thierry; Habermann, Thomas M.; Bene, Marie C.; Novak, Anne J.; Mareschal, Sylvain; Haioun, Corinne; Lamy, Thierry; Ansell, Stephen M.; Tilly, Herve; Witzig, Thomas E.; Weiner, George J.; Feldman, Andrew L.; Dogan, Ahmet; Cunningham, Julie M.; Olswold, Curtis L.; Molina, Thierry Jo; Link, Brian K.; Milpied, Noel; Cox, David G.; Salles, Gilles A.; Cerhan, James R.

    2015-01-01

    Purpose: We performed a multistage genome-wide association study to identify inherited genetic variants that predict outcome in diffuse large B-cell lymphoma patients treated with immunochemotherapy. Methods: We conducted a meta-analysis of two genome-wide association study data sets, one from the LNH2003B trial (N = 540), a prospective clinical trial from the Lymphoma Study Association, and the other from the Molecular Epidemiology Resource study (N = 312), a prospective observational study from the University of Iowa–Mayo Clinic Lymphoma Specialized Program of Research Excellence. Top single nucleotide polymorphisms were then genotyped in independent cohorts of patients from the Specialized Program of Research Excellence (N = 391) and the Groupe Ouest-Est des Leucémies Aiguës et Maladies du Sang (GOELAMS)-075 randomized trial (N = 294). We calculated the hazard ratios (HRs) and 95% CIs for event-free survival (EFS) and overall survival (OS) using a log-additive genetic model with adjustment for age, sex, and age-adjusted International Prognostic Index. Results: In a meta-analysis of the four studies, the top loci for EFS were marked by rs7712513 at 5q23.2 (near SNX2 and SNCAIP; HR, 1.39; 95% CI, 1.23 to 1.57; P = 2.08 × 10^-7), and rs7765004 at 6q21 (near MARCKS and HDAC2; HR, 1.38; 95% CI, 1.22 to 1.57; P = 7.09 × 10^-7), although they did not reach conventional genome-wide significance (P = 5 × 10^-8). Both rs7712513 (HR, 1.49; 95% CI, 1.29 to 1.72; P = 3.53 × 10^-8) and rs7765004 (HR, 1.47; 95% CI, 1.27 to 1.71; P = 5.36 × 10^-7) were also associated with OS. In exploratory analyses, a two–single nucleotide polymorphism risk score was highly predictive of EFS (P = 1.78 × 10^-12) and was independent of treatment, IPI, and cell-of-origin classification. Conclusion: Our study provides encouraging evidence for associations between loci at 5q23.2 and 6q21 with EFS and OS in patients with diffuse large B-cell lymphoma treated with immunochemotherapy.

  14. Evolution of complex resistance transposons from an ancestral mercury transposon.

    PubMed

    Tanaka, M; Yamamoto, T; Sawai, T

    1983-03-01

    The molecular interrelationship of a transposon family which confers multiple antibiotic resistance and is assumed to have been generated from an ancestral mercury transposon was analyzed. Initially, the transposons Tn2613 (7.2 kilobases), encoding mercury resistance, and Tn2608 (13.5 kilobases), encoding mercury, streptomycin, and sulfonamide resistances, were isolated and their structures were analyzed. Next, the following transposons were compared with respect to their genetic and physical maps: Tn2613 and Tn501, encoding mercury resistance; Tn2608 and Tn21, encoding mercury, streptomycin, and sulfonamide resistance; Tn2607 and Tn4, encoding streptomycin, sulfonamide, and ampicillin resistance; and Tn2603, encoding mercury, streptomycin, sulfonamide, and ampicillin resistance. The results suggest that the transposons encoding multiple resistance were evolved from an ancestral mercury transposon.

  15. Mitogenomics and phylogenomics reveal priapulid worms as extant models of the ancestral Ecdysozoan.

    PubMed

    Webster, Bonnie L; Copley, Richard R; Jenner, Ronald A; Mackenzie-Dodds, Jacqueline A; Bourlat, Sarah J; Rota-Stabelli, Omar; Littlewood, D T J; Telford, Maximilian J

    2006-01-01

    Research into arthropod evolution is hampered by the derived nature and rapid evolution of the best-studied out-group: the nematodes. We consider priapulids as an alternative out-group. Priapulids are a small phylum of bottom-dwelling marine worms; their tubular body with spiny proboscis or introvert has changed little over 520 million years and recognizable priapulids are common among exceptionally preserved Cambrian fossils. Using the complete mitochondrial genome and 42 nuclear genes from Priapulus caudatus, we show that priapulids are slowly evolving ecdysozoans; almost all these priapulid genes have evolved more slowly than nematode orthologs and the priapulid mitochondrial gene order may be unchanged since the Cambrian. Considering their primitive bodyplan and embryology and the great conservation of both nuclear and mitochondrial genomes, priapulids may deserve the popular epithet of "living fossil." Their study is likely to yield significant new insights into the early evolution of the Ecdysozoa and the origins of the arthropods and their kin as well as aiding inference of the morphology of ancestral Ecdysozoa and Bilateria and their genomes.

  16. The ancestral eutherian karyotype is present in Xenarthra.

    PubMed

    Svartman, Marta; Stone, Gary; Stanyon, Roscoe

    2006-07-01

    Molecular studies have led recently to the proposal of a new super-ordinal arrangement of the 18 extant Eutherian orders. From the four proposed super-orders, Afrotheria and Xenarthra were considered the most basal. Chromosome-painting studies with human probes in these two mammalian groups are thus key in the quest to establish the ancestral Eutherian karyotype. Although a reasonable amount of chromosome-painting data with human probes have already been obtained for Afrotheria, no Xenarthra species has been thoroughly analyzed with this approach. We hybridized human chromosome probes to metaphases of species (Dasypus novemcinctus, Tamandua tetradactyla, and Choloepus hoffmanii) representing three of the four Xenarthra families. Our data allowed us to review the current hypotheses for the ancestral Eutherian karyotype, which range from 2n = 44 to 2n = 48. One of the species studied, the two-toed sloth C. hoffmanii (2n = 50), showed a chromosome complement strikingly similar to the proposed 2n = 48 ancestral Eutherian karyotype, strongly reinforcing it.

  17. The Ancestral Eutherian Karyotype Is Present in Xenarthra

    PubMed Central

    Svartman, Marta; Stone, Gary; Stanyon, Roscoe

    2006-01-01

    Molecular studies have led recently to the proposal of a new super-ordinal arrangement of the 18 extant Eutherian orders. From the four proposed super-orders, Afrotheria and Xenarthra were considered the most basal. Chromosome-painting studies with human probes in these two mammalian groups are thus key in the quest to establish the ancestral Eutherian karyotype. Although a reasonable amount of chromosome-painting data with human probes have already been obtained for Afrotheria, no Xenarthra species has been thoroughly analyzed with this approach. We hybridized human chromosome probes to metaphases of species (Dasypus novemcinctus, Tamandua tetradactyla, and Choloepus hoffmanii) representing three of the four Xenarthra families. Our data allowed us to review the current hypotheses for the ancestral Eutherian karyotype, which range from 2n = 44 to 2n = 48. One of the species studied, the two-toed sloth C. hoffmanii (2n = 50), showed a chromosome complement strikingly similar to the proposed 2n = 48 ancestral Eutherian karyotype, strongly reinforcing it. PMID:16848642

  18. An ancestral bacterial division system is widespread in eukaryotic mitochondria

    PubMed Central

    Leger, Michelle M.; Petrů, Markéta; Žárský, Vojtěch; Eme, Laura; Vlček, Čestmír; Harding, Tommy; Lang, B. Franz; Eliáš, Marek; Doležal, Pavel; Roger, Andrew J.

    2015-01-01

    Bacterial division initiates at the site of a contractile Z-ring composed of polymerized FtsZ. The location of the Z-ring in the cell is controlled by a system of three mutually antagonistic proteins, MinC, MinD, and MinE. Plastid division is also known to be dependent on homologs of these proteins, derived from the ancestral cyanobacterial endosymbiont that gave rise to plastids. In contrast, the mitochondria of model systems such as Saccharomyces cerevisiae, mammals, and Arabidopsis thaliana seem to have replaced the ancestral α-proteobacterial Min-based division machinery with host-derived dynamin-related proteins that form outer contractile rings. Here, we show that the mitochondrial division system of these model organisms is the exception, rather than the rule, for eukaryotes. We describe endosymbiont-derived, bacterial-like division systems comprising FtsZ and Min proteins in diverse less-studied eukaryote protistan lineages, including jakobid and heterolobosean excavates, a malawimonad, stramenopiles, amoebozoans, a breviate, and an apusomonad. For two of these taxa, the amoebozoan Dictyostelium purpureum and the jakobid Andalucia incarcerata, we confirm a mitochondrial localization of these proteins by their heterologous expression in Saccharomyces cerevisiae. The discovery of a proteobacterial-like division system in mitochondria of diverse eukaryotic lineages suggests that it was the ancestral feature of all eukaryotic mitochondria and has been supplanted by a host-derived system multiple times in distinct eukaryote lineages. PMID:25831547

  19. Ancestral facial morphology of Old World higher primates.

    PubMed Central

    Benefit, B R; McCrossin, M L

    1991-01-01

    Fossil remains of the cercopithecoid Victoriapithecus recently recovered from middle Miocene deposits of Maboko Island (Kenya) provide evidence of the cranial anatomy of Old World monkeys prior to the evolutionary divergence of the extant subfamilies Colobinae and Cercopithecinae. Victoriapithecus shares a suite of craniofacial features with the Oligocene catarrhine Aegyptopithecus and early Miocene hominoid Afropithecus. All three genera manifest supraorbital costae, anteriorly convergent temporal lines, the absence of a postglabellar fossa, a moderate to long snout, great facial height below the orbits, a deep cheek region, and anteriorly tapering premaxilla. The shared presence of these features in a catarrhine generally ancestral to apes and Old World monkeys, an early ape, and an early Old World monkey indicates that they are primitive characteristics that typified the last common ancestor of Hominoidea and Cercopithecoidea. These results contradict prevailing cranial morphotype reconstructions for ancestral catarrhines as Colobus- or Hylobates-like, characterized by a globular anterior braincase and orthognathy. By resolving several equivocal craniofacial morphocline polarities, these discoveries lay the foundation for a revised interpretation of the ancestral cranial morphology of Catarrhini more consistent with neontological and existing paleontological evidence. PMID:2052606

  20. Ancestral facial morphology of Old World higher primates.

    PubMed

    Benefit, B R; McCrossin, M L

    1991-06-15

    Fossil remains of the cercopithecoid Victoriapithecus recently recovered from middle Miocene deposits of Maboko Island (Kenya) provide evidence of the cranial anatomy of Old World monkeys prior to the evolutionary divergence of the extant subfamilies Colobinae and Cercopithecinae. Victoriapithecus shares a suite of craniofacial features with the Oligocene catarrhine Aegyptopithecus and early Miocene hominoid Afropithecus. All three genera manifest supraorbital costae, anteriorly convergent temporal lines, the absence of a postglabellar fossa, a moderate to long snout, great facial height below the orbits, a deep cheek region, and anteriorly tapering premaxilla. The shared presence of these features in a catarrhine generally ancestral to apes and Old World monkeys, an early ape, and an early Old World monkey indicates that they are primitive characteristics that typified the last common ancestor of Hominoidea and Cercopithecoidea. These results contradict prevailing cranial morphotype reconstructions for ancestral catarrhines as Colobus- or Hylobates-like, characterized by a globular anterior braincase and orthognathy. By resolving several equivocal craniofacial morphocline polarities, these discoveries lay the foundation for a revised interpretation of the ancestral cranial morphology of Catarrhini more consistent with neontological and existing paleontological evidence.

  1. Cases In Which Ancestral Maximum Likelihood Will Be Confusingly Misleading.

    PubMed

    Handelman, Tomer; Chor, Benny

    2017-03-02

    Ancestral maximum likelihood (AML) is a phylogenetic tree reconstruction criterion that "lies between" maximum parsimony (MP) and maximum likelihood (ML). ML has long been known to be statistically consistent. On the other hand, Felsenstein (1978) showed that MP is statistically inconsistent, and even positively misleading: There are cases where the parsimony criterion, applied to data generated according to one tree topology, will be optimized on a different tree topology. The question of whether AML is statistically consistent or not has been open for a long time. Mossel, Roch, and Steel (2009) have shown that AML can "shrink" short tree edges, resulting in a star tree with no internal resolution, which yields a better AML score than the original (resolved) model. This result implies that AML is statistically inconsistent, but not that it is positively misleading, because the star tree is compatible with any other topology. We show that AML is confusingly misleading: For some simple, four-taxon (resolved) tree, the ancestral likelihood optimization criterion is maximized on an incorrect (resolved) tree topology, as well as on a star tree (both with specific edge lengths), while the tree with the original, correct topology has strictly lower ancestral likelihood. Interestingly, the two short edges in the incorrect, resolved tree topology are of length zero, and are not adjacent, so this resolved tree is in fact a simple path. While for MP, the underlying phenomenon can be described as long edge attraction, it turns out that here we have long edge repulsion.

  2. An ancestral bacterial division system is widespread in eukaryotic mitochondria.

    PubMed

    Leger, Michelle M; Petrů, Markéta; Žárský, Vojtěch; Eme, Laura; Vlček, Čestmír; Harding, Tommy; Lang, B Franz; Eliáš, Marek; Doležal, Pavel; Roger, Andrew J

    2015-08-18

    Bacterial division initiates at the site of a contractile Z-ring composed of polymerized FtsZ. The location of the Z-ring in the cell is controlled by a system of three mutually antagonistic proteins, MinC, MinD, and MinE. Plastid division is also known to be dependent on homologs of these proteins, derived from the ancestral cyanobacterial endosymbiont that gave rise to plastids. In contrast, the mitochondria of model systems such as Saccharomyces cerevisiae, mammals, and Arabidopsis thaliana seem to have replaced the ancestral α-proteobacterial Min-based division machinery with host-derived dynamin-related proteins that form outer contractile rings. Here, we show that the mitochondrial division system of these model organisms is the exception, rather than the rule, for eukaryotes. We describe endosymbiont-derived, bacterial-like division systems comprising FtsZ and Min proteins in diverse less-studied eukaryote protistan lineages, including jakobid and heterolobosean excavates, a malawimonad, stramenopiles, amoebozoans, a breviate, and an apusomonad. For two of these taxa, the amoebozoan Dictyostelium purpureum and the jakobid Andalucia incarcerata, we confirm a mitochondrial localization of these proteins by their heterologous expression in Saccharomyces cerevisiae. The discovery of a proteobacterial-like division system in mitochondria of diverse eukaryotic lineages suggests that it was the ancestral feature of all eukaryotic mitochondria and has been supplanted by a host-derived system multiple times in distinct eukaryote lineages.

  3. Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference.

    PubMed

    Ni, Xumin; Yang, Xiong; Guo, Wei; Yuan, Kai; Zhou, Ying; Ma, Zhiming; Xu, Shuhua

    2016-01-28

    The length of ancestral tracks decays with the passing of generations, a property that can be used to infer population admixture histories. Previous studies have shown the power in recovering the histories of admixed populations via the length distributions of ancestral tracks even under simple models. We believe that the deduction of length distributions under a general model will greatly elevate the power. Here we first deduced the length distributions under a general model and proposed general principles in parameter estimation and model selection with the deduced length distributions. Next, we focused on studying the length distributions and their applications under three typical special cases. Extensive simulations showed that the length distributions of ancestral tracks were well predicted by our theoretical framework. We further developed a new method, AdmixInfer, based on the length distributions, and good performance was observed when it was applied to infer population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and threshold to discard short tracks. Finally, good performance was also observed when applied to some real datasets of African Americans, Mexicans and South Asian populations from the HapMap project and the Human Genome Diversity Project.

  4. Germline large genomic alterations on 7q in patients with multiple primary cancers

    PubMed Central

    Villacis, R. A. R.; Basso, T. R.; Canto, L. M.; Nóbrega, A. F.; Achatz, M. I.; Rogatto, S. R.

    2017-01-01

    Patients with multiple primary cancers (MPCs) are suspected to have a hereditary cancer syndrome. However, only a small proportion may be explained by mutations in high-penetrance genes. We investigate two unrelated MPC patients that met Hereditary Breast and Ovarian Cancer criteria, both presenting triple negative breast tumors and no mutations in BRCA1, BRCA2 and TP53 genes. Germline rearrangements on chromosome 7q, involving over 40 Mb of the same region, were found in both patients: one with mosaic loss (80% of cells) and the other with cnLOH (copy-neutral loss of heterozygosity) secondary to maternal allele duplication. Five children tested had no alterations on 7q. The patients shared 330 genes in common on 7q22.1-q34, including several tumor suppressor genes (TSGs) previously related to breast cancer risk and imprinted genes. The analysis of the triple negative BC from one patient revealed a mosaic gain of 7q translated for over-expressed cancer-related genes. The involvement of TSGs and imprinted genes, mapped on 7q, has the potential of being associated with MPC risk, as well as cancer progression. To our knowledge, this is the first description of patients with MPCs that harbor constitutive large alterations on 7q. PMID:28139749

  5. Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing

    PubMed Central

    Nadeau, Nicola J.; Whibley, Annabel; Jones, Robert T.; Davey, John W.; Dasmahapatra, Kanchon K.; Baxter, Simon W.; Quail, Michael A.; Joron, Mathieu; ffrench-Constant, Richard H.; Blaxter, Mark L.; Mallet, James; Jiggins, Chris D.

    2012-01-01

    Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These ‘islands’ of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the ‘speciation continuum’. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races. PMID:22201164

  6. State of cat genomics.

    PubMed

    O'Brien, Stephen J; Johnson, Warren; Driscoll, Carlos; Pontius, Joan; Pecon-Slattery, Jill; Menotti-Raymond, Marilyn

    2008-06-01

    Our knowledge of cat family biology was recently expanded to include a genomics perspective with the completion of a draft whole genome sequence of an Abyssinian cat. The utility of the new genome information has been demonstrated by applications ranging from disease gene discovery and comparative genomics to species conservation. Patterns of genomic organization among cats and inbred domestic cat breeds have illuminated our view of domestication, revealing linkage disequilibrium tracks consequent of breed formation, defining chromosome exchanges that punctuated major lineages of mammals and suggesting ancestral continental migration events that led to 37 modern species of Felidae. We review these recent advances here. As the genome resources develop, the cat is poised to make a major contribution to many areas in genetics and biology.

  7. Characterization and distribution of retrotransposons and simple sequence repeats in the bovine genome

    PubMed Central

    Adelson, David L.; Raison, Joy M.; Edgar, Robert C.

    2009-01-01

    Interspersed repeat composition and distribution in mammals have been best characterized in the human and mouse genomes. The bovine genome contains typical eutherian mammal repeats, but also has a significant number of long interspersed nuclear element RTE (BovB) elements proposed to have been horizontally transferred from Squamata. Our analysis of the BovB repeats has indicated that only a few of them are currently likely to retrotranspose in cattle. However, bovine L1 repeats (L1 BT) have many likely active copies. Comparison of substitution rates for BovB and L1 BT indicates that L1 BT is a younger repeat family than BovB. In contrast to mouse and human, L1 occurrence is not negatively correlated with G+C content. However, BovB, Bov A2, ART2A, and Bov-tA are negatively correlated with G+C, although Bov-tA's correlation is weaker. Also, by performing genome wide correlation analysis of interspersed and simple sequence repeats, we have identified genome territories by repeat content that appear to define ancestral vs. ruminant-specific genomic regions. These ancestral regions, enriched with L2 and MIR repeats, are largely conserved between bovine and human. PMID:19625614

  8. Mutation accumulation in real branches: fitness assays for genomic deleterious mutation rate and effect in large-statured plants.

    PubMed

    Schultz, Stewart T; Scofield, Douglas G

    2009-08-01

    The genomic deleterious mutation rate and mean effect are central to the biology and evolution of all species. Large-statured plants, such as trees, are predicted to have high mutation rates due to mitotic mutation and the absence of a sheltered germ line, but their size and generation time have hindered genetic study. We develop and test approaches for estimating deleterious mutation rates and effects from viability comparisons within the canopy of large-statured plants. Our methods, inspired by E. J. Klekowski, are a modification of the classic Bateman-Mukai mutation-accumulation experiment. Within a canopy, cell lineages accumulate mitotic mutations independently. Gametes or zygotes produced at more distal points by these cell lineages contain more mitotic mutations than those at basal locations, and within-flower selfs contain more homozygous mutations than between-flower selfs. The resulting viability differences allow demonstration of lethal mutation with experiments similar in size to assays of genetic load and allow estimates of the rate and effect of new mutations with moderate precision and bias similar to that of classic mutation-accumulation experiments in small-statured organisms. These methods open up new possibilities with the potential to provide valuable new insights into the evolutionary genetics of plants.

  9. A genome-wide association study in large white and landrace pig populations for number piglets born alive.

    PubMed

    Bergfelder-Drüing, Sarah; Grosse-Brinkhaus, Christine; Lind, Bianca; Erbe, Malena; Schellander, Karl; Simianer, Henner; Tholen, Ernst

    2015-01-01

    The number of piglets born alive (NBA) per litter is one of the most important traits in pig breeding due to its influence on production efficiency. It is difficult to improve NBA because the heritability of the trait is low and it is governed by a high number of loci with low to moderate effects. To clarify the biological and genetic background of NBA, genome-wide association studies (GWAS) were performed using 4,012 Large White and Landrace pigs from herdbook and commercial breeding companies in Germany (3), Austria (1) and Switzerland (1). The animals were genotyped with the Illumina PorcineSNP60 BeadChip. Because of population stratifications within and between breeds, clusters were formed using the genetic distances between the populations. Five clusters for each breed were formed and analysed by GWAS approaches. In total, 17 different significant markers affecting NBA were found in regions with known effects on female reproduction. No overlapping significant chromosome areas or QTL between Large White and Landrace breed were detected.

  10. Evolutionary site-number changes of ribosomal DNA loci during speciation: complex scenarios of ancestral and more recent polyploid events.

    PubMed

    Rosato, Marcela; Moreno-Saiz, Juan C; Galián, José A; Rosselló, Josep A

    2015-11-16

    Several genome duplications have been identified in the evolution of seed plants, providing unique systems for studying karyological processes promoting diversification and speciation. Knowledge about the number of ribosomal DNA (rDNA) loci, together with their chromosomal distribution and structure, provides clues about organismal and molecular evolution at various phylogenetic levels. In this work, we aim to elucidate the evolutionary dynamics of karyological and rDNA site-number variation in all known taxa of subtribe Vellinae, showing a complex scenario of ancestral and more recent polyploid events. Specifically, we aim to infer the ancestral chromosome numbers and patterns of chromosome number variation, assess patterns of variation of both 45S and 5S rDNA families, trends in site-number change of rDNA loci within homoploid and polyploid series, and reconstruct the evolutionary history of rDNA site number using a phylogenetic hypothesis as a framework. The best-fitting model of chromosome number evolution with a high likelihood score suggests that the Vellinae core showing x = 17 chromosomes arose by duplication events from a recent x = 8 ancestor. Our survey suggests more complex patterns of polyploid evolution than previously noted for Vellinae. High polyploidization events (6x, 8x) arose independently in the basal clade Vella castrilensis-V. lucentina, where extant diploid species are unknown. Reconstruction of ancestral rDNA states in Vellinae supports the inference that the ancestral number of loci in the subtribe was two for each multigene family, suggesting that an overall tendency towards a net loss of 5S rDNA loci occurred during the splitting of Vellinae ancestors from the remaining Brassiceae lineages. A contrasting pattern for rDNA site change in both paleopolyploid and neopolyploid species was linked to diversification of Vellinae lineages. This suggests dynamic and independent changes in rDNA site number during speciation processes and a

  11. RNA-seq pinpoints a Xanthomonas TAL-effector activated resistance gene in a large-crop genome.

    PubMed

    Strauss, Tina; van Poecke, Remco M P; Strauss, Annett; Römer, Patrick; Minsavage, Gerald V; Singh, Sylvia; Wolf, Christina; Strauss, Axel; Kim, Seungill; Lee, Hyun-Ah; Yeom, Seon-In; Parniske, Martin; Stall, Robert E; Jones, Jeffrey B; Choi, Doil; Prins, Marcel; Lahaye, Thomas

    2012-11-20

    Transcription activator-like effector (TALE) proteins of the plant pathogenic bacterial genus Xanthomonas bind to and transcriptionally activate host susceptibility genes, promoting disease. Plant immune systems have taken advantage of this mechanism by evolving TALE binding sites upstream of resistance (R) genes. For example, the pepper Bs3 and rice Xa27 genes are hypersensitive reaction plant R genes that are transcriptionally activated by corresponding TALEs. Both R genes have a hallmark expression pattern in which their transcripts are detectable only in the presence and not the absence of the corresponding TALE. By transcriptome profiling using next-generation sequencing (RNA-seq), we tested whether we could avoid laborious positional cloning for the isolation of TALE-induced R genes. In a proof-of-principle experiment, RNA-seq was used to identify a candidate for Bs4C, an R gene from pepper that mediates recognition of the Xanthomonas TALE protein AvrBs4. We identified one major Bs4C candidate transcript by RNA-seq that was expressed exclusively in the presence of AvrBs4. Complementation studies confirmed that the candidate corresponds to the Bs4C gene and that an AvrBs4 binding site in the Bs4C promoter directs its transcriptional activation. Comparison of Bs4C with a nonfunctional allele that is unable to recognize AvrBs4 revealed a 2-bp polymorphism within the TALE binding site of the Bs4C promoter. Bs4C encodes a structurally unique R protein and Bs4C-like genes that are present in many solanaceous genomes seem to be as tightly regulated as pepper Bs4C. These findings demonstrate that TALE-specific R genes can be cloned from large-genome crops with a highly efficient RNA-seq approach.

  12. "Black Holes" and Bacterial Pathogenicity: A Large Genomic Deletion that Enhances the Virulence of Shigella spp. and Enteroinvasive Escherichia coli

    NASA Astrophysics Data System (ADS)

    Maurelli, Anthony T.; Fernandez, Reinaldo E.; Bloch, Craig A.; Rode, Christopher K.; Fasano, Alessio

    1998-03-01

    Plasmids, bacteriophages, and pathogenicity islands are genomic additions that contribute to the evolution of bacterial pathogens. For example, Shigella spp., the causative agents of bacillary dysentery, differ from the closely related commensal Escherichia coli in the presence of a plasmid in Shigella that encodes virulence functions. However, pathogenic bacteria also may lack properties that are characteristic of nonpathogens. Lysine decarboxylase (LDC) activity is present in ≈ 90% of E. coli strains but is uniformly absent in Shigella strains. When the gene for LDC, cadA, was introduced into Shigella flexneri 2a, virulence became attenuated, and enterotoxin activity was inhibited greatly. The enterotoxin inhibitor was identified as cadaverine, a product of the reaction catalyzed by LDC. Comparison of the S. flexneri 2a and laboratory E. coli K-12 genomes in the region of cadA revealed a large deletion in Shigella. Representative strains of Shigella spp. and enteroinvasive E. coli displayed similar deletions of cadA. Our results suggest that, as Shigella spp. evolved from E. coli to become pathogens, they not only acquired virulence genes on a plasmid but also shed genes via deletions. The formation of these "black holes," deletions of genes that are detrimental to a pathogenic lifestyle, provides an evolutionary pathway that enables a pathogen to enhance virulence. Furthermore, the demonstration that cadaverine can inhibit enterotoxin activity may lead to more general models about toxin activity or entry into cells and suggests an avenue for antitoxin therapy. Thus, understanding the role of black holes in pathogen evolution may yield clues to new treatments of infectious diseases.

  13. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli.

    PubMed

    Maurelli, A T; Fernández, R E; Bloch, C A; Rode, C K; Fasano, A

    1998-03-31

    Plasmids, bacteriophages, and pathogenicity islands are genomic additions that contribute to the evolution of bacterial pathogens. For example, Shigella spp., the causative agents of bacillary dysentery, differ from the closely related commensal Escherichia coli in the presence of a plasmid in Shigella that encodes virulence functions. However, pathogenic bacteria also may lack properties that are characteristic of nonpathogens. Lysine decarboxylase (LDC) activity is present in approximately 90% of E. coli strains but is uniformly absent in Shigella strains. When the gene for LDC, cadA, was introduced into Shigella flexneri 2a, virulence became attenuated, and enterotoxin activity was inhibited greatly. The enterotoxin inhibitor was identified as cadaverine, a product of the reaction catalyzed by LDC. Comparison of the S. flexneri 2a and laboratory E. coli K-12 genomes in the region of cadA revealed a large deletion in Shigella. Representative strains of Shigella spp. and enteroinvasive E. coli displayed similar deletions of cadA. Our results suggest that, as Shigella spp. evolved from E. coli to become pathogens, they not only acquired virulence genes on a plasmid but also shed genes via deletions. The formation of these "black holes," deletions of genes that are detrimental to a pathogenic lifestyle, provides an evolutionary pathway that enables a pathogen to enhance virulence. Furthermore, the demonstration that cadaverine can inhibit enterotoxin activity may lead to more general models about toxin activity or entry into cells and suggests an avenue for antitoxin therapy. Thus, understanding the role of black holes in pathogen evolution may yield clues to new treatments of infectious diseases.

  14. Genomic structure and promoter functional analysis of GnRH3 gene in large yellow croaker (Larimichthys crocea).

    PubMed

    Huang, Wei; Zhang, Jianshe; Liao, Zhi; Lv, Zhenming; Wu, Huifei; Zhu, Aiyi; Wu, Changwen

    2016-01-15

    Gonadotropin-releasing hormone III (GnRH3) is considered to be a key neurohormone in fish reproduction control. In the present study, the cDNA and genomic sequences of GnRH3 were cloned and characterized from large yellow croaker Larimichthys crocea. The cDNA encoded a protein of 99 amino acids with four functional motifs. The full-length genome sequence was composed of 3797 nucleotides, including four exons and three introns. Higher identities of amino acid sequences and conserved exon-intron organizations were found between LcGnRH3 and other GnRH3 genes. In addition, some special features of the sequences were detected in some species. For example, two specific residues (V and A) were found in the family Sciaenidae, and the unique 75-72 bp type of the open reading frame 2 and 3 existed in the family Cyprinidae. Analysis of the 2576 bp promoter fragment of LcGnRH3 showed a number of transcription factor binding sites, such as AP1, CREB, GATA-1, HSF, FOXA2, and FOXL1. Promoter functional analysis using an EGFP reporter fusion in zebrafish larvae presented positive signals in the brain, including the olfactory region, the terminal nerve ganglion, the telencephalon, and the hypothalamus. The expression pattern was generally consistent with the endogenous GnRH3 GFP-expressing transgenic zebrafish lines, but the details were different. These results indicate that the structure and function of LcGnRH3 are generally similar to the other teleost GnRH3 genes, but there exist some distinctions among them.

  15. Large number of replacement polymorphisms in rapidly evolving genes of Drosophila. Implications for genome-wide surveys of DNA polymorphism.

    PubMed Central

    Schmid, K J; Nigro, L; Aquadro, C F; Tautz, D

    1999-01-01

    We present a survey of nucleotide polymorphism of three novel, rapidly evolving genes in populations of Drosophila melanogaster and D. simulans. Levels of silent polymorphism are comparable to other loci, but the number of replacement polymorphisms is higher than that in most other genes surveyed in D. melanogaster and D. simulans. Tests of neutrality fail to reject neutral evolution with one exception. This concerns a gene located in a region of high recombination rate in D. simulans and in a region of low recombination rate in D. melanogaster, due to an inversion. In the latter case it shows a very low number of polymorphisms, presumably due to selective sweeps in the region. Patterns of nucleotide polymorphism suggest that most substitutions are neutral or nearly neutral and that weak (positive and purifying) selection plays a significant role in the evolution of these genes. At all three loci, purifying selection of slightly deleterious replacement mutations appears to be more efficient in D. simulans than in D. melanogaster, presumably due to different effective population sizes. Our analysis suggests that current knowledge about genome-wide patterns of nucleotide polymorphism is far from complete with respect to the types and range of nucleotide substitutions and that further analysis of differences between local populations will be required to understand the forces more completely. We note that rapidly diverging and nearly neutrally evolving genes cannot be expected only in the genome of Drosophila, but are likely to occur in large numbers also in other organisms and that their function and evolution are little understood so far. PMID:10581279

  16. Extensive chordate and annelid macrosynteny reveals ancestral homeobox gene organization.

    PubMed

    Hui, Jerome H L; McDougall, Carmel; Monteiro, Ana S; Holland, Peter W H; Arendt, Detlev; Balavoine, Guillaume; Ferrier, David E K

    2012-01-01

    Genes with the homeobox motif are crucial in developmental biology and widely implicated in the evolution of development. The Antennapedia (ANTP)-class is one of the two major classes of animal homeobox genes, and includes the Hox genes, renowned for their role in patterning the anterior-posterior axis of animals. The origin and evolution of the ANTP-class genes are a matter of some debate. A principal guiding hypothesis has been the existence of an ancient gene Mega-cluster deep in animal ancestry. This hypothesis was largely established from linkage data from chordates, and the Mega-cluster hypothesis remains to be seriously tested in protostomes. We have thus mapped ANTP-class homeobox genes to the chromosome level in a lophotrochozoan protostome. Our comparison of gene organization in Platynereis dumerilii and chordates indicates that the Mega-cluster, if it did exist, had already been broken up onto four chromosomes by the time of the protostome-deuterostome ancestor (PDA). These results not only elucidate an aspect of the genome organization of the PDA but also reveal high levels of macrosynteny between P. dumerilii and chordates. This implies a very low rate of interchromosomal genome rearrangement in the lineages leading to P. dumerilii and the chordate ancestor since the time of the PDA.

  17. The presence of the ancestral insect telomeric motif in kissing bugs (Triatominae) rules out the hypothesis of its loss in evolutionarily advanced Heteroptera (Cimicomorpha)

    PubMed Central

    Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro

    2016-01-01

    Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence in their chromosomal ends. Furthermore, motif amount estimation was about 0.03% of the total genome, so that the average telomere length in each chromosomal end is almost 18 kb long. We also detected the presence of (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes in three other species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050

  18. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals.

    PubMed

    Masuda, Y; Misztal, I; Tsuruta, S; Legarra, A; Aguilar, I; Lourenco, D A L; Fragomeni, B O; Lawlor, T J

    2016-03-01

    The objectives of this study were to develop and evaluate an efficient implementation in the computation of the inverse of genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014 were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix GAPY(-1) based on a direct inversion of genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls, and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. 
Setting up GAPY(-1
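
    The validation statistic described in this abstract (the coefficient of determination from regressing daughter deviation on genomic prediction) can be illustrated with a minimal Python sketch. The function name and the toy numbers are hypothetical and are not taken from the study; for a simple linear regression with one predictor, the coefficient of determination equals the squared Pearson correlation, which is what the sketch computes.

```python
def validation_reliability(predictions, daughter_deviations):
    """R^2 of the simple linear regression of daughter deviation on prediction.

    Hypothetical helper for illustration only. With a single predictor,
    R^2 equals the squared Pearson correlation between the two variables.
    """
    n = len(predictions)
    if n != len(daughter_deviations) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")

    mean_x = sum(predictions) / n
    mean_y = sum(daughter_deviations) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in predictions)
    syy = sum((y - mean_y) ** 2 for y in daughter_deviations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predictions, daughter_deviations))

    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Toy values (made up): genomic predictions and daughter deviations for 3 bulls.
toy_predictions = [1.0, 2.0, 3.0]
toy_deviations = [1.0, 3.0, 2.0]
print(validation_reliability(toy_predictions, toy_deviations))
```

    In practice the two lists would hold one value per validation bull; the sketch ignores weighting by the number of daughters, which a real validation might include.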

  19. Visual system evolution and the nature of the ancestral snake.

    PubMed

    Simões, B F; Sampaio, F L; Jared, C; Antoniazzi, M M; Loew, E R; Bowmaker, J K; Rodriguez, A; Hart, N S; Hunt, D M; Partridge, J C; Gower, D J

    2015-07-01

    The dominant hypothesis for the evolutionary origin of snakes from 'lizards' (non-snake squamates) is that stem snakes acquired many snake features while passing through a profound burrowing (fossorial) phase. To investigate this, we examined the visual pigments and their encoding opsin genes in a range of squamate reptiles, focusing on fossorial lizards and snakes. We sequenced opsin transcripts isolated from retinal cDNA and used microspectrophotometry to measure directly the spectral absorbance of the photoreceptor visual pigments in a subset of samples. In snakes, but not lizards, dedicated fossoriality (as in Scolecophidia and the alethinophidian Anilius scytale) corresponds with loss of all visual opsins other than RH1 (λmax 490-497 nm); all other snakes (including less dedicated burrowers) also have functional sws1 and lws opsin genes. In contrast, the retinas of all lizards sampled, even highly fossorial amphisbaenians with reduced eyes, express functional lws, sws1, sws2 and rh1 genes, and most also express rh2 (i.e. they express all five of the visual opsin genes present in the ancestral vertebrate). Our evidence of visual pigment complements suggests that the visual system of stem snakes was partly reduced, with two (RH2 and SWS2) of the ancestral vertebrate visual pigments being eliminated, but that this did not extend to the extreme additional loss of SWS1 and LWS that subsequently occurred (probably independently) in highly fossorial extant scolecophidians and A. scytale. We therefore consider it unlikely that the ancestral snake was as fossorial as extant scolecophidians, whether or not the latter are para- or monophyletic.

  20. In Silico Resurrection of the Major Vault Protein Suggests It Is Ancestral in Modern Eukaryotes

    PubMed Central

    Daly, Toni K.; Sutherland-Smith, Andrew J.; Penny, David

    2013-01-01

    Vaults are very large oligomeric ribonucleoproteins conserved among a variety of species. The rat vault 3D structure shows an ovoid oligomeric particle, consisting of 78 major vault protein monomers, each of approximately 861 amino acids. Vaults are probably the largest ribonucleoprotein structures in eukaryote cells, being approximately 70 nm in length with a diameter of 40 nm (the size of three ribosomes) and with a lumen capacity of 50 million Å³. We use both protein sequences and inferred ancestral sequences for in silico virtual resurrection of tertiary and quaternary structures to search for vaults in a wide variety of eukaryotes. We find that the vault’s phylogenetic distribution is widespread in eukaryotes, but is apparently absent in some notable model organisms. Our conclusion from the distribution of vaults is that they were present in the last eukaryote common ancestor but they have apparently been lost from a number of groups including fungi, insects, and probably plants. Our approach of inferring ancestral 3D and quaternary structures is expected to be useful generally. PMID:23887922

  1. In silico resurrection of the major vault protein suggests it is ancestral in modern eukaryotes.

    PubMed

    Daly, Toni K; Sutherland-Smith, Andrew J; Penny, David

    2013-01-01

    Vaults are very large oligomeric ribonucleoproteins conserved among a variety of species. The rat vault 3D structure shows an ovoid oligomeric particle, consisting of 78 major vault protein monomers, each of approximately 861 amino acids. Vaults are probably the largest ribonucleoprotein structures in eukaryote cells, being approximately 70 nm in length with a diameter of 40 nm (the size of three ribosomes) and with a lumen capacity of 50 million Å³. We use both protein sequences and inferred ancestral sequences for in silico virtual resurrection of tertiary and quaternary structures to search for vaults in a wide variety of eukaryotes. We find that the vault's phylogenetic distribution is widespread in eukaryotes, but is apparently absent in some notable model organisms. Our conclusion from the distribution of vaults is that they were present in the last eukaryote common ancestor but they have apparently been lost from a number of groups including fungi, insects, and probably plants. Our approach of inferring ancestral 3D and quaternary structures is expected to be useful generally.

  2. Nanochannel Platform for Single-DNA Studies; From DNA-Protein Interaction to Large Scale Genome Sequencing

    NASA Astrophysics Data System (ADS)

    van der Maarel, Johan; van Kan, Jeroen; Zhang, Ce

    2014-03-01

    The study of nanochannel-confined DNA molecules is of importance from both biotechnological and biophysical points of view. We produce nanochannels in elastomer-based biochips with soft lithography using proton beam writing technology. The cross-sectional diameter of the channels is in the range of 50 to 300 nm. Single DNA molecules confined inside these channels can be visualized with fluorescence microscopy. Two related issues concerning DNA confined in such a nanospace will be discussed. For the first issue, the dynamic effects of nucleoid associated proteins (H-NS and HU) and protamine on the conformation and condensation of DNA will be presented. We use a novel, cross-channel device, which allows the monitoring of the conformational response after a change in environmental solution conditions in situ. The second issue concerns bottlebrush-coated DNA. The bottlebrush has an increased bending rigidity and thickness, which results in an amplified stretch once it is confined inside a nanochannel. It will be demonstrated that large-scale genomic organization can be sequenced using single DNA molecules on an array of elastomeric nanochannels with cross-sectional diameters of 200 nm. Overall, our results show that the effects of proteins on the conformation and folding of DNA are not only related to protein binding, osmolarity, and charge, but that the interplay with confinement in a nanospace is of paramount importance.

  3. Patterns of metabolite changes identified from large-scale gene perturbations in Arabidopsis using a genome-scale metabolic network.

    PubMed

    Kim, Taehyong; Dreher, Kate; Nilo-Poyanco, Ricardo; Lee, Insuk; Fiehn, Oliver; Lange, Bernd Markus; Nikolau, Basil J; Sumner, Lloyd; Welti, Ruth; Wurtele, Eve S; Rhee, Seung Y

    2015-04-01

    Metabolomics enables quantitative evaluation of metabolic changes caused by genetic or environmental perturbations. However, little is known about how perturbing a single gene changes the metabolic system as a whole and which network and functional properties are involved in this response. To answer this question, we investigated the metabolite profiles from 136 mutants with single gene perturbations of functionally diverse Arabidopsis (Arabidopsis thaliana) genes. Fewer than 10 metabolites were changed significantly relative to the wild type in most of the mutants, indicating that the metabolic network was robust to perturbations of single metabolic genes. These changed metabolites were closer to each other in a genome-scale metabolic network than expected by chance, supporting the notion that the genetic perturbations changed the network more locally than globally. Surprisingly, the changed metabolites were close to the perturbed reactions in only 30% of the mutants of the well-characterized genes. To determine the factors that contributed to the distance between the observed metabolic changes and the perturbation site in the network, we examined nine network and functional properties of the perturbed genes. Only the isozyme number affected the distance between the perturbed reactions and changed metabolites. This study revealed patterns of metabolic changes from large-scale gene perturbations and relationships between characteristics of the perturbed genes and metabolic changes.
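
    The network-distance idea in this abstract (changed metabolites lying closer to each other in the metabolic network than arbitrary metabolites) can be sketched in Python as follows. This is only an illustration under stated assumptions: the six-node toy network, the node names, and both function names are hypothetical, the network is treated as an undirected, unweighted, connected graph, and the study's actual distance measure and significance test are not reproduced here.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_pairwise_distance(graph, nodes):
    """Average shortest-path length over all unordered pairs in `nodes`.

    Assumes every pair is connected; a disconnected pair raises KeyError.
    """
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        raise ValueError("need at least two nodes")
    total = sum(shortest_path_lengths(graph, a)[b] for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy network: a linear chain of six metabolites A-B-C-D-E-F.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
changed = {"B", "C", "D"}  # made-up set of "changed" metabolites

print(mean_pairwise_distance(toy_network, changed))
print(mean_pairwise_distance(toy_network, toy_network.keys()))
```

    A smaller mean distance for the changed set than for the whole network is the kind of pattern the abstract describes; judging whether it is smaller than expected by chance would additionally require a null distribution, for example from randomly sampled node sets.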

  4. Extensive Chromosomal Reorganization in the Evolution of New World Muroid Rodents (Cricetidae, Sigmodontinae): Searching for Ancestral Phylogenetic Traits

    PubMed Central

    Pereira, Adenilson Leão; Malcher, Stella Miranda; Nagamachi, Cleusa Yoshiko; O’Brien, Patricia Caroline Mary; Ferguson-Smith, Malcolm Andrew; Mendes-Oliveira, Ana Cristina; Pieczarka, Julio Cesar

    2016-01-01

    Sigmodontinae rodents show great diversity and complexity in morphology and ecology. This diversity is accompanied by extensive chromosome variation challenging attempts to reconstruct their ancestral genome. The species Hylaeamys megacephalus (HME; Oryzomyini, 2n = 54), Necromys lasiurus (NLA; Akodontini, 2n = 34) and Akodon sp. (ASP; Akodontini, 2n = 10) have extreme diploid numbers that make it difficult to understand the rearrangements that are responsible for such differences. In this study we analyzed these changes using whole chromosome probes of HME in cross-species painting of NLA and ASP to construct chromosome homology maps that reveal the rearrangements between species. We include data from the literature for other Sigmodontinae previously studied with HME and Mus musculus (MMU) probes. We also use the HME probes on MMU chromosomes for the comparative analysis of NLA with other species already mapped by MMU probes. Our results show that NLA and ASP have highly rearranged karyotypes when compared to HME. Eleven HME syntenic blocks are shared among the species studied here. Four syntenies may be ancestral to Akodontini (HME2/18, 3/25, 18/25 and 4/11/16) and eight to Sigmodontinae (HME26, 1/12, 6/21, 7/9, 5/17, 11/16, 20/13 and 19/14/19). Using MMU data we identified six associations shared among rodents from seven subfamilies, where MMU3/18 and MMU8/13 are phylogenetic signatures of Sigmodontinae. We suggest that the associations MMU2entire, MMU6proximal/12entire, MMU3/18, MMU8/13, MMU1/17, MMU10/17, MMU12/17, MMU5/16, MMU5/6 and MMU7/19 are part of the ancestral Sigmodontinae genome. PMID:26800516

  5. Human Genetic Ancestral Composition Correlates with the Origin of Mycobacterium leprae Strains in a Leprosy Endemic Population.

    PubMed

    Cardona-Castro, Nora; Cortés, Edwin; Beltrán, Camilo; Romero, Marcela; Badel-Mogollón, Jaime E; Bedoya, Gabriel

    2015-01-01

    Recent reports have suggested that leprosy originated in Africa, extended to Asia and Europe, and arrived in the Americas during European colonization and the African slave trade. Due to colonization, the contemporary Colombian population is an admixture of Native-American, European and African ancestries. Because microorganisms are known to accompany humans during migrations, patterns of human migration can be traced by examining genomic changes in associated microbes. The current study analyzed 118 leprosy cases and 116 unrelated controls from two Colombian regions endemic for leprosy (Atlantic and Andean) in order to determine possible associations of leprosy with patient ancestral background (determined using 36 ancestry informative markers), Mycobacterium leprae genotype and/or patient geographical origin. We found significant differences between ancestral genetic composition. European components were predominant in Andean populations. In contrast, African components were higher in the Atlantic region. M. leprae genotypes were then analyzed for cluster associations and compared with the ancestral composition of leprosy patients. Two M. leprae principal clusters were found: haplotypes C54 and T45. Haplotype C54 associated with African origin and was more frequent in patients from the Atlantic region with a high African component. In contrast, haplotype T45 associated with European origin and was more frequent in Andean patients with a higher European component. These results suggest that the human and M. leprae genomes have co-existed since the African and European origins of the disease, with leprosy ultimately arriving in Colombia during colonization. Distinct M. leprae strains followed European and African settlement in the country and can be detected in contemporary Colombian populations.

  6. Human Genetic Ancestral Composition Correlates with the Origin of Mycobacterium leprae Strains in a Leprosy Endemic Population

    PubMed Central

    Cardona-Castro, Nora; Cortés, Edwin; Beltrán, Camilo; Romero, Marcela; Badel-Mogollón, Jaime E.; Bedoya, Gabriel

    2015-01-01

    Recent reports have suggested that leprosy originated in Africa, extended to Asia and Europe, and arrived in the Americas during European colonization and the African slave trade. Due to colonization, the contemporary Colombian population is an admixture of Native-American, European and African ancestries. Because microorganisms are known to accompany humans during migrations, patterns of human migration can be traced by examining genomic changes in associated microbes. The current study analyzed 118 leprosy cases and 116 unrelated controls from two Colombian regions endemic for leprosy (Atlantic and Andean) in order to determine possible associations of leprosy with patient ancestral background (determined using 36 ancestry informative markers), Mycobacterium leprae genotype and/or patient geographical origin. We found significant differences between ancestral genetic composition. European components were predominant in Andean populations. In contrast, African components were higher in the Atlantic region. M. leprae genotypes were then analyzed for cluster associations and compared with the ancestral composition of leprosy patients. Two M. leprae principal clusters were found: haplotypes C54 and T45. Haplotype C54 associated with African origin and was more frequent in patients from the Atlantic region with a high African component. In contrast, haplotype T45 associated with European origin and was more frequent in Andean patients with a higher European component. These results suggest that the human and M. leprae genomes have co-existed since the African and European origins of the disease, with leprosy ultimately arriving in Colombia during colonization. Distinct M. leprae strains followed European and African settlement in the country and can be detected in contemporary Colombian populations. PMID:26360617

  7. Analysis of BRCA1 and BRCA2 large genomic rearrangements in Sri Lankan familial breast cancer patients and at risk individuals

    PubMed Central

    2014-01-01

    Background: The majority of mutations found to date in the BRCA1/BRCA2 genes in breast and/or ovarian cancer families are point mutations or small insertions and deletions scattered over the coding sequence and splice junctions. Such mutations and sequence variants of BRCA1 and BRCA2 genes were previously identified in a group of Sri Lankan breast cancer patients. Large genomic rearrangements have been characterized in BRCA1 and BRCA2 genes in several populations but these have not been characterized in Sri Lankan breast cancer patients. Findings: A cohort of familial breast cancer patients (N = 57), at risk individuals (N = 25) and healthy controls (N = 23) were analyzed using the multiplex ligation-dependent probe amplification (MLPA) method to detect BRCA1 and BRCA2 large genomic rearrangements. One familial breast cancer patient showed an ambiguous deletion in exon 6 of the BRCA1 gene. Full sequencing of the ambiguous region was used to confirm MLPA results. The ambiguous deletion detected by MLPA was found to be a false positive result, confirming that BRCA1 large genomic rearrangements were absent in the subjects studied. No BRCA2 rearrangement was identified in the cohort either. Conclusion: This study demonstrates that BRCA1 and BRCA2 large genomic rearrangements are unlikely to make a significant contribution to the aetiology of breast cancer in Sri Lanka. PMID:24906410

  8. Genesis of ancestral haplotypes: RNA modifications and reverse transcription-mediated polymorphisms.

    PubMed

    Steele, Edward J; Williamson, Joseph F; Lester, Susan; Stewart, Brent J; Millman, John A; Carnegie, Pat; Lindley, Robyn A; Pain, Geoff N; Dawkins, Roger L

    2011-03-01

    Understanding the genesis of the block haplotype structure of the genome is a major challenge. With the completion of the sequencing of the Human Genome and the initiation of the HapMap project the concept that the chromosomes of the mammalian genome are a mosaic, or patchwork, of conserved extended block haplotype sequences is now accepted by the mainstream genomics research community. Ancestral Haplotypes (AHs) can be viewed as a recombined string of smaller Polymorphic Frozen Blocks (PFBs). How have such variant extended DNA sequence tracts emerged in evolution? Here the relevant literature on the problem is reviewed from various fields of molecular and cell biology particularly molecular immunology and comparative and functional genomics. Based on our synthesis we then advance a testable molecular and cellular model. A critical part of the analysis concerns the origin of the strand biased mutation signatures in the transcribed regions of the human and higher primate genome, A-to-G versus T-to-C (ratio ∼ 1.5 fold) and C-to-T versus G-to-A (≥ 1.5 fold). A comparison and evaluation of the current state of the fields of immunoglobulin Somatic Hypermutation (SHM) and Transcription-Coupled DNA Repair focused on how mutations in newly synthesized RNA might be copied back to DNA thus accounting for some of the genome-wide strand biases (e.g., the A-to-G vs T-to-C component of the strand biased spectrum). We hypothesize that the genesis of PFBs and extended AHs occurs during mutagenic episodes in evolution (e.g., retroviral infections) and that many of the critical DNA sequence diversifying events occur first at the RNA level, e.g., recombination between RNA strings resulting in tandem and dispersed RNA duplications (retroduplications), RNA mutations via adenosine-to-inosine pre-mRNA editing events as well as error prone RNA synthesis. These are then copied back into DNA by a cellular reverse transcription process (also likely to be error-prone) that we have called

  9. Whole Genome Sequence Analysis of a Large Isoniazid-Resistant Tuberculosis Outbreak in London: A Retrospective Observational Study

    PubMed Central

    Casali, Nicola; Broda, Agnieszka; Harris, Simon R.; Brown, Timothy; Drobniewski, Francis

    2016-01-01

    Background: A large isoniazid-resistant tuberculosis outbreak centred on London, United Kingdom, has been ongoing since 1995. The aim of this study was to investigate the power and value of whole genome sequencing (WGS) to resolve the transmission network compared to current molecular strain typing approaches, including analysis of intra-host diversity within a specimen, across body sites, and over time, with identification of genetic factors underlying the epidemiological success of this cluster. Methods and Findings: We sequenced 344 outbreak isolates from individual patients collected over 14 y (2 February 1998–22 June 2012). This demonstrated that 96 (27.9%) were indistinguishable, and only one differed from this major clone by more than five single nucleotide polymorphisms (SNPs). The maximum number of SNPs between any pair of isolates was nine SNPs, and the modal distance between isolates was two SNPs. WGS was able to reveal the direction of transmission of tuberculosis in 16 cases within the outbreak (4.7%), including within a multidrug-resistant cluster that carried a rare rpoB mutation associated with rifampicin resistance. Eleven longitudinal pairs of patient pulmonary isolates collected up to 48 mo apart differed from each other by between zero and four SNPs. Extrapulmonary dissemination resulted in acquisition of a SNP in two of five cases. WGS analysis of 27 individual colonies cultured from a single patient specimen revealed ten loci differed amongst them, with a maximum distance between any pair of six SNPs. A limitation of this study, as in previous studies, is that indels and SNPs in repetitive regions were not assessed due to the difficulty in reliably determining this variation. Conclusions: Our study suggests that (1) certain paradigms need to be revised, such as the 12 SNP distance as the gold standard upper threshold to identify plausible transmissions; (2) WGS technology is helpful to rule out the possibility of direct transmission when
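
    The pairwise SNP statistics reported in this abstract (maximum and modal distance between isolates) can be illustrated with a short Python sketch. It assumes that each isolate has already been reduced to an equal-length string of base calls at the same variable sites; the isolate names, sequences, and function names are hypothetical, and real pipelines would also need to handle missing calls and masked repetitive regions, which this sketch does not.

```python
from collections import Counter
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of aligned positions at which two isolates differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same aligned sites")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_summary(isolates):
    """Return (distances per pair, maximum distance, modal distance)."""
    distances = {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }
    counts = Counter(distances.values())
    # With tied counts, Counter.most_common keeps first-seen order.
    modal = counts.most_common(1)[0][0]
    return distances, max(distances.values()), modal


# Made-up base calls at six variable sites for four hypothetical isolates.
toy_isolates = {
    "iso1": "ACGTAC",
    "iso2": "ACGTAC",
    "iso3": "ACGTTC",
    "iso4": "ATGTTC",
}

distances, max_distance, modal_distance = pairwise_summary(toy_isolates)
print(distances)
print(max_distance, modal_distance)
```

    A distance threshold for plausible transmission, such as the 12 SNP value mentioned in the abstract, could then be applied by filtering `distances` for pairs at or below that value.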

  10. A Large Genome-Wide Association Study of Age-Related Hearing Impairment Using Electronic Health Records

    PubMed Central

    Hoffmann, Thomas J.; Keats, Bronya J.; Yoshikawa, Noriko; Risch, Neil

    2016-01-01

    Age-related hearing impairment (ARHI), one of the most common sensory disorders, can be mitigated, but not cured or eliminated. To identify genetic influences underlying ARHI, we conducted a genome-wide association study of ARHI in 6,527 cases and 45,882 controls among the non-Hispanic whites from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. We identified two novel genome-wide significant SNPs: rs4932196 (odds ratio = 1.185, p = 4.0 × 10⁻¹¹), 52 kb 3′ of ISG20, which replicated in a meta-analysis of the other GERA race/ethnicity groups (1,025 cases, 12,388 controls, p = 0.00094) and in a UK Biobank case-control analysis (30,802 self-reported cases, 78,586 controls, p = 0.015); and rs58389158 (odds ratio = 1.132, p = 1.8 × 10⁻⁹), which replicated in the UK Biobank (p = 0.00021). The latter SNP lies just outside exon 8 and is highly correlated (r² = 0.96) with the missense SNP rs5756795 in exon 7 of TRIOBP, a gene previously associated with prelingual nonsyndromic hearing loss. We further tested these SNPs in phenotypes from audiologist notes available on a subset of GERA (4,903 individuals), stratified by case/control status, to construct an independent replication test, and found a significant effect of rs58389158 on speech reception threshold (SRT; overall GERA meta-analysis p = 1.9 × 10⁻⁶). We also tested variants within exons of 132 other previously identified hearing loss genes, and identified two common additional significant SNPs: rs2877561 (synonymous change in ILDR1, p = 6.2 × 10⁻⁵), which replicated in the UK Biobank (p = 0.00057), and had a significant GERA SRT (p = 0.00019) and speech discrimination score (SDS; p = 0.0019); and rs9493627 (missense change in EYA4, p = 0.00011) which replicated in the UK Biobank (p = 0.0095), other GERA groups (p = 0.0080), and had a consistent significant result for SRT (p = 0.041) and suggestive result for SDS (p = 0.081). Large cohorts with GWAS data and electronic health records may be a useful
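
    To show what an odds ratio such as those quoted in this abstract measures, the following Python sketch computes an allelic odds ratio and an approximate 95% confidence interval from a 2 × 2 table of allele counts. This is a deliberate simplification and an assumption on our part: genome-wide association studies typically estimate odds ratios with covariate-adjusted logistic regression, and the allele counts and function name below are made up rather than taken from the GERA data.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio and approximate 95% CI from a 2x2 table of allele counts.

    Hypothetical helper: the interval uses the standard log-odds (Woolf)
    approximation, which requires all four counts to be positive.
    """
    counts = (case_alt, case_ref, control_alt, control_ref)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")

    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)

    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Made-up allele counts: 30 alt / 70 ref in cases, 20 alt / 80 ref in controls.
print(allelic_odds_ratio(30, 70, 20, 80))
```

    An odds ratio above 1 means that the allele is relatively more common among cases than among controls.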

  11. The Organellar Genomes of Chromera and Vitrella, the Phototrophic Relatives of Apicomplexan Parasites.

    PubMed

    Oborník, Miroslav; Lukeš, Julius

    2015-01-01

    Apicomplexa are known to contain greatly reduced organellar genomes. Their mitochondrial genome carries only three protein-coding genes, and their plastid genome is reduced to a 35-kb-long circle. The discovery of coral-endosymbiotic algae Chromera velia and Vitrella brassicaformis, which share a common ancestry with Apicomplexa, provided an opportunity to study possibly ancestral forms of organellar genomes, a unique glimpse into the evolutionary history of apicomplexan parasites. The structurally similar mitochondrial genomes of Chromera and Vitrella differ in gene content, which is reflected in the composition of their respiratory chains. Thus, Chromera lacks respiratory complexes I and III, whereas Vitrella and apicomplexan parasites are missing only complex I. Plastid genomes differ substantially between these algae, particularly in structure: The Chromera plastid genome is a linear, 120-kb molecule with large and divergent genes, whereas the plastid genome of Vitrella is a highly compact circle that is only 85 kb long but nonetheless contains more genes than that of Chromera. It appears that organellar genomes have already been reduced in free-living phototrophic ancestors of apicomplexan parasites, and such reduction is not associated with parasitism.

  12. Functional conservation of an ancestral Pellino protein in helminth species

    PubMed Central

    Cluxton, Christopher D.; Caffrey, Brian E.; Kinsella, Gemma K.; Moynagh, Paul N.; Fares, Mario A.; Fallon, Padraic G.

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved, having high similarity to human Pellino proteins at both the sequence and predicted structural protein level. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  13. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

    PubMed Central

    Koonin, Eugene V; Fedorova, Natalie D; Jackson, John D; Jacobs, Aviva R; Krylov, Dmitri M; Makarova, Kira S; Mazumder, Raja; Mekhedov, Sergei L; Nikolskaya, Anastasia N; Rao, B Sridhar; Rogozin, Igor B; Smirnov, Sergei; Sorokin, Alexander V; Sverdlov, Alexander V; Vasudevan, Sona; Wolf, Yuri I; Yin, Jodie J; Natale, Darren A

    2004-01-01

    the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. Conclusions: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms. PMID:14759257

  14. Redefining the differences in gene content between Yersinia pestis and Yersinia pseudotuberculosis using large-scale comparative genomics

    PubMed Central

    Califf, Katy J.; Keim, Paul S.; Wagner, David M.

    2015-01-01

    Yersinia pestis, the causative agent of plague, is best known for historical pandemics, but still actively causes disease in many parts of the world. Y. pestis is a recently derived clone of the pathogenic species Yersinia pseudotuberculosis, but is more associated with human infection. Numerous studies have documented genomic changes since the two species differentiated, although all of these studies used a relatively small sample set for defining these differences. In this study, we compared the complete genomic content between a diverse set of Y. pestis and Y. pseudotuberculosis genomes, and identified unique loci that could serve as diagnostic markers or for better understanding the evolution and pathogenesis of each group. Comparative genomics analyses also identified subtle variations in gene content between individual monophyletic clades within these species, based on a core genome single nucleotide polymorphism phylogeny that would have been undetected in a less comprehensive genome dataset. We also screened loci that were identified in other published studies as unique to either species and generally found a non-uniform distribution, suggesting that the assignment of these unique genes to either species should be re-evaluated in the context of current sequencing efforts. Overall, this study provides a high-resolution view into the genomic differences between Y. pestis and Y. pseudotuberculosis, demonstrating fine-scale differentiation and unique gene composition in both species. PMID:28348813

  15. Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum

    PubMed Central

    Hardigan, Michael A.; Crisovan, Emily; Hamilton, John P.; Laimbeer, Parker; Leisner, Courtney P.; Manrique-Carpintero, Norma C.; Newton, Linsey; Pham, Gina M.; Vaillancourt, Brieanne; Zeng, Zixian; Jiang, Jiming

    2016-01-01

    Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996

  16. Sex chromosomes evolved from independent ancestral linkage groups in winged insects.

    PubMed

    Pease, James B; Hahn, Matthew W

    2012-06-01

    The evolution of a pair of chromosomes that differ in appearance between males and females (heteromorphic sex chromosomes) has occurred repeatedly across plants and animals. Recent work has shown that the male heterogametic (XY) and female heterogametic (ZW) sex chromosomes evolved independently from different pairs of homomorphic autosomes in the common ancestor of birds and mammals but also that X and Z chromosomes share many convergent molecular features. However, little is known about how often heteromorphic sex chromosomes have either evolved convergently from different autosomes or in parallel from the same pair of autosomes and how universal patterns of molecular evolution on sex chromosomes really are. Among winged insects with sequenced genomes, there are male heterogametic species in both the Diptera (e.g., Drosophila melanogaster) and the Coleoptera (Tribolium castaneum), female heterogametic species in the Lepidoptera (Bombyx mori), and haplodiploid species in the Hymenoptera (e.g., Nasonia vitripennis). By determining orthologous relationships among genes on the X and Z chromosomes of insects with sequenced genomes, we are able to show that these chromosomes are not homologous to one another but are homologous to autosomes in each of the other species. These results strongly imply that heteromorphic sex chromosomes have evolved independently from different pairs of ancestral chromosomes in each of the insect orders studied. We also find that the convergently evolved X chromosomes of Diptera and Coleoptera share genomic features with each other and with vertebrate X chromosomes, including excess gene movement from the X to the autosomes. However, other patterns of molecular evolution (such as increased codon bias, decreased gene density, and the paucity of male-biased genes on the X) differ among the insect X and Z chromosomes. Our results provide evidence for both differences and nearly universal similarities in patterns of evolution among

  17. A BAC pooling strategy combined with PCR-based screenings in a large, highly repetitive genome enables integration of the maize genetic and physical maps

    PubMed Central

    Yim, Young-Sun; Moak, Patricia; Sanchez-Villeda, Hector; Musket, Theresa A; Close, Pamela; Klein, Patricia E; Mullet, John E; McMullen, Michael D; Fang, Zheiwei; Schaeffer, Mary L; Gardiner, Jack M; Coe, Edward H; Davis, Georgia L

    2007-01-01

    Background: Molecular markers serve three important functions in physical map assembly. First, they provide anchor points to genetic maps facilitating functional genomic studies. Second, they reduce the overlap required for BAC contig assembly from 80 to 50 percent. Finally, they validate assemblies based solely on BAC fingerprints. We employed a six-dimensional BAC pooling strategy in combination with a high-throughput PCR-based screening method to anchor the maize genetic and physical maps. Results: A total of 110,592 maize BAC clones (~6x haploid genome equivalents) were pooled into six different matrices, each containing 48 pools of BAC DNA. The quality of the BAC DNA pools and their utility for identifying BACs containing target genomic sequences were tested using 254 PCR-based STS markers. Five types of PCR-based STS markers were screened to assess potential uses for the BAC pools. An average of 4.68 BAC clones were identified per marker analyzed. These results were integrated with BAC fingerprint data generated by the Arizona Genomics Institute (AGI) and the Arizona Genomics Computational Laboratory (AGCoL) to assemble the BAC contigs using the FingerPrinted Contigs (FPC) software and contribute to the construction and anchoring of the physical map. A total of 234 markers (92.5%) anchored BAC contigs to their genetic map positions. The results can be viewed on the integrated map of maize [1,2]. Conclusion: This BAC pooling strategy is a rapid, cost-effective method for genome assembly and anchoring. The requirement for six replicate positive amplifications makes this a robust method for use in large genomes with high amounts of repetitive DNA such as maize. This strategy can be used to physically map duplicate loci, provide order information for loci in a small genetic interval or with no genetic recombination, and loci with conflicting hybridization-based information. PMID:17291341
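
    The clone and pool counts in this abstract are consistent with a cubic layout, since 48 x 48 x 48 = 110,592. The sketch below illustrates how a six-way pooling scheme can resolve a single clone from six positive pools. The layout is an assumption made for illustration (three axis-aligned matrices plus three diagonal matrices); the abstract does not describe how the six matrices were actually constructed, and the function names are hypothetical.

```python
# Hypothetical layout: clones sit in an N x N x N cube with N = 48.
N = 48
TOTAL_CLONES = N ** 3  # 110,592, matching the clone count in the abstract


def clone_to_pools(clone_id):
    """Return the six pool indices (one per matrix) that contain a clone."""
    x, rem = divmod(clone_id, N * N)
    y, z = divmod(rem, N)
    # Three axis-aligned matrices plus three diagonal matrices (assumed design).
    return (x, y, z, (x + y) % N, (y + z) % N, (x + z) % N)


def pools_to_clone(pools):
    """Decode a clone from six positive pools; None if they disagree."""
    x, y, z = pools[:3]
    candidate = (x * N + y) * N + z
    # The three diagonal pools act as replicate confirmations.
    return candidate if clone_to_pools(candidate) == tuple(pools) else None


if __name__ == "__main__":
    pools = clone_to_pools(12345)
    print(pools, pools_to_clone(pools))
```

    Under this assumed design a clone is only called when all six pools agree, which mirrors the abstract's point that requiring six positive amplifications makes the screen robust against spurious hits from repetitive DNA.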

  18. Large scale full-length cDNA sequencing reveals a unique genomic landscape in a lepidopteran model insect, Bombyx mori.

    PubMed

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-09-04

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes.

  19. Large Scale Full-Length cDNA Sequencing Reveals a Unique Genomic Landscape in a Lepidopteran Model Insect, Bombyx mori

    PubMed Central

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P.; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R.; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-01-01

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes. PMID:23821615

  20. Evidence that the large noncoding sequence is the main control region of maternally and paternally transmitted mitochondrial genomes of the marine mussel (Mytilus spp.).

    PubMed Central

    Cao, Liqin; Kenchington, Ellen; Zouros, Eleftherios; Rodakis, George C

    2004-01-01

    Both the maternal (F-type) and paternal (M-type) mitochondrial genomes of the Mytilus species complex M. edulis/galloprovincialis contain a noncoding sequence between the l-rRNA and the tRNA(Tyr) genes, here called the large unassigned region (LUR). The LUR, which is shorter in M genomes, is capable of forming secondary structures and contains motifs of significant sequence similarity with elements known to have specific functions in the sea urchin and the mammalian control region. Such features are not present in other noncoding regions of the F or M Mytilus mtDNA. The LUR can be divided on the basis of indels and nucleotide variation in three domains, which is reminiscent of the tripartite structure of the mammalian control region. These features suggest that the LUR is the main control region of the Mytilus mitochondrial genome. The middle domain has diverged by only 1.5% between F and M genomes, while the average divergence over the whole molecule is approximately 20%. In contrast, the first domain is among the most divergent parts of the genome. This suggests that different parts of the LUR are under different selection constraints that are also different from those acting on the coding parts of the molecule. PMID:15238532

  1. Did warfare among ancestral hunter-gatherers affect the evolution of human social behaviors?

    PubMed

    Bowles, Samuel

    2009-06-05

    Since Darwin, intergroup hostilities have figured prominently in explanations of the evolution of human social behavior. Yet whether ancestral humans were largely "peaceful" or "warlike" remains controversial. I ask a more precise question: If more cooperative groups were more likely to prevail in conflicts with other groups, was the level of intergroup violence sufficient to influence the evolution of human social behavior? Using a model of the evolutionary impact of between-group competition and a new data set that combines archaeological evidence on causes of death during the Late Pleistocene and early Holocene with ethnographic and historical reports on hunter-gatherer populations, I find that the estimated level of mortality in intergroup conflicts would have had substantial effects, allowing the proliferation of group-beneficial behaviors that were quite costly to the individual altruist.

  2. Once a Batesian mimic, not always a Batesian mimic: mimic reverts back to ancestral phenotype when the model is absent.

    PubMed

    Prudic, Kathleen L; Oliver, Jeffrey C

    2008-05-22

    Batesian mimics gain protection from predation through the evolution of physical similarities to a model species that possesses anti-predator defences. This protection should not be effective in the absence of the model since the predator does not identify the mimic as potentially dangerous and both the model and the mimic are highly conspicuous. Thus, Batesian mimics should probably encounter strong predation pressure outside the geographical range of the model species. There are several documented examples of Batesian mimics occurring in locations without their models, but the evolutionary responses remain largely unidentified. A mimetic species has four alternative evolutionary responses to the loss of model presence. If predation is weak, it could maintain its mimetic signal. If predation is intense, it is widely presumed the mimic will go extinct. However, the mimic could also evolve a new colour pattern to mimic another model species or it could revert back to its ancestral, less conspicuous phenotype. We used molecular phylogenetic approaches to reconstruct and test the evolution of mimicry in the North American admiral butterflies (Limenitis: Nymphalidae). We confirmed that the more cryptic white-banded form is the ancestral phenotype of North American admiral butterflies. However, one species, Limenitis arthemis, evolved the black pipevine swallowtail mimetic form but later reverted to the white-banded more cryptic ancestral form. This character reversion is strongly correlated with the geographical absence of the model species and its host plant, but not the host plant distribution of L. arthemis. Our results support the prediction that a Batesian mimic does not persist in locations without its model, but it does not go extinct either. The mimic can revert back to its ancestral, less conspicuous form and persist.

  3. e2g: an interactive web-based server for efficiently mapping large EST and cDNA sets to genomic sequences.

    PubMed

    Krüger, Jan; Sczyrba, Alexander; Kurtz, Stefan; Giegerich, Robert

    2004-07-01

    e2g is a web-based server which efficiently maps large expressed sequence tag (EST) and cDNA datasets to genomic DNA. It significantly extends the volume of data that can be mapped in reasonable time, and makes this improved efficiency available as a web service. Our server hosts large collections of EST sequences (e.g. 4.1 million mouse ESTs of 1.87 Gb) in precomputed indexed data structures for efficient sequence comparison. The user can upload a genomic DNA sequence of interest and rapidly compare this to the complete collection of ESTs on the server. This delivers a mapping of the ESTs on the genomic DNA. The e2g web interface provides a graphical overview of the mapping. Alignments of the mapped EST regions with parts of the genomic sequence are visualized. Zooming functions allow the user to interactively explore the results. Mapped sequences can be downloaded for further analysis. e2g is available on the Bielefeld University Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de/e2g/.
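
    The idea of comparing an uploaded genomic sequence against a precomputed index of ESTs can be sketched with a simple k-mer index. This is a stand-in technique chosen for brevity, not the data structure e2g itself uses (the abstract does not specify it), and it ignores introns, mismatches and the reverse strand. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(ests, k):
    """Precompute a map from each k-mer to the (EST name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in ests.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def map_to_genome(genome, index, k):
    """Return, per EST, the sorted genomic offsets at which exact seed hits place its start."""
    hits = defaultdict(set)
    for g in range(len(genome) - k + 1):
        for name, i in index.get(genome[g:g + k], ()):
            hits[name].add(g - i)  # implied start of the EST on the genomic sequence
    return {name: sorted(starts) for name, starts in hits.items()}


if __name__ == "__main__":
    ests = {"est1": "ACGTAC", "est2": "TTTTTT"}  # toy data, not real ESTs
    index = build_kmer_index(ests, k=4)
    print(map_to_genome("GGACGTACGG", index, k=4))
```

    Building the index once and reusing it for every uploaded query is what makes this kind of service fast: the per-query cost scales with the length of the genomic sequence rather than with the size of the EST collection.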

  4. Catastrophic debris avalanche from ancestral Mount Shasta volcano, California

    NASA Astrophysics Data System (ADS)

    Crandell, D. R.; Miller, C. D.; Glicken, H. X.; Christiansen, R. L.; Newhall, C. G.

    1984-03-01

    A debris-avalanche deposit extends 43 km northwestward from the base of Mount Shasta across the floor of Shasta Valley, California, where it covers an area of at least 450 km². The surface of the deposit is dotted with hundreds of mounds, hills, and ridges, all formed of blocks of pyroxene andesite and unconsolidated volcaniclastic deposits derived from an ancestral Mount Shasta. Individual hills are separated by flat-topped laharlike deposits that also form the matrix of the debris avalanche and slope northwestward about 5 m/km. Radiometric ages of rocks in the deposit and of a postavalanche basalt flow indicate that the avalanche occurred between about 300,000 and 360,000 yr ago. An inferred average thickness of the deposit, plus a computed volume of about 4 km³ for the hills and ridges, indicate an estimated volume of about 26 km³, making it the largest known Quaternary landslide on Earth.
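
    The abstract does not state the inferred average thickness, but the figures it gives allow a rough back-calculation. The sketch below assumes that the total volume is the sum of the hills-and-ridges volume and a sheet of uniform thickness covering the whole mapped area; that decomposition is an assumption for illustration, not the authors' stated method, and because the area is given as a minimum the result is only an order-of-magnitude check.

```python
# Figures taken from the abstract.
area_km2 = 450.0          # minimum area covered by the deposit
total_volume_km3 = 26.0   # estimated total volume
hills_volume_km3 = 4.0    # computed volume of hills and ridges
runout_km = 43.0          # extent northwestward from the volcano
slope_m_per_km = 5.0      # slope of the flat-topped matrix deposits

# Assumed decomposition: total = uniform sheet + hills and ridges.
sheet_volume_km3 = total_volume_km3 - hills_volume_km3
mean_thickness_m = sheet_volume_km3 / area_km2 * 1000.0

# Elevation drop along the deposit implied by the quoted slope.
surface_drop_m = slope_m_per_km * runout_km

if __name__ == "__main__":
    print(f"implied mean sheet thickness: {mean_thickness_m:.0f} m")
    print(f"implied surface drop over the runout: {surface_drop_m:.0f} m")
```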

  5. Computational analysis and functional expression of ancestral copepod luciferase.

    PubMed

    Takenaka, Yasuhiro; Noda-Ogura, Akiko; Imanishi, Tadashi; Yamaguchi, Atsushi; Gojobori, Takashi; Shigeri, Yasushi

    2013-10-10

    We recently reported the cDNA sequences of 11 copepod luciferases from the superfamily Augaptiloidea in the order Calanoida. They were classified into two groups, Metridinidae and Heterorhabdidae/Lucicutiidae families, by phylogenetic analyses. To elucidate the evolutionary processes, we have now further isolated 12 copepod luciferases from Augaptiloidea species (Metridia asymmetrica, Metridia curticauda, Pleuromamma scutullata, Pleuromamma xiphias, Lucicutia ovaliformis and Heterorhabdus tanneri). Codon-based synonymous/nonsynonymous tests of positive selection for 25 identified copepod luciferases suggested that positive Darwinian selection operated in the evolution of Heterorhabdidae luciferases, whereas two types of Metridinidae luciferases had diversified via neutral mechanism. By in silico analysis of the decoded amino acid sequences of 25 copepod luciferases, we inferred two protein sequences as ancestral copepod luciferases. They were expressed in HEK293 cells where they exhibited notable luciferase activity both in intracellular lysates and cultured media, indicating that the luciferase activity was established before evolutionary diversification of these copepod species.

  6. Female song is widespread and ancestral in songbirds.

    PubMed

    Odom, Karan J; Hall, Michelle L; Riebel, Katharina; Omland, Kevin E; Langmore, Naomi E

    2014-03-04

    Bird song has historically been considered an almost exclusively male trait, an observation fundamental to the formulation of Darwin's theory of sexual selection. Like other male ornaments, song is used by male songbirds to attract females and compete with rivals. Thus, bird song has become a textbook example of the power of sexual selection to lead to extreme neurological and behavioural sex differences. Here we present an extensive survey and ancestral state reconstruction of female song across songbirds showing that female song is present in 71% of surveyed species including 32 families, and that females sang in the common ancestor of modern songbirds. Our results reverse classical assumptions about the evolution of song and sex differences in birds. The challenge now is to identify whether sexual selection alone or broader processes, such as social or natural selection, best explain the evolution of elaborate traits in both sexes.

  7. Advances on Genome Duplication Distances

    NASA Astrophysics Data System (ADS)

    Gagnon, Yves; Savard, Olivier Tremblay; Bertrand, Denis; El-Mabrouk, Nadia

    Given a phylogenetic tree involving Whole Genome Duplication events, we contribute to the problem of computing the rearrangement distance on a branch of a tree linking a duplication node d to a speciation node or a leaf s. In the case of a genome G at s containing exactly two copies of each gene, the genome halving problem is to find a perfectly duplicated genome D at d minimizing the rearrangement distance with G. We generalize the existing exact linear-time algorithm for genome halving to the case of a genome G with missing gene copies. In the case of a known ancestral duplicated genome D, we develop a greedy approach for computing the distance between G and D that is shown time-efficient and very accurate for both the rearrangement and DCJ distances.
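
    For readers unfamiliar with the DCJ (double cut and join) distance mentioned here, the sketch below computes it in the simplest setting: two genomes made only of circular chromosomes, sharing the same genes, each present exactly once. In that setting the distance is the number of genes minus the number of cycles in the adjacency graph. This is the standard textbook formula for single-copy genomes, not the halving algorithm or the greedy method for duplicated genomes that the abstract describes; the function names and the signed-integer encoding are choices made for this illustration.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in a genome of circular chromosomes.

    A genome is a list of chromosomes; a chromosome is a list of signed
    integers, where the sign gives the orientation of the gene.
    """
    partner = {}
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            right = (abs(a), "h" if a > 0 else "t")  # extremity where gene a is left
            left = (abs(b), "t" if b > 0 else "h")   # extremity where gene b is entered
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance between two circular, single-copy genomes on the same genes."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain the same genes, once each")
    n_genes = len(adj_a) // 2
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # walk the cycle, alternating A and B adjacencies
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n_genes - cycles


if __name__ == "__main__":
    print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))  # a single inversion
```

    Linear chromosomes and duplicated genes, which are central to the genome halving problem, need additional bookkeeping (telomeres and a choice of pairing between gene copies) that this sketch deliberately leaves out.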

  8. Ancestral dichlorodiphenyltrichloroethane (DDT) exposure promotes epigenetic transgenerational inheritance of obesity

    PubMed Central

    2013-01-01

    Background: Ancestral environmental exposures to a variety of environmental factors and toxicants have been shown to promote the epigenetic transgenerational inheritance of adult onset disease. The present work examined the potential transgenerational actions of the insecticide dichlorodiphenyltrichloroethane (DDT) on obesity and associated disease. Methods: Outbred gestating female rats were transiently exposed to a vehicle control or DDT and the F1 generation offspring bred to generate the F2 generation and F2 generation bred to generate the F3 generation. The F1 and F3 generation control and DDT lineage rats were aged and various pathologies investigated. The F3 generation male sperm were collected to investigate methylation between the control and DDT lineage male sperm. Results: The F1 generation offspring (directly exposed as a fetus) derived from the F0 generation exposed gestating female rats were not found to develop obesity. The F1 generation DDT lineage animals did develop kidney disease, prostate disease, ovary disease and tumor development as adults. Interestingly, the F3 generation (great grand-offspring) had over 50% of males and females develop obesity. Several transgenerational diseases previously shown to be associated with metabolic syndrome and obesity were observed in the testis, ovary and kidney. The transgenerational transmission of disease was through both female (egg) and male (sperm) germlines. F3 generation sperm epimutations, differential DNA methylation regions (DMR), induced by DDT were identified. A number of the genes associated with the DMR have previously been shown to be associated with obesity. Conclusions: Observations indicate ancestral exposure to DDT can promote obesity and associated disease transgenerationally. The etiology of disease such as obesity may be in part due to environmentally induced epigenetic transgenerational inheritance. PMID:24228800

  9. Palaeohistological Evidence for Ancestral High Metabolic Rate in Archosaurs.

    PubMed

    Legendre, Lucas J; Guénard, Guillaume; Botha-Brink, Jennifer; Cubo, Jorge

    2016-11-01

    Metabolic heat production in archosaurs has played an important role in their evolutionary radiation during the Mesozoic, and their ancestral metabolic condition has long been a matter of debate in systematics and palaeontology. The study of fossil bone histology provides crucial information on bone growth rate, which has been used to indirectly investigate the evolution of thermometabolism in archosaurs. However, no quantitative estimation of metabolic rate has ever been performed on fossils using bone histological features. Moreover, to date, no inference model has included phylogenetic information in the form of predictive variables. Here we performed statistical predictive modeling using the new method of phylogenetic eigenvector maps on a set of bone histological features for a sample of extant and extinct vertebrates, to estimate metabolic rates of fossil archosauromorphs. This modeling procedure serves as a case study for eigenvector-based predictive modeling in a phylogenetic context, as well as an investigation of the poorly known evolutionary patterns of metabolic rate in archosaurs. Our results show that Mesozoic theropod dinosaurs exhibit metabolic rates very close to those found in modern birds, that archosaurs share a higher ancestral metabolic rate than that of extant ectotherms, and that this derived high metabolic rate was acquired at a much more inclusive level of the phylogenetic tree, among non-archosaurian archosauromorphs. These results also highlight the difficulties of assigning a given heat production strategy (i.e., endothermy, ectothermy) to an estimated metabolic rate value, and confirm findings of previous studies that the definition of the endotherm/ectotherm dichotomy may be ambiguous.

  10. Complexity Reduction of Polymorphic Sequences (CRoPS™): A Novel Approach for Large-Scale Polymorphism Discovery in Complex Genomes

    PubMed Central

    van Orsouw, Nathalie J.; Hogers, René C. J.; Janssen, Antoine; Yalcin, Feyruz; Snoeijers, Sandor; Verstege, Esther; Schneiders, Harrie; van der Poel, Hein; van Oeveren, Jan; Verstegen, Harold; van Eijk, Michiel J. T.

    2007-01-01

    Application of single nucleotide polymorphisms (SNPs) is revolutionizing human bio-medical research. However, discovery of polymorphisms in low polymorphic species is still a challenging and costly endeavor, despite widespread availability of Sanger sequencing technology. We present CRoPS™ as a novel approach for polymorphism discovery by combining the power of reproducible genome complexity reduction of AFLP® with Genome Sequencer (GS) 20/GS FLX next-generation sequencing technology. With CRoPS, hundreds-of-thousands of sequence reads derived from complexity-reduced genome sequences of two or more samples are processed and mined for SNPs using a fully-automated bioinformatics pipeline. We show that over 75% of putative maize SNPs discovered using CRoPS are successfully converted to SNPWave® assays, confirming them to be true SNPs derived from unique (single-copy) genome sequences. By using CRoPS, polymorphism discovery will become affordable in organisms with high levels of repetitive DNA in the genome and/or low levels of polymorphism in the (breeding) germplasm without the need for prior sequence information. PMID:18000544

  11. Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle

    PubMed Central

    Aranda, M.; Li, Y.; Liew, Y. J.; Baumgarten, S.; Simakov, O.; Wilson, M. C.; Piel, J.; Ashoor, H.; Bougouffa, S.; Bajic, V. B.; Ryu, T.; Ravasi, T.; Bayer, T.; Micklem, G.; Kim, H.; Bhak, J.; LaJeunesse, T. C.; Voolstra, C. R.

    2016-01-01

    Despite half a century of research, the biology of dinoflagellates remains enigmatic: they defy many functional and genetic traits attributed to typical eukaryotic cells. Genomic approaches to study dinoflagellates are often stymied due to their large, multi-gigabase genomes. Members of the genus Symbiodinium are photosynthetic endosymbionts of stony corals that provide the foundation of coral reef ecosystems. Their smaller genome sizes provide an opportunity to interrogate evolution and functionality of dinoflagellate genomes and endosymbiosis. We sequenced the genome of the ancestral Symbiodinium microadriaticum and compared it to the genomes of the more derived Symbiodinium minutum and Symbiodinium kawagutii and eukaryote model systems as well as transcriptomes from other dinoflagellates. Comparative analyses of genome and transcriptome protein sets show that all dinoflagellates, not only Symbiodinium, possess significantly more transmembrane transporters involved in the exchange of amino acids, lipids, and glycerol than other eukaryotes. Importantly, we find that only Symbiodinium harbor an extensive transporter repertoire associated with the provisioning of carbon and nitrogen. Analyses of these transporters show species-specific expansions, which provides a genomic basis to explain differential compatibilities to an array of hosts and environments, and highlights the putative importance of gene duplications as an evolutionary mechanism in dinoflagellates and Symbiodinium. PMID:28004835

  12. Genome-wide DNA markers to support genetic management for domestication and commercial production in a large rodent, the Ghanaian grasscutter (Thryonomys swinderianus).

    PubMed

    Adenyo, C; Ogden, R; Kayang, B; Onuma, M; Nakajima, N; Inoue-Murayama, M

    2017-02-01

    Domestication and commercial production of the grasscutter, Thryonomys swinderianus, a large rodent, represents an important opportunity to secure sustainable animal protein for local communities in West Africa. To support production, DNA markers are required for population diversity assessment, pedigree analysis and marker-assisted selection. This study reports the application of double-digest RAD sequencing to simultaneously discover and genotype SNP markers in 24 wild and recently domesticated grasscutters. An initial panel of 1209 SNP loci was characterised from a total of more than 21 000 candidate loci containing single SNPs. This genome-wide resource represents the first application of its type to commercial production of a large rodent for food and advances the use of agricultural genomics in Ghana.

  13. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture.

    PubMed

    Pool, John E; Corbett-Detig, Russell B; Sugino, Ryuichi P; Stevens, Kristian A; Cardeno, Charis M; Crepeau, Marc W; Duchen, Pablo; Emerson, J J; Saelao, Perot; Begun, David J; Langley, Charles H

    2012-01-01

    Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa F(ST) were found to be enriched in genomic regions of locally elevated

  14. A large-scale zebrafish gene knockout resource for the genome-wide study of gene function.

    PubMed

    Varshney, Gaurav K; Lu, Jing; Gildea, Derek E; Huang, Haigen; Pei, Wuhong; Yang, Zhongan; Huang, Sunny C; Schoenfeld, David; Pho, Nam H; Casero, David; Hirase, Takashi; Mosbrook-Davis, Deborah; Zhang, Suiyuan; Jao, Li-En; Zhang, Bo; Woods, Ian G; Zimmerman, Steven; Schier, Alexander F; Wolfsberg, Tyra G; Pellegrini, Matteo; Burgess, Shawn M; Lin, Shuo

    2013-04-01

    With the completion of the zebrafish genome sequencing project, it becomes possible to analyze the function of zebrafish genes in a systematic way. The first step in such an analysis is to inactivate each protein-coding gene by targeted or random mutation. Here we describe a streamlined pipeline using proviral insertions coupled with high-throughput sequencing and mapping technologies to widely mutagenize genes in the zebrafish genome. We also report the first 6144 mutagenized and archived F1's predicted to carry up to 3776 mutations in annotated genes. Using in vitro fertilization, we have rescued and characterized ~0.5% of the predicted mutations, showing mutation efficacy and a variety of phenotypes relevant to both developmental processes and human genetic diseases. Mutagenized fish lines are being made freely available to the public through the Zebrafish International Resource Center. These fish lines establish an important milestone for zebrafish genetics research and should greatly facilitate systematic functional studies of the vertebrate genome.

  15. Origin of avian genome size and structure in non-avian dinosaurs.

    PubMed

    Organ, Chris L; Shedlock, Andrew M; Meade, Andrew; Pagel, Mark; Edwards, Scott V

    2007-03-08

    Avian genomes are small and streamlined compared with those of other amniotes by virtue of having fewer repetitive elements and less non-coding DNA. This condition has been suggested to represent a key adaptation for flight in birds, by reducing the metabolic costs associated with having large genome and cell sizes. However, the evolution of genome architecture in birds, or any other lineage, is difficult to study because genomic information is often absent for long-extinct relatives. Here we use a novel Bayesian comparative method to show that bone-cell size correlates well with genome size in extant vertebrates, and hence use this relationship to estimate the genome sizes of 31 species of extinct dinosaur, including several species of extinct birds. Our results indicate that the small genomes typically associated with avian flight evolved in the saurischian dinosaur lineage between 230 and 250 million years ago, long before this lineage gave rise to the first birds. By comparison, ornithischian dinosaurs are inferred to have had much larger genomes, which were probably typical for ancestral Dinosauria. Using comparative genomic data, we estimate that genome-wide interspersed mobile elements, a class of repetitive DNA, comprised 5-12% of the total genome size in the saurischian dinosaur lineage, but 7-19% of total genome size in ornithischian dinosaurs, suggesting that repetitive elements became less active in the saurischian lineage. These genomic characteristics should be added to the list of attributes previously considered avian but now thought to have arisen in non-avian dinosaurs, such as feathers, pulmonary innovations, and parental care and nesting.

  16. Genome sequence reveals that Pseudomonas fluorescens F113 possesses a large and diverse array of systems for rhizosphere function and host interaction

    PubMed Central

    2013-01-01

    Background: Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) isolated from the sugar-beet rhizosphere. This bacterium has been extensively studied as a model strain for genetic regulation of secondary metabolite production in P. fluorescens, as a candidate biocontrol agent against phytopathogens, and as a heterologous host for expression of genes with biotechnological application. The F113 genome sequence and annotation have recently been reported. Results: Comparative analysis of 50 genome sequences of strains belonging to the P. fluorescens group has revealed the existence of five distinct subgroups. F113 belongs to subgroup I, which is mostly composed of strains classified as P. brassicacearum. The core genome of these five strains is highly conserved and represents approximately 76% of the protein-coding genes in any given genome. Despite this strong conservation, F113 also contains a large number of unique protein-coding genes that encode traits potentially involved in the rhizocompetence of this strain. These features include protein-coding genes required for denitrification, diterpenoid catabolism, motility and chemotaxis, protein secretion, and production of antimicrobial compounds and insect toxins. Conclusions: The genome of P. fluorescens F113 is composed of numerous protein-coding genes, not usually found together in previously sequenced genomes, which are potentially decisive during the colonisation of the rhizosphere and/or interaction with other soil organisms. This includes genes encoding proteins involved in the production of a second flagellar apparatus, the use of abietic acid as a growth substrate, the complete denitrification pathway, the possible production of a macrolide antibiotic and the assembly of multiple protein secretion systems. PMID:23350846

  17. Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes

    PubMed Central

    2012-01-01

    Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genome comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses. Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated. Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence. PMID:22891861

  18. Estimation of the ancestral effective population sizes of African great apes under different selection regimes.

    PubMed

    Schrago, Carlos G

    2014-08-01

    Reliable estimates of ancestral effective population sizes are necessary to unveil the population-level phenomena that shaped the phylogeny and molecular evolution of the African great apes. Although several methods have previously been applied to infer ancestral effective population sizes, an analysis of the influence of the selective regime on the estimates of ancestral demography has not been thoroughly conducted. In this study, three independent data sets under different selective regimes were composed to tackle this issue. The results showed that selection had a significant impact on the estimates of ancestral effective population sizes of the African great apes. The inference of the ancestral demography of African great apes was affected by the selection regime. The effects, however, were not homogeneous along the ancestral populations of great apes. The effective population size of the ancestor of humans and chimpanzees was more impacted by the selection regime when compared to the same parameter in the ancestor of humans, chimpanzees and gorillas. Because the selection regime influenced the estimates of ancestral effective population size, it is reasonable to assume that a portion of the discrepancy found in previous studies that inferred the ancestral effective population size may be attributable to the differential action of selection on the genes sampled.

  19. Estimating Ancestral Ranges: Testing Methods with a Clade of Neotropical Lizards (Iguania: Liolaemidae)

    PubMed Central

    Díaz Gómez, Juan Manuel

    2011-01-01

    Establishing the ancestral ranges of distribution of a monophyletic clade, called the ancestral area, is one of the central objectives of historical biogeography. In this study, I used three common methodologies to establish the ancestral area of an important clade of Neotropical lizards, the family Liolaemidae. The methods used were: Fitch optimization, Weighted Ancestral Area Analysis and Dispersal-Vicariance Analysis (DIVA). A main difference from previous studies is that the areas used in the analysis are defined based on actual distributions of the species of Liolaemidae, instead of areas defined arbitrarily or based on other taxa. The ancestral area of Liolaemidae found by Fitch optimization is Prepuna in Argentina, Central Chile and Coastal Peru. Weighted Ancestral Area Analysis found Central Chile, Coquimbo, Payunia, Austral Patagonia and Coastal Peru. Dispersal-Vicariance analysis found an ancestral area that includes almost all the areas occupied by Liolaemidae, except Atacama, Coquimbo and Austral Patagonia. The results can be summarized as two opposing hypotheses: a restricted ancestral area for the ancestor of Liolaemidae in Central Chile and Patagonia, or a widespread ancestor distributed along the Andes. Some limitations of the methods were identified, for example the excessive importance of plesiomorphic areas in the cladograms. PMID:22028873

  20. Adaptive Memory: Ancestral Priorities and the Mnemonic Value of Survival Processing

    ERIC Educational Resources Information Center

    Nairne, James S.; Pandeirada, Josefa N. S.

    2010-01-01

    Evolutionary psychologists often propose that humans carry around "stone-age" brains, along with a toolkit of cognitive adaptations designed originally to solve hunter-gatherer problems. This perspective predicts that optimal cognitive performance might sometimes be induced by ancestrally-based problems, those present in ancestral environments,…

  1. Small but Powerful, the Primary Endosymbiont of Moss Bugs, Candidatus Evansia muelleri, Holds a Reduced Genome with Large Biosynthetic Capabilities

    PubMed Central

    Santos-Garcia, Diego; Latorre, Amparo; Moya, Andrés; Gibbs, George; Hartung, Viktor; Dettner, Konrad; Kuechler, Stefan Martin; Silva, Francisco J.

    2014-01-01

    Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed. PMID:25115011

  2. Small but powerful, the primary endosymbiont of moss bugs, Candidatus Evansia muelleri, holds a reduced genome with large biosynthetic capabilities.

    PubMed

    Santos-Garcia, Diego; Latorre, Amparo; Moya, Andrés; Gibbs, George; Hartung, Viktor; Dettner, Konrad; Kuechler, Stefan Martin; Silva, Francisco J

    2014-07-01

    Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed.

  3. Large-scale reduction of the Bacillus subtilis genome: consequences for the transcriptional network, resource allocation, and metabolism.

    PubMed

    Reuß, Daniel R; Altenbuchner, Josef; Mäder, Ulrike; Rath, Hermann; Ischebeck, Till; Sappa, Praveen Kumar; Thürmer, Andrea; Guérin, Cyprien; Nicolas, Pierre; Steil, Leif; Zhu, Bingyao; Feussner, Ivo; Klumpp, Stefan; Daniel, Rolf; Commichau, Fabian M; Völker, Uwe; Stülke, Jörg

    2017-02-01

    Understanding cellular life requires a comprehensive knowledge of the essential cellular functions, the components involved, and their interactions. Minimized genomes are an important tool to gain this knowledge. We have constructed strains of the model bacterium, Bacillus subtilis, whose genomes have been reduced by ∼36%. These strains are fully viable, and their growth rates in complex medium are comparable to those of wild type strains. An in-depth multi-omics analysis of the genome reduced strains revealed how the deletions affect the transcription regulatory network of the cell, translation resource allocation, and metabolism. A comparison of gene counts and resource allocation demonstrates drastic differences in the two parameters, with 50% of the genes using as little as 10% of translation capacity, whereas the 6% essential genes require 57% of the translation resources. Taken together, the results are a valuable resource on gene dispensability in B. subtilis, and they suggest the roads to further genome reduction to approach the final aim of a minimal cell in which all functions are understood.

  4. Genes Suggest Ancestral Colour Polymorphisms Are Shared across Morphologically Cryptic Species in Arctic Bumblebees.

    PubMed

    Williams, Paul H; Byvaltsev, Alexandr M; Cederberg, Björn; Berezin, Mikhail V; Ødegaard, Frode; Rasmussen, Claus; Richardson, Leif L; Huang, Jiaxing; Sheffield, Cory S; Williams, Suzanne T

    2015-01-01

    Our grasp of biodiversity is fine-tuned through the process of revisionary taxonomy. If species do exist in nature and can be discovered with available techniques, then we expect these revisions to converge on broadly shared interpretations of species. But for the primarily arctic bumblebees of the subgenus Alpinobombus of the genus Bombus, revisions by some of the most experienced specialists are unusual for bumblebees in that they have all reached different conclusions on the number of species present. Recent revisions based on skeletal morphology have concluded that there are from four to six species, while variation in colour pattern of the hair raised questions as to whether at least seven species might be present. Even more species are supported if we accept the recent move away from viewing species as morphotypes to viewing them instead as evolutionarily independent lineages (EILs) using data from genes. EILs are recognised here in practice from the gene coalescents that provide direct evidence for their evolutionary independence. We show from fitting both general mixed Yule/coalescent (GMYC) models and Poisson-tree-process (PTP) models to data for the mitochondrial COI gene that there is support for nine species in the subgenus Alpinobombus. Examination of the more slowly evolving nuclear PEPCK gene shows further support for a previously unrecognised taxon as a new species in northwestern North America. The three pairs of the most morphologically similar sister species are separated allopatrically and prevented from interbreeding by oceans. We also find that most of the species show multiple shared colour patterns, giving the appearance of mimicry among parts of the different species. However, reconstructing ancestral colour-pattern states shows that speciation is likely to have cut across widespread ancestral polymorphisms, without or largely without convergence. 
    In the particular case of Alpinobombus, morphological, colour-pattern, and genetic groups show…

  5. Molecular phylogeny of extant equids and effects of ancestral polymorphism in resolving species-level phylogenies.

    PubMed

    Steiner, Cynthia C; Mitelberg, Anna; Tursi, Rosanna; Ryder, Oliver A

    2012-11-01

    Short divergence times and processes such as incomplete lineage sorting and species hybridization are known to hinder the inference of species-level phylogenies due to the lack of sufficient informative genetic variation or the presence of shared but incongruent polymorphism among taxa. Extant equids (horses, zebras, and asses) are an example of a recently evolved group of mammals with an unresolved phylogeny, despite a large number of molecular studies. Previous surveys have proposed trees with rather poorly supported nodes, and the bias caused by genetic introgression or ancestral polymorphism has not been assessed. Here we studied the phylogenetic relationships of all extant species of Equidae by analyzing 22 partial mitochondrial and nuclear genes using maximum likelihood and Bayesian inferences that account for heterogeneous gene histories. We also examined genetic signatures of lineage sorting and/or genetic introgression in zebras by evaluating patterns of intraspecific genetic variation. Our study improved the resolution and support of the Equus phylogeny and in particular the controversial positions of the African wild ass (E. asinus) and mountain zebra (E. zebra): the African wild ass is placed as a sister species of the Asiatic asses and the mountain zebra as the sister taxon of Grevy's and Burchell's zebras. A shared polymorphism (indel) detected among zebra species in the Estrogen receptor 1 gene was likely due to incomplete lineage sorting and not genetic introgression as also indicated by other mitochondrial (Cytochrome b) and nuclear (Y chromosome and microsatellites) markers. Ancestral polymorphism in equids might have contributed to the long-standing lack of clarity in the phylogeny of this highly threatened group of mammals.

  6. Genes Suggest Ancestral Colour Polymorphisms Are Shared across Morphologically Cryptic Species in Arctic Bumblebees

    PubMed Central

    Williams, Paul H.; Byvaltsev, Alexandr M.; Cederberg, Björn; Berezin, Mikhail V.; Ødegaard, Frode; Rasmussen, Claus; Richardson, Leif L.; Huang, Jiaxing; Sheffield, Cory S.; Williams, Suzanne T.

    2015-01-01

    Our grasp of biodiversity is fine-tuned through the process of revisionary taxonomy. If species do exist in nature and can be discovered with available techniques, then we expect these revisions to converge on broadly shared interpretations of species. But for the primarily arctic bumblebees of the subgenus Alpinobombus of the genus Bombus, revisions by some of the most experienced specialists are unusual for bumblebees in that they have all reached different conclusions on the number of species present. Recent revisions based on skeletal morphology have concluded that there are from four to six species, while variation in colour pattern of the hair raised questions as to whether at least seven species might be present. Even more species are supported if we accept the recent move away from viewing species as morphotypes to viewing them instead as evolutionarily independent lineages (EILs) using data from genes. EILs are recognised here in practice from the gene coalescents that provide direct evidence for their evolutionary independence. We show from fitting both general mixed Yule/coalescent (GMYC) models and Poisson-tree-process (PTP) models to data for the mitochondrial COI gene that there is support for nine species in the subgenus Alpinobombus. Examination of the more slowly evolving nuclear PEPCK gene shows further support for a previously unrecognised taxon as a new species in northwestern North America. The three pairs of the most morphologically similar sister species are separated allopatrically and prevented from interbreeding by oceans. We also find that most of the species show multiple shared colour patterns, giving the appearance of mimicry among parts of the different species. However, reconstructing ancestral colour-pattern states shows that speciation is likely to have cut across widespread ancestral polymorphisms, without or largely without convergence. 
    In the particular case of Alpinobombus, morphological, colour-pattern, and genetic groups show…

  7. The narrow sheath duplicate genes: sectors of dual aneuploidy reveal ancestrally conserved gene functions during maize leaf development.

    PubMed Central

    Scanlon, M J; Chen, K D; McKnight, C C, IV

    2000-01-01

    The narrow sheath mutant of maize displays a leaf and plant stature phenotype controlled by the duplicate factor mutations narrow sheath1 and narrow sheath2. Mutant leaves fail to develop a lateral domain that includes the leaf margins. Genetic data are presented to show that the narrow sheath mutations map to duplicated chromosomal regions, reflecting an ancestral duplication of the maize genome. Genetic and cytogenetic evidence indicates that the original mutation at narrow sheath2 is associated with a chromosomal inversion on the long arm of chromosome 4. Meristematic sectors of dual aneuploidy were generated, producing plants genetically mosaic for NARROW SHEATH function. These mosaic plants exhibited characteristic half-plant phenotypes, in which leaves from one side of the plant were of nonmutant morphology and leaves from the opposite side were of narrow sheath mutant phenotype. The data suggest that the narrow sheath duplicate genes may perform ancestrally conserved, redundant functions in the development of a lateral domain in the maize leaf. PMID:10880496

  8. Phylogenetic analysis of a newfound bat-borne hantavirus supports a laurasiatherian host association for ancestral mammalian hantaviruses.

    PubMed

    Witkowski, Peter T; Drexler, Jan F; Kallies, René; Ličková, Martina; Bokorová, Silvia; Mananga, Gael D; Szemes, Tomáš; Leroy, Eric M; Krüger, Detlev H; Drosten, Christian; Klempa, Boris

    2016-07-01

    Until recently, hantaviruses (family Bunyaviridae) were believed to originate from rodent reservoirs. However, genetically distinct hantaviruses were lately found in shrews and moles, as well as in bats from Africa and Asia. Bats (order Chiroptera) are considered important reservoir hosts for emerging human pathogens. Here, we report on the identification of a novel hantavirus, provisionally named Makokou virus (MAKV), in Noack's Roundleaf Bat (Hipposideros ruber) in Gabon, Central Africa. Phylogenetic analysis of the genomic L-segment showed that MAKV was most closely related to other bat-borne hantaviruses and shared a most recent common ancestor with the Asian hantaviruses Xuan Son and Laibin. Breakdown of the virus load in a bat showed that MAKV resembles rodent-borne hantaviruses in its organ distribution in that it predominantly occurred in the spleen and kidney; this provides a first insight into the infection pattern of bat-borne hantaviruses. Ancestral state reconstruction based on a tree of L gene sequences of all relevant hantavirus lineages was combined with phylogenetic fossil host hypothesis testing, leading to a statistically significant rejection of the mammalian superorder Euarchontoglires (including rodents) but not the superorder Laurasiatheria (including shrews, moles, and bats) as potential hosts of ancestral hantaviruses at most basal tree nodes. Our data support the emerging concept of bats as previously overlooked hantavirus reservoir hosts.

  9. Streptococcus thermophilus Biofilm Formation: A Remnant Trait of Ancestral Commensal Life?

    PubMed Central

    Gautier, Céline; Renault, Pierre; Briandet, Romain; Guédon, Eric

    2015-01-01

    Microorganisms have a long history of use in food production and preservation. Their adaptation to food environments has profoundly modified their features, mainly through genomic flux. Streptococcus thermophilus, one of the most frequent starter culture organisms consumed daily by humans, emerged recently from a commensal ancestor. As such, it is a useful model for genomic studies of bacterial domestication processes. Many streptococcal species form biofilms, a key feature of the major lifestyle of these bacteria in nature. However, few descriptions of S. thermophilus biofilms have been reported. An analysis of the ability of a representative collection of natural isolates to form biofilms revealed that S. thermophilus was a poor biofilm producer and that this characteristic was associated with an inability to attach firmly to surfaces. The identification of three biofilm-associated genes in the strain producing the most biofilms shed light on the reasons for the rarity of this trait in this species. These genes encode proteins involved in crucial stages of biofilm formation and are heterogeneously distributed between strains. One of the biofilm genes appears to have been acquired by horizontal transfer. The other two are located in loci presenting features of reductive evolution, and are absent from most of the strains analyzed. Their orthologs in commensal bacteria are involved in adhesion to host cells, suggesting that they are remnants of ancestral functions. The biofilm phenotype appears to be a commensal trait that has been lost during the genetic domestication of S. thermophilus, consistent with its adaptation to the milk environment and the selection of starter strains for dairy fermentations. PMID:26035177

  10. Streptococcus thermophilus Biofilm Formation: A Remnant Trait of Ancestral Commensal Life?

    PubMed

    Couvigny, Benoit; Thérial, Claire; Gautier, Céline; Renault, Pierre; Briandet, Romain; Guédon, Eric

    2015-01-01

    Microorganisms have a long history of use in food production and preservation. Their adaptation to food environments has profoundly modified their features, mainly through genomic flux. Streptococcus thermophilus, one of the most frequent starter culture organisms consumed daily by humans, emerged recently from a commensal ancestor. As such, it is a useful model for genomic studies of bacterial domestication processes. Many streptococcal species form biofilms, a key feature of the major lifestyle of these bacteria in nature. However, few descriptions of S. thermophilus biofilms have been reported. An analysis of the ability of a representative collection of natural isolates to form biofilms revealed that S. thermophilus was a poor biofilm producer and that this characteristic was associated with an inability to attach firmly to surfaces. The identification of three biofilm-associated genes in the strain producing the most biofilms shed light on the reasons for the rarity of this trait in this species. These genes encode proteins involved in crucial stages of biofilm formation and are heterogeneously distributed between strains. One of the biofilm genes appears to have been acquired by horizontal transfer. The other two are located in loci presenting features of reductive evolution, and are absent from most of the strains analyzed. Their orthologs in commensal bacteria are involved in adhesion to host cells, suggesting that they are remnants of ancestral functions. The biofilm phenotype appears to be a commensal trait that has been lost during the genetic domestication of S. thermophilus, consistent with its adaptation to the milk environment and the selection of starter strains for dairy fermentations.

  11. Evolutionary site-number changes of ribosomal DNA loci during speciation: complex scenarios of ancestral and more recent polyploid events

    PubMed Central

    Rosato, Marcela; Moreno-Saiz, Juan C.; Galián, José A.; Rosselló, Josep A.

    2015-01-01

    Several genome duplications have been identified in the evolution of seed plants, providing unique systems for studying karyological processes promoting diversification and speciation. Knowledge about the number of ribosomal DNA (rDNA) loci, together with their chromosomal distribution and structure, provides clues about organismal and molecular evolution at various phylogenetic levels. In this work, we aim to elucidate the evolutionary dynamics of karyological and rDNA site-number variation in all known taxa of subtribe Vellinae, showing a complex scenario of ancestral and more recent polyploid events. Specifically, we aim to infer the ancestral chromosome numbers and patterns of chromosome number variation, assess patterns of variation of both 45S and 5S rDNA families, trends in site-number change of rDNA loci within homoploid and polyploid series, and reconstruct the evolutionary history of rDNA site number using a phylogenetic hypothesis as a framework. The best-fitting model of chromosome number evolution with a high likelihood score suggests that the Vellinae core showing x = 17 chromosomes arose by duplication events from a recent x = 8 ancestor. Our survey suggests more complex patterns of polyploid evolution than previously noted for Vellinae. High polyploidization events (6x, 8x) arose independently in the basal clade Vella castrilensis–V. lucentina, where extant diploid species are unknown. Reconstruction of ancestral rDNA states in Vellinae supports the inference that the ancestral number of loci in the subtribe was two for each multigene family, suggesting that an overall tendency towards a net loss of 5S rDNA loci occurred during the splitting of Vellinae ancestors from the remaining Brassiceae lineages. A contrasting pattern for rDNA site change in both paleopolyploid and neopolyploid species was linked to diversification of Vellinae lineages. This suggests dynamic and independent changes in rDNA site number during speciation processes and a…

  12. Vanishing GC-rich isochores in mammalian genomes.

    PubMed Central

    Duret, Laurent; Semon, Marie; Piganeau, Gwenaël; Mouchiroud, Dominique; Galtier, Nicolas

    2002-01-01

    To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC→AT substitutions over AT→GC substitutions: GC-rich isochores are slowly disappearing from the genomes of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not to be strong enough to overcome the large excess of GC→AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals. PMID:12524353

  13. Trans-Ancestral Studies Fine Map the SLE-Susceptibility Locus TNFSF4

    PubMed Central

    Manku, Harinder; Langefeld, Carl D.; Guerra, Sandra G.; Malik, Talat H.; Alarcon-Riquelme, Marta; Anaya, Juan-Manuel; Bae, Sang-Cheol; Boackle, Susan A.; Brown, Elizabeth E.; Criswell, Lindsey A.; Freedman, Barry I.; Gaffney, Patrick M.; Gregersen, Peter A.; Guthridge, Joel M.; Han, Sang-Hoon; Harley, John B.; Jacob, Chaim O.; James, Judith A.; Kamen, Diane L.; Kaufman, Kenneth M.; Kelly, Jennifer A.; Martin, Javier; Merrill, Joan T.; Moser, Kathy L.; Niewold, Timothy B.; Park, So-Yeon; Pons-Estel, Bernardo A.; Sawalha, Amr H.; Scofield, R. Hal; Shen, Nan; Stevens, Anne M.; Sun, Celi; Gilkeson, Gary S.; Edberg, Jeff C.; Kimberly, Robert P.; Nath, Swapan K.; Tsao, Betty P.; Vyse, Tim J.

    2013-01-01

    We previously established an 80 kb haplotype upstream of TNFSF4 as a susceptibility locus in the autoimmune disease SLE. SLE-associated alleles at this locus are associated with inflammatory disorders, including atherosclerosis and ischaemic stroke. In Europeans, the TNFSF4 causal variants have remained elusive due to strong linkage disequilibrium exhibited by alleles spanning the region. Using a trans-ancestral approach to fine-map the locus, utilising 17,900 SLE and control subjects including Amerindian/Hispanics (1348 cases, 717 controls), African-Americans (AA) (1529, 2048) and better powered cohorts of Europeans and East Asians, we find strong association of risk alleles in all ethnicities; the AA association replicates in African-American Gullah (152, 122). The best evidence of association comes from two adjacent markers: rs2205960-T (P = 1.71 × 10^-34, OR = 1.43 [1.26–1.60]) and rs1234317-T (P = 1.16 × 10^-28, OR = 1.38 [1.24–1.54]). Inference of fine-scale recombination rates for all populations tested finds the 80 kb risk and non-risk haplotypes in all except African-Americans. In this population the decay of recombination equates to an 11 kb risk haplotype, anchored in the 5′ region proximal to TNFSF4 and tagged by rs2205960-T after 1000 Genomes phase 1 (v3) imputation. Conditional regression analyses delineate the 5′ risk signal to rs2205960-T and the independent non-risk signal to rs1234314-C. Our case-only and SLE-control cohorts demonstrate robust association of rs2205960-T with autoantibody production. The rs2205960-T is predicted to form part of a decameric motif which binds NF-κBp65 with increased affinity compared to rs2205960-G. ChIP-seq data also indicate NF-κB interaction with the DNA sequence at this position in LCL cells. Our research suggests association of rs2205960-T with SLE across multiple groups and an independent non-risk signal at rs1234314-C. rs2205960-T is associated with autoantibody production and…

  14. Divergence in Enzymatic Activities in the Soybean GST Supergene Family Provides New Insight into the Evolutionary Dynamics of Whole-Genome Duplicates.

    PubMed

    Liu, Hai-Jing; Tang, Zhen-Xin; Han, Xue-Min; Yang, Zhi-Ling; Zhang, Fu-Min; Yang, Hai-Ling; Liu, Yan-Jing; Zeng, Qing-Yin

    2015-11-01

    Whole-genome duplication (WGD), or polyploidy, is a major force in plant genome evolution. A duplicate of all genes is present in the genome immediately following a WGD event. However, the evolutionary mechanisms responsible for the loss of, or retention and subsequent functional divergence of, polyploidy-derived duplicates remain largely unknown. In this study we reconstructed the evolutionary history of the glutathione S-transferase (GST) gene family from the soybean genome, and identified 72 GST duplicated gene pairs formed by a recent Glycine-specific WGD event occurring approximately 13 Ma. We found that 72% of duplicated GST gene pairs experienced gene losses or pseudogenization, whereas 28% of GST gene pairs have been retained in the soybean genome. The GST pseudogenes were under relaxed selective constraints, whereas functional GSTs were subject to strong purifying selection. Plant GST genes play important roles in stress tolerance and detoxification metabolism. By examining the gene expression responses to abiotic stresses and enzymatic properties of the ancestral and current proteins, we found that polyploidy-derived GST duplicates show divergence in enzymatic activities. Through site-directed mutagenesis of ancestral proteins, this study revealed that nonsynonymous substitutions of key amino acid sites play an important role in the divergence of enzymatic functions of polyploidy-derived GST duplicates. These findings provide new insights into the evolutionary and functional dynamics of polyploidy-derived duplicate genes.

  15. Serotype IV Streptococcus agalactiae ST-452 has arisen from large genomic recombination events between CC23 and the hypervirulent CC17 lineages

    PubMed Central

    Campisi, Edmondo; Rinaudo, C. Daniela; Donati, Claudio; Barucco, Mara; Torricelli, Giulia; Edwards, Morven S.; Baker, Carol J.; Margarit, Imma; Rosini, Roberto

    2016-01-01

    Streptococcus agalactiae (Group B Streptococcus, GBS) causes life-threatening infections in newborns and adults with chronic medical conditions. Serotype IV strains are emerging both among carriers and as a cause of invasive disease, and recent studies revealed two main Sequence Types (STs), ST-452 and ST-459, assigned to Clonal Complexes CC23 and CC1, respectively. Whole genome sequencing of 70 type IV GBS and subsequent phylogenetic analysis elucidated the localization of type IV isolates in a SNP-based phylogenetic tree and suggested that ST-452 could have originated through genetic recombination. SNP density analysis of the core genome confirmed that the founder strain of this lineage originated from a single large horizontal gene transfer event between CC23 and the hypervirulent CC17. Indeed, ST-452 genomes are composed of two parts that are nearly identical to corresponding regions in ST-24 (CC23) and ST-291 (CC17). Chromosome mapping of the major GBS virulence factors showed that ST-452 strains have an intermediate yet unique profile among CC23 and CC17 strains. We described unreported large recombination events, involving the cps IV operon and resulting in the expansion of serotype IV to CC23. This work sheds further light on the evolution of GBS, providing new insights into the recent emergence of serotype IV. PMID:27411639

  16. A large-scale introgression of genomic components of Brassica rapa into B. napus by the bridge of hexaploid derived from hybridization between B. napus and B. oleracea.

    PubMed

    Li, Qinfei; Mei, Jiaqin; Zhang, Yongjing; Li, Jiana; Ge, Xianhong; Li, Zaiyun; Qian, Wei

    2013-08-01

    Brassica rapa (AA) has been used to widen the genetic basis of B. napus (AACC), which is a new but important oilseed crop worldwide. In the present study, we have proposed a strategy to develop new-type B. napus carrying genomic components of B. rapa by crossing B. rapa with a hexaploid (AACCCC) derived from B. napus and B. oleracea (CC). The hexaploid exhibited large flowers and a high frequency of normal chromosome segregation, resulting in good seed set (average of 4.48 and 12.53 seeds per pod by self and open pollination, respectively) and high pollen fertility (average of 87.05%). It was easy to develop new-type B. napus by crossing the hexaploid with 142 lines of B. rapa from three ecotype groups, with an average crossability of 9.24 seeds per pod. The genetic variation of new-type B. napus differed from that of current B. napus, especially in the A subgenome, as revealed by genome-specific simple sequence repeat markers. Our data suggest that the strategy proposed here is a large-scale and highly efficient method to introgress genomic components of B. rapa into B. napus.

  17. Characterization of high-copy-number retrotransposons from the large genomes of the Louisiana iris species and their use as molecular markers.

    PubMed Central

    Kentner, Edward K; Arnold, Michael L; Wessler, Susan R

    2003-01-01

    The Louisiana iris species Iris brevicaulis and I. fulva are morphologically and karyotypically distinct yet frequently hybridize in nature. A group of high-copy-number TY3/gypsy-like retrotransposons was characterized from these species and used to develop molecular markers that take advantage of the abundance and distribution of these elements in the large iris genome. The copy number of these IRRE elements (for iris retroelement), is approximately 1 × 10^5, accounting for approximately 6-10% of the approximately 10,000-Mb haploid Louisiana iris genome. IRRE elements are transcriptionally active in I. brevicaulis and I. fulva and their F1 and backcross hybrids. The LTRs of the elements are more variable than the coding domains and can be used to define several distinct IRRE subfamilies. Transposon display or S-SAP markers specific to two of these subfamilies have been developed and are highly polymorphic among wild-collected individuals of each species. As IRRE elements are present in each of 11 iris species tested, the marker system has the potential to provide valuable comparative data on the dynamics of retrotransposition in large plant genomes. PMID:12807789
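
    The figures quoted in this abstract (about 10^5 copies making up roughly 6-10% of a genome of about 10,000 Mb) imply a mean length per copy, which the short Python sketch below works out. It is a back-of-envelope check only: the function name and constants are illustrative, and the calculation assumes every copy contributes equally, which the abstract does not state.

```python
# Back-of-envelope check of the IRRE figures quoted in the abstract.
# All names here are illustrative; the inputs are the rounded values from the text.

GENOME_SIZE_MB = 10_000      # approximate haploid genome size (Mb)
COPY_NUMBER = 1e5            # approximate IRRE copy number


def implied_mean_length_kb(genome_fraction: float) -> float:
    """Mean length per copy (kb) if the copies fill `genome_fraction` of the genome.

    Assumes all copies are the same length, which is a simplification.
    """
    occupied_kb = genome_fraction * GENOME_SIZE_MB * 1_000  # 1 Mb = 1,000 kb
    return occupied_kb / COPY_NUMBER


low_kb = implied_mean_length_kb(0.06)    # lower bound of the 6-10% range
high_kb = implied_mean_length_kb(0.10)   # upper bound of the 6-10% range
print(f"Implied mean length per copy: {low_kb:.1f} to {high_kb:.1f} kb")
```

    Under these assumptions the implied mean length is on the order of 6 to 10 kb per copy; truncated or fragmentary copies would change the figure.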

  18. Serotype IV Streptococcus agalactiae ST-452 has arisen from large genomic recombination events between CC23 and the hypervirulent CC17 lineages.

    PubMed

    Campisi, Edmondo; Rinaudo, C Daniela; Donati, Claudio; Barucco, Mara; Torricelli, Giulia; Edwards, Morven S; Baker, Carol J; Margarit, Imma; Rosini, Roberto

    2016-07-14

    Streptococcus agalactiae (Group B Streptococcus, GBS) causes life-threatening infections in newborns and adults with chronic medical conditions. Serotype IV strains are emerging both among carriers and as a cause of invasive disease, and recent studies revealed two main Sequence Types (STs), ST-452 and ST-459, assigned to Clonal Complexes CC23 and CC1, respectively. Whole genome sequencing of 70 type IV GBS and subsequent phylogenetic analysis elucidated the localization of type IV isolates in a SNP-based phylogenetic tree and suggested that ST-452 could have originated through genetic recombination. SNP density analysis of the core genome confirmed that the founder strain of this lineage originated from a single large horizontal gene transfer event between CC23 and the hypervirulent CC17. Indeed, ST-452 genomes are composed of two parts that are nearly identical to corresponding regions in ST-24 (CC23) and ST-291 (CC17). Chromosome mapping of the major GBS virulence factors showed that ST-452 strains have an intermediate yet unique profile among CC23 and CC17 strains. We described unreported large recombination events, involving the cps IV operon and resulting in the expansion of serotype IV to CC23. This work sheds further light on the evolution of GBS, providing new insights into the recent emergence of serotype IV.

  19. Array comparative genomic hybridization reveals similarities between nodular lymphocyte predominant Hodgkin lymphoma and T cell/histiocyte rich large B cell lymphoma.

    PubMed

    Hartmann, Sylvia; Döring, Claudia; Vucic, Emily; Chan, Fong Chun; Ennishi, Daisuke; Tousseyn, Thomas; de Wolf-Peeters, Christiane; Perner, Sven; Wlodarska, Iwona; Steidl, Christian; Gascoyne, Randy D; Hansmann, Martin-Leo

    2015-05-01

    Nodular lymphocyte predominant Hodgkin lymphoma (NLPHL) and T cell/histiocyte rich large B cell lymphoma (THRLBCL) usually affect middle-aged men, show tumour cells with a B cell phenotype and a low tumour cell content. Whereas the clinical behaviour of NLPHL is indolent, THRLBCL presents with advanced stage disease and an aggressive behaviour. In the present study, array comparative genomic hybridization was performed in seven typical NLPHL, four THRLBCL-like NLPHL variants, six THRLBCL and four diffuse large B cell lymphomas (DLBCL) derived from NLPHL. The number of genomic aberrations was higher in THRLBCL compared with typical and THRLBCL-like variant of NLPHL. Gains of 2p16.1 and losses of 2p11.2 and 9p11.2 were commonly observed in typical and THRLBCL-like variants of NLPHL as well as THRLBCL. Gains of 2p16.1, affecting the REL locus were confirmed in an independent cohort. Expression of the REL protein was observed at similar frequencies in typical and THRLBCL-like variant of NLPHL as well as THRLBCL (33-38%). In conclusion, the present study reveals further similarities between NLPHL and THRLBCL on the genomic level, confirming that these entities are part of a pathobiological spectrum with common molecular features, but varying clinical presentations.

  20. Large-scale functional annotation and expanded implementations of the P{wHy} hybrid transposon in the Drosophila melanogaster genome.

    PubMed

    Myrick, Kyl V; Huet, François; Mohr, Stephanie E; Alvarez-García, Inés; Lu, Jeffrey T; Smith, Mark A; Crosby, Madeline A; Gelbart, William M

    2009-07-01

    Whole genome sequencing of the model organisms has created increased demand for efficient tools to facilitate the genome annotation efforts. Accordingly, we report the further implementations and analyses stemming from our publicly available P{wHy} library for Drosophila melanogaster. A two-step regime (large-scale transposon mutagenesis followed by hobo-induced nested deletions) allows mutation saturation and provides significant enhancements to existing genomic coverage. We previously showed that, for a given starting insert, deletion saturation is readily obtained over a 60-kb interval; here, we perform a breakdown analysis of efficiency to identify rate-limiting steps in the process. Transrecombination, the hobo-induced recombination between two P{wHy} half molecules, was shown to further expand the P{wHy} mutational range, pointing to a potent, iterative process of transrecombination-reconstitution-transrecombination for alternating between very large and very fine-grained deletions in a self-contained manner. A number of strains also showed partial or complete repression of P{wHy} markers, depending on chromosome location, whereby asymmetric marker silencing allowed continuous phenotypic detection, indicating that P{wHy}-based saturational mutagenesis should be useful for the study of heterochromatin/positional effects.

  1. Evolution of genome size in pines (Pinus) and its life-history correlates: supertree analyses.

    PubMed

    Grotkopp, Eva; Rejmánek, Marcel; Sanderson, Michael J; Rost, Thomas L

    2004-08-01

    Genome size has been suggested to be a fundamental biological attribute in determining life-history traits in many groups of organisms. We examined the relationships between pine genome sizes and pine phylogeny, environmental factors (latitude, elevation, annual rainfall), and biological traits (latitudinal and elevational ranges, seed mass, minimum generation time, interval between large seed crops, seed dispersal mode, relative growth rate, measures of potential and actual invasiveness, and level of rarity). Genome sizes were determined for 60 pine taxa and then combined with published values to make a dataset encompassing 85 species, or 70% of species in the genus. Supertrees were constructed using 20 published source phylogenies. Ancestral genome size was estimated as 32 pg. Genome size has apparently remained stable or increased over evolutionary time in subgenus Strobus, while it has decreased in most subsections in subgenus Pinus. We analyzed relationships between genome size and life-history variables using cross-species correlations and phylogenetically independent contrasts derived from supertree constructions. The generally assumed positive relation between genome size and minimum generation time could not be confirmed in phylogenetically controlled analyses. We found that the strongest correlation was between genome size and seed mass. Because the growth quantities specific leaf area and leaf area ratio (and to a lesser extent relative growth rate) are strongly negatively related to seed mass, they were also negatively correlated with genome size. Northern latitudinal limit was negatively correlated with genome size. Invasiveness, particularly of wind-dispersed species, was negatively associated with both genome size and seed mass. Seed mass and its relationships with seed number, dispersal mode, and growth rate contribute greatly to the differences in life-history strategies of pines. Many life-history patterns are therefore indirectly, but

  2. Multiple Lineages of Ancient CR1 Retroposons Shaped the Early Genome Evolution of Amniotes

    PubMed Central

    Suh, Alexander; Churakov, Gennady; Ramakodi, Meganathan P.; Platt, Roy N.; Jurka, Jerzy; Kojima, Kenji K.; Caballero, Juan; Smit, Arian F.; Vliet, Kent A.; Hoffmann, Federico G.; Brosius, Jürgen; Green, Richard E.; Braun, Edward L.; Ray, David A.; Schmitz, Jürgen

    2015-01-01

    Chicken repeat 1 (CR1) retroposons are long interspersed elements (LINEs) that are ubiquitous within amniote genomes and constitute the most abundant family of transposed elements in birds, crocodilians, turtles, and snakes. They are also present in mammalian genomes, where they reside as numerous relics of ancient retroposition events. Yet, despite their relevance for understanding amniote genome evolution, the diversity and evolution of CR1 elements has never been studied on an amniote-wide level. We reconstruct the temporal and quantitative activity of CR1 subfamilies via presence/absence analyses across crocodilian phylogeny and comparative analyses of 12 crocodilian genomes, revealing relative genomic stasis of retroposition during genome evolution of extant Crocodylia. Our large-scale phylogenetic analysis of amniote CR1 subfamilies suggests the presence of at least seven ancient CR1 lineages in the amniote ancestor; and amniote-wide analyses of CR1 successions and quantities reveal differential retention (presence of ancient relics or recent activity) of these CR1 lineages across amniote genome evolution. Interestingly, birds and lepidosaurs retained the fewest ancient CR1 lineages among amniotes and also exhibit smaller genome sizes. Our study is the first to analyze CR1 evolution in a genome-wide and amniote-wide context and the data strongly suggest that the ancestral amniote genome contained myriad CR1 elements from multiple ancient lineages, and remnants of these are still detectable in the relatively stable genomes of crocodilians and turtles. Early mammalian genome evolution was thus characterized by a drastic shift from CR1 prevalence to dominance and hyperactivity of L2 LINEs in monotremes and L1 LINEs in therians. PMID:25503085

  3. Ancestral resurrection reveals evolutionary mechanisms of kinase plasticity

    PubMed Central

    Howard, Conor J; Hanson-Smith, Victor; Kennedy, Kristopher J; Miller, Chad J; Lou, Hua Jane; Johnson, Alexander D; Turk, Benjamin E; Holt, Liam J

    2014-01-01

    Protein kinases have evolved diverse specificities to enable cellular information processing. To gain insight into the mechanisms underlying kinase diversification, we studied the CMGC protein kinases using ancestral reconstruction. Within this group, the cyclin dependent kinases (CDKs) and mitogen activated protein kinases (MAPKs) require proline at the +1 position of their substrates, while Ime2 prefers arginine. The resurrected common ancestor of CDKs, MAPKs, and Ime2 could phosphorylate substrates with +1 proline or arginine, with preference for proline. This specificity changed to a strong preference for +1 arginine in the lineage leading to Ime2 via an intermediate with equal specificity for proline and arginine. Mutant analysis revealed that a variable residue within the kinase catalytic cleft, DFGx, modulates +1 specificity. Expansion of Ime2 kinase specificity by mutation of this residue did not cause dominant deleterious effects in vivo. Tolerance of cells to new specificities likely enabled the evolutionary divergence of kinases. DOI: http://dx.doi.org/10.7554/eLife.04126.001 PMID:25310241

  4. Estimating Causal Effects with Ancestral Graph Markov Models

    PubMed Central

    Malinsky, Daniel; Spirtes, Peter

    2017-01-01

    We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the “IDA” procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validate the performance of our algorithm on simulated data and demonstrate improved precision over IDA when latent variables are present. PMID:28217244

  5. Deep phylogeny, ancestral groups and the four ages of life.

    PubMed

    Cavalier-Smith, Thomas

    2010-01-12

    Organismal phylogeny depends on cell division, stasis, mutational divergence, cell mergers (by sex or symbiogenesis), lateral gene transfer and death. The tree of life is a useful metaphor for organismal genealogical history provided we recognize that branches sometimes fuse. Hennigian cladistics emphasizes only lineage splitting, ignoring most other major phylogenetic processes. Though methodologically useful it has been conceptually confusing and harmed taxonomy, especially in mistakenly opposing ancestral (paraphyletic) taxa. The history of life involved about 10 really major innovations in cell structure. In membrane topology, there were five successive kinds of cell: (i) negibacteria, with two bounding membranes, (ii) unibacteria, with one bounding and no internal membranes, (iii) eukaryotes with endomembranes and mitochondria, (iv) plants with chloroplasts and (v) finally, chromists with plastids inside the rough endoplasmic reticulum. Membrane chemistry divides negibacteria into the more advanced Glycobacteria (e.g. Cyanobacteria and Proteobacteria) with outer membrane lipopolysaccharide and primitive Eobacteria without lipopolysaccharide (deserving intenser study). It also divides unibacteria into posibacteria, ancestors of eukaryotes, and archaebacteria, the sisters (not ancestors) of eukaryotes and the youngest bacterial phylum. Anaerobic eobacteria, oxygenic cyanobacteria, desiccation-resistant posibacteria and finally neomura (eukaryotes plus archaebacteria) successively transformed Earth. Accidents and organizational constraints are as important as adaptiveness in body plan evolution.

  6. Deep phylogeny, ancestral groups and the four ages of life

    PubMed Central

    Cavalier-Smith, Thomas

    2010-01-01

    Organismal phylogeny depends on cell division, stasis, mutational divergence, cell mergers (by sex or symbiogenesis), lateral gene transfer and death. The tree of life is a useful metaphor for organismal genealogical history provided we recognize that branches sometimes fuse. Hennigian cladistics emphasizes only lineage splitting, ignoring most other major phylogenetic processes. Though methodologically useful it has been conceptually confusing and harmed taxonomy, especially in mistakenly opposing ancestral (paraphyletic) taxa. The history of life involved about 10 really major innovations in cell structure. In membrane topology, there were five successive kinds of cell: (i) negibacteria, with two bounding membranes, (ii) unibacteria, with one bounding and no internal membranes, (iii) eukaryotes with endomembranes and mitochondria, (iv) plants with chloroplasts and (v) finally, chromists with plastids inside the rough endoplasmic reticulum. Membrane chemistry divides negibacteria into the more advanced Glycobacteria (e.g. Cyanobacteria and Proteobacteria) with outer membrane lipopolysaccharide and primitive Eobacteria without lipopolysaccharide (deserving intenser study). It also divides unibacteria into posibacteria, ancestors of eukaryotes, and archaebacteria, the sisters (not ancestors) of eukaryotes and the youngest bacterial phylum. Anaerobic eobacteria, oxygenic cyanobacteria, desiccation-resistant posibacteria and finally neomura (eukaryotes plus archaebacteria) successively transformed Earth. Accidents and organizational constraints are as important as adaptiveness in body plan evolution. PMID:20008390

  7. Ancestral genetic complexity of arachidonic acid metabolism in Metazoa.

    PubMed

    Yuan, Dongjuan; Zou, Qiuqiong; Yu, Ting; Song, Cuikai; Huang, Shengfeng; Chen, Shangwu; Ren, Zhenghua; Xu, Anlong

    2014-09-01

    Eicosanoids play an important role in inducing complex and crucial physiological processes in animals. Eicosanoid biosynthesis in animals is widely reported; however, eicosanoid production in invertebrate tissue is remarkably different to vertebrates and in certain respects remains elusive. We, for the first time, compared the orthologs involved in arachidonic acid (AA) metabolism in 14 species of invertebrates and 3 species of vertebrates. Based on parsimony, a complex AA-metabolic system may have existed in the common ancestor of the Metazoa, and then expanded and diversified through invertebrate lineages. A primary vertebrate-like AA-metabolic system via cyclooxygenase (COX), lipoxygenase (LOX), and cytochrome P450 (CYP) pathways was further identified in the basal chordate, amphioxus. The expression profiling of AA-metabolic enzymes and lipidomic analysis of eicosanoid production in the tissues of amphioxus supported our supposition. Thus, we proposed that the ancestral complexity of AA-metabolic network diversified with the different lineages of invertebrates, adapting with the diversity of body plans and ecological opportunity, and arriving at the vertebrate-like pattern in the basal chordate, amphioxus.

  8. Ancestral role of caudal genes in axis elongation and segmentation.

    PubMed

    Copf, Tijana; Schröder, Reinhard; Averof, Michalis

    2004-12-21

    caudal (cad/Cdx) genes are essential for the formation of posterior structures in Drosophila, Caenorhabditis elegans, and vertebrates. In contrast to Drosophila, the majority of arthropods generate their segments sequentially from a posteriorly located growth zone, a process known as short-germ development. caudal homologues are expressed in the growth zone of diverse short-germ arthropods, but until now their functional role in these animals had not been studied. Here, we use RNA interference to examine the function of caudal genes in two short-germ arthropods, the crustacean Artemia franciscana and the beetle Tribolium castaneum. We show that, in both species, caudal is required for the formation of most body segments. In animals with reduced levels of caudal expression, axis elongation stops, resulting in severe truncations that remove most trunk segments. We also show that caudal function is required for the early phases of segmentation and Hox gene expression. The observed phenotypes suggest that in arthropods caudal had an ancestral role in axis elongation and segmentation, and was required for the formation of most body segments. Similarities to the function of vertebrate Cdx genes in the presomitic mesoderm, from which somites are generated, indicate that this role may also predate the origin of the Bilateria.

  9. Possible rules for the ancestral origin of Hox gene collinearity.

    PubMed

    Gaunt, Stephen J; Gaunt, Alexander L

    2016-12-07

    The Hox gene cluster is believed to have formed from a single ProtoHox gene by repeated cycles of the following events: tandem gene duplication, mutation to generate a new expression boundary along the embryonic axis, and acquisition of a new Hox patterning function. The Hox cluster in Bilateria evolved in compliance with the so-called collinearity rule. That is, the order of the genes along the chromosome corresponds with the order of their embryonic expression domains along the head-tail axis. Gaunt (2015) suggested that collinearity may have arisen as a mechanism to minimise the incidence of boundaries between active and inactive genes within the Hox cluster. We now attempt to clarify the model by presenting it in the form of three rules: 1) no two Hox genes may persist in the same cluster with the same anterior boundary of activity in the same tissue; 2) an inactive Hox gene must not be flanked by two active Hox genes; 3) an active Hox gene must not be flanked by two inactive genes. We provide evidence and illustrative computer simulations to show that these rules, which can apply only to partially overlapping patterns of Hox activity, may account for the ancestral origin of Hox gene collinearity.
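
    The three rules listed in this abstract can be expressed as a small checker, sketched below in Python. This is not the authors' simulation: it assumes a single tissue, a one-dimensional head-tail axis with integer positions, and that each gene is active at every position at or posterior to its anterior boundary. The function names and the example boundary lists are hypothetical.

```python
from typing import List, Tuple


def active_genes(boundaries: List[int], position: int) -> List[bool]:
    """Activity of each gene (in chromosomal order) at one axial position.

    Assumption: a gene is active at and posterior to its anterior boundary.
    """
    return [position >= b for b in boundaries]


def rule_violations(boundaries: List[int], axis_length: int) -> List[Tuple[int, int, int]]:
    """Return (rule, axial position, gene index) for every violation found.

    `boundaries[i]` is the anterior boundary of the i-th gene along the chromosome.
    """
    violations = []

    # Rule 1: no two genes may share the same anterior boundary (single tissue assumed).
    seen = set()
    for i, b in enumerate(boundaries):
        if b in seen:
            violations.append((1, b, i))
        seen.add(b)

    # Rules 2 and 3 are checked at every axial position for every interior gene.
    for pos in range(axis_length):
        state = active_genes(boundaries, pos)
        for i in range(1, len(state) - 1):
            left, mid, right = state[i - 1], state[i], state[i + 1]
            if not mid and left and right:
                violations.append((2, pos, i))   # inactive gene flanked by two active genes
            if mid and not left and not right:
                violations.append((3, pos, i))   # active gene flanked by two inactive genes
    return violations


collinear = rule_violations([0, 2, 4, 6], axis_length=8)      # chromosomal order matches axis order
non_collinear = rule_violations([0, 4, 2, 6], axis_length=8)  # two genes swapped on the chromosome
print("collinear:", collinear)
print("non-collinear:", non_collinear)
```

    Under these assumptions a collinear cluster passes all three rules, while swapping two genes produces rule 2 and rule 3 violations at the axial positions where only one of the swapped genes is active.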

  10. Ancestral TSH mechanism signals summer in a photoperiodic mammal.

    PubMed

    Hanon, Elodie A; Lincoln, Gerald A; Fustin, Jean-Michel; Dardente, Hugues; Masson-Pévet, Mireille; Morgan, Peter J; Hazlerigg, David G

    2008-08-05

    In mammals, day-length-sensitive (photoperiodic) seasonal breeding cycles depend on the pineal hormone melatonin, which modulates secretion of reproductive hormones by the anterior pituitary gland [1]. It is thought that melatonin acts in the hypothalamus to control reproduction through the release of neurosecretory signals into the pituitary portal blood supply, where they act on pituitary endocrine cells [2]. Contrastingly, we show here that during the reproductive response of Soay sheep exposed to summer day lengths, the reverse applies: Melatonin acts directly on anterior-pituitary cells, and these then relay the photoperiodic message back into the hypothalamus to control neuroendocrine output. The switch to long days causes melatonin-responsive cells in the pars tuberalis (PT) of the anterior pituitary to increase production of thyrotrophin (TSH). This acts locally on TSH-receptor-expressing cells in the adjacent mediobasal hypothalamus, leading to increased expression of type II thyroid hormone deiodinase (DIO2). DIO2 initiates the summer response by increasing hypothalamic tri-iodothyronine (T3) levels. These data and recent findings in quail [3] indicate that the TSH-expressing cells of the PT play an ancestral role in seasonal reproductive control in vertebrates. In mammals this provides the missing link between the pineal melatonin signal and thyroid-dependent seasonal biology.

  11. Allatotropin: An Ancestral Myotropic Neuropeptide Involved in Feeding

    PubMed Central

    Alzugaray, María Eugenia; Adami, Mariana Laura; Diambra, Luis Anibal; Hernandez-Martinez, Salvador; Damborenea, Cristina; Noriega, Fernando Gabriel; Ronderos, Jorge Rafael

    2013-01-01

    Background: Cell-cell interactions are a basic principle for the organization of tissues and organs allowing them to perform integrated functions and to organize themselves spatially and temporally. Peptidic molecules secreted by neurons and epithelial cells play fundamental roles in cell-cell interactions, acting as local neuromodulators, neurohormones, as well as endocrine and paracrine messengers. Allatotropin (AT) is a neuropeptide originally described as a regulator of Juvenile Hormone synthesis, which plays multiple neural, endocrine and myoactive roles in insects and other organisms. Methods: A combination of immunohistochemistry using AT-antibodies and AT-Qdot nanocrystal conjugates was used to identify immunoreactive nerve cells containing the peptide and epithelial-muscular cells targeted by AT in Hydra plagiodesmica. Physiological assays using AT and AT-antibodies revealed that while AT stimulated the extrusion of the hypostome in a dose-response fashion in starved hydroids, the activity of the hypostome in hydroids challenged with food was blocked by treatments with different doses of AT-antibodies. Conclusions: AT antibodies immunolabeled nerve cells in the stalk, pedal disc, tentacles and hypostome. AT-Qdot conjugates recognized epithelial-muscular cells in the same tissues, suggesting the existence of anatomical and functional relationships between these two cell populations. Physiological assays indicated that the AT-like peptide is facilitating food ingestion. Significance: Immunochemical, physiological and bioinformatics evidence advocates that AT is an ancestral neuropeptide involved in myoregulatory activities associated with meal ingestion and digestion. PMID:24143240

  12. Large-scale comparative phenotypic and genomic analyses reveal ecological preferences of Shewanella species and identify metabolic pathways conserved at the genus level.

    PubMed

    Rodrigues, Jorge L M; Serres, Margrethe H; Tiedje, James M

    2011-08-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects.

  13. Characterization of rubber tree microRNA in phytohormone response using large genomic DNA libraries, promoter sequence and gene expression analysis.

    PubMed

    Kanjanawattanawong, Supanath; Tangphatsornruang, Sithichoke; Triwitayakorn, Kanokporn; Ruang-areerate, Panthita; Sangsrakru, Duangjai; Poopear, Supannee; Somyong, Suthasinee; Narangajavana, Jarunya

    2014-10-01

    The para rubber tree is the most widely cultivated tree species for producing natural rubber (NR) latex. Unfortunately, rubber tree characteristics such as a long life cycle, heterozygous genetic backgrounds, and poorly understood genetic profiles are obstacles to breeding new rubber tree varieties, such as those with improved NR yields. Recent evidence has revealed the potential importance of controlling microRNA (miRNA) decay in some aspects of NR regulation. To gain a better understanding of miRNAs and their relationship with rubber tree gene regulation networks, large genomic DNA insert-containing libraries were generated to complement the incomplete draft genome sequence and applied as a powerful new tool to predict the function of genes of interest. Bacterial artificial chromosome and fosmid libraries, containing a total of 120,576 clones with an average insert size of 43.35 kb, provided approximately 2.42 haploid genome equivalents of coverage based on the estimated 2.15 Gb rubber tree genome. Based on these library sequences, the precursors of 1 member of rubber tree-specific miRNAs and 12 members of conserved miRNAs were successfully identified. A panel of miRNAs was characterized for phytohormone response by precisely identifying phytohormone-responsive motifs in their promoter sequences. Furthermore, quantitative real-time PCR on ethylene-stimulated rubber trees demonstrated that miR2118, miR159, miR164 and miR166 are responsive to ethylene, thus confirming the prediction from genomic DNA analysis. The cis-regulatory elements identified in the promoter regions of these miRNA genes help augment our understanding of miRNA gene regulation and provide a foundation for further investigation of the regulation of rubber tree miRNAs.
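
    The coverage figure quoted in this abstract follows from a simple ratio: total cloned DNA divided by haploid genome size. The sketch below reproduces that arithmetic from the rounded numbers in the abstract. The function name is illustrative, and the result (about 2.43) differs slightly from the reported 2.42, presumably because the published inputs are rounded.

```python
# Library coverage in haploid genome equivalents:
#   coverage = (number of clones * mean insert size) / haploid genome size
# Function name and signature are illustrative, not taken from the paper.
def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_size_gb: float) -> float:
    total_insert_kb = n_clones * mean_insert_kb
    genome_size_kb = genome_size_gb * 1_000_000  # 1 Gb = 1,000,000 kb
    return total_insert_kb / genome_size_kb

# Figures as quoted in the abstract (rounded values).
coverage = genome_equivalents(120_576, 43.35, 2.15)
print(f"{coverage:.2f} haploid genome equivalents")
```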

  14. Data sharing and intellectual property in a genomic epidemiology network: policies for large-scale research collaboration.

    PubMed Central

    Chokshi, Dave A.; Parker, Michael; Kwiatkowski, Dominic P.

    2006-01-01

    Genomic epidemiology is a field of research that seeks to improve the prevention and management of common diseases through an understanding of their molecular origins. It involves studying thousands of individuals, often from different populations, with exacting techniques. The scale and complexity of such research has required the formation of research consortia. Members of these consortia need to agree on policies for managing shared resources and handling genetic data. Here we consider data-sharing and intellectual property policies for an international research consortium working on the genomic epidemiology of malaria. We outline specific guidelines governing how samples and data are transferred among its members; how results are released into the public domain; when to seek protection for intellectual property; and how intellectual property should be managed. We outline some pragmatic solutions founded on the basic principles of promoting innovation and access. PMID:16710548

  15. A large-scale zebrafish gene knockout resource for the genome-wide study of gene function

    PubMed Central

    Varshney, Gaurav K.; Lu, Jing; Gildea, Derek E.; Huang, Haigen; Pei, Wuhong; Yang, Zhongan; Huang, Sunny C.; Schoenfeld, David; Pho, Nam H.; Casero, David; Hirase, Takashi; Mosbrook-Davis, Deborah; Zhang, Suiyuan; Jao, Li-En; Zhang, Bo; Woods, Ian G.; Zimmerman, Steven; Schier, Alexander F.; Wolfsberg, Tyra G.; Pellegrini, Matteo; Burgess, Shawn M.; Lin, Shuo

    2013-01-01

    With the completion of the zebrafish genome sequencing project, it becomes possible to analyze the function of zebrafish genes in a systematic way. The first step in such an analysis is to inactivate each protein-coding gene by targeted or random mutation. Here we describe a streamlined pipeline using proviral insertions coupled with high-throughput sequencing and mapping technologies to widely mutagenize genes in the zebrafish genome. We also report the first 6144 mutagenized and archived F1's predicted to carry up to 3776 mutations in annotated genes. Using in vitro fertilization, we have rescued and characterized ∼0.5% of the predicted mutations, showing mutation efficacy and a variety of phenotypes relevant to both developmental processes and human genetic diseases. Mutagenized fish lines are being made freely available to the public through the Zebrafish International Resource Center. These fish lines establish an important milestone for zebrafish genetics research and should greatly facilitate systematic functional studies of the vertebrate genome. PMID:23382537

  16. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    SciTech Connect

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the complete catfish EST dataset will greatly benefit the catfish introgression breeding program and whole genome association studies.
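
    The high-quality SNP criterion described in this abstract (at least four sequences, with the minor allele seen in at least two) can be sketched as a small filter. This is a minimal illustration, not the consortium's pipeline: the function name is hypothetical, and it assumes the depth requirement is checked at the SNP site itself, whereas the abstract states it per contig.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def is_high_quality_snp(bases, min_depth=4, min_minor=2):
    """Return True if a candidate SNP site passes the two thresholds.

    bases: one base call per EST covering the site (hypothetical input format).
    Assumption: depth is counted at the site, not over the whole contig.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    if sum(counts.values()) < min_depth or len(counts) < 2:
        return False  # too few sequences, or no second allele at all
    # The second most frequent allele is treated as the minor allele.
    minor_count = counts.most_common(2)[1][1]
    return minor_count >= min_minor

# Toy examples (invented data):
print(is_high_quality_snp("AAGG"))  # two alleles, each seen twice
print(is_high_quality_snp("AAAG"))  # minor allele seen only once
```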

  17. Extensive and biased intergenomic nonreciprocal DNA exchanges shaped a nascent polyploid genome, Gossypium (cotton)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cultivated cotton is composed of a tetraploid genome derived from two ancestral genomes that are related but divergent from each other. The “A” genome is derived from a cotton species that is used for low quality spinnable-fiber production in low production areas and has an African origin. The “D”...

  18. Genomics of Volvocine Algae

    PubMed Central

    Umen, James G.; Olson, Bradley J.S.C.

    2015-01-01

    Volvocine algae are a group of chlorophytes that together comprise a unique model for evolutionary and developmental biology. The species Chlamydomonas reinhardtii and Volvox carteri represent extremes in morphological diversity within the Volvocine clade. Chlamydomonas is unicellular and reflects the ancestral state of the group, while Volvox is multicellular and has evolved numerous innovations including germ-soma differentiation, sexual dimorphism, and complex morphogenetic patterning. The Chlamydomonas genome sequence has shed light on several areas of eukaryotic cell biology, metabolism and evolution, while the Volvox genome sequence has enabled a comparison with Chlamydomonas that reveals some of the underlying changes that enabled its transition to multicellularity, but also underscores the subtlety of this transition. Many of the tools and resources are in place to further develop Volvocine algae as a model for evolutionary genomics. PMID:25883411

  19. Conservation anchors in the vertebrate genome

    PubMed Central

    Aloni, Ronny; Lancet, Doron

    2005-01-01

    Genomic segments that do not code for proteins yet show high conservation among vertebrates have recently been identified by various computational methodologies. We refer to them as ANCORs (ancestral non-coding conserved regions). The frequency of individual ANCORs within the genome, along with their (correlated) inter-species identity scores, helps in assessing the probability that they function in transcription regulation or RNA coding. PMID:15998454

  20. Ancestral Heterogeneity in a Bi-ethnic Stroke Population

    PubMed Central

    Lisabeth, Lynda D; Morgenstern, Lewis B; Burke, David T; Sun, Yan V; Long, Jeffrey C

    2011-01-01

    SUMMARY To test for and characterize heterogeneity in ancestral contributions to individuals among a population of Mexican American (MA) and non-Hispanic white (NHW) stroke/TIA cases, data from a community-based stroke surveillance study in south Texas were used. Stroke/TIA cases were identified (2004–2006), with a random sample asked to provide blood. Race-ethnicity was self-reported. Thirty-three ancestry informative markers (AIMs) were genotyped and individual genetic admixture estimated using maximum likelihood methods. Three hypotheses were tested for each MA using likelihood ratio tests: 1) H0: μi=0 (100% Native American), 2) H0: μi=1.00 (100% European), 3) H0: μi=0.59 (average European). Among 154 self-identified MAs, estimated European ancestry varied from 0.26–0.98, with an average of 0.59 (se=0.014). We rejected hypothesis 1 for every MA and rejected hypothesis 2 for all but two MAs. We rejected hypothesis 3 for 40 MAs (20 below 59%, 20 above 59%). Among 84 self-identified NHWs, the estimated fraction of European ancestry ranged from 0.83–1.0, with an average of 0.97 (se=0.014). Self-identified MAs, and to a lesser extent NHWs, from an established bi-ethnic community were heterogeneous with respect to genetic admixture. Researchers should not use simple race-ethnic categories as proxies for homogeneous genetic populations when conducting gene mapping and disease association studies in multi-ethnic populations. PMID:21668907
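
    The abstract does not spell out its likelihood, so the following is only a sketch of one common two-population formulation, under stated assumptions: at each marker the individual's expected allele frequency is a mixture of the two parental frequencies weighted by the European fraction μ, and the genotype is binomial given that frequency. All names, the grid-search estimator and the toy frequencies are hypothetical. Note also that the usual chi-square reference (3.841 for 1 degree of freedom at the 0.05 level) is only approximate when the null value sits on the boundary (μ=0 or μ=1).

```python
import math

def log_likelihood(mu, genotypes, freq_eur, freq_nat, eps=1e-9):
    """Log-likelihood of one individual's genotypes given European fraction mu.

    genotypes: copies (0, 1 or 2) of the reference allele at each marker.
    freq_eur / freq_nat: reference-allele frequencies in the parental populations.
    """
    ll = 0.0
    for g, pe, pn in zip(genotypes, freq_eur, freq_nat):
        q = mu * pe + (1.0 - mu) * pn       # mixture allele frequency
        q = min(max(q, eps), 1.0 - eps)     # guard against log(0)
        ll += math.log(math.comb(2, g)) + g * math.log(q) + (2 - g) * math.log(1.0 - q)
    return ll

def estimate_admixture(genotypes, freq_eur, freq_nat, steps=1000):
    """Maximum likelihood estimate of mu by grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda mu: log_likelihood(mu, genotypes, freq_eur, freq_nat))

def lrt_statistic(mu0, genotypes, freq_eur, freq_nat):
    """Likelihood ratio statistic for H0: mu = mu0."""
    mu_hat = estimate_admixture(genotypes, freq_eur, freq_nat)
    return 2.0 * (log_likelihood(mu_hat, genotypes, freq_eur, freq_nat)
                  - log_likelihood(mu0, genotypes, freq_eur, freq_nat))

# Toy data (invented): five markers with strongly divergent parental frequencies.
freq_eur = [0.90, 0.80, 0.10, 0.85, 0.20]
freq_nat = [0.10, 0.20, 0.90, 0.15, 0.80]
genotypes = [2, 2, 0, 2, 0]

mu_hat = estimate_admixture(genotypes, freq_eur, freq_nat)
stat_h1 = lrt_statistic(0.0, genotypes, freq_eur, freq_nat)  # H0: 100% Native American
print(mu_hat, stat_h1 > 3.841)
```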

  1. The Microcephalin Ancestral Allele in a Neanderthal Individual

    PubMed Central

    Lari, Martina; Rizzi, Ermanno; Milani, Lucio; Corti, Giorgio; Balsamo, Carlotta; Vai, Stefania; Catalano, Giulio; Pilli, Elena; Longo, Laura; Condemi, Silvana; Giunti, Paolo; Hänni, Catherine; De Bellis, Gianluca; Orlando, Ludovic; Barbujani, Guido; Caramelli, David

    2010-01-01

    Background: The high frequency (around 0.70 worldwide) and the relatively young age (between 14,000 and 62,000 years) of a derived group of haplotypes, haplogroup D, at the microcephalin (MCPH1) locus led to the proposal that haplogroup D originated in a human lineage that separated from modern humans >1 million years ago, evolved under strong positive selection, and passed into the human gene pool by an episode of admixture circa 37,000 years ago. The geographic distribution of haplogroup D, with marked differences between Africa and Eurasia, suggested that the archaic human form admixing with anatomically modern humans might have been Neanderthal. Methodology/Principal Findings: Here we report the first PCR amplification and high-throughput sequencing of nuclear DNA at the microcephalin (MCPH1) locus from a Neanderthal individual from Mezzena Rockshelter (Monti Lessini, Italy). We show that a well-preserved Neanderthal fossil, dated at approximately 50,000 years B.P., was homozygous for the ancestral, non-D, allele. The high yield of Neanderthal mtDNA sequences of the studied specimen, the pattern of nucleotide misincorporation among sequences consistent with post-mortem DNA damage and an accurate control of the MCPH1 alleles in all personnel that manipulated the sample, make it extremely unlikely that this result might reflect modern DNA contamination. Conclusions/Significance: The MCPH1 genotype of the Monti Lessini (MLS) Neanderthal does not prove that there was no interbreeding between anatomically archaic and modern humans in Europe, but certainly shows that speculations on a possible Neanderthal origin of what is now the most common MCPH1 haplogroup are not supported by empirical evidence from ancient DNA. PMID:20498832

  2. On the tomato trail: in search of ancestral roots.

    PubMed

    Estabrook, Barry

    2010-01-01

    A profile of Roger Chetelat, the director of the C.M. Rick Tomato Genetics Resource Center at the University of California, Davis. Chetelat maintains one of the largest collections of tomato seeds in the world. Many of those seeds come from wild tomato species that Chetelat and his associates collect on field research trips to the dry coastal areas of Chile, Peru, and Ecuador. Wild tomatoes are tough, versatile organisms that have evolved resistance to virtually all common tomato diseases and pests and stubbornly tolerate extreme environmental conditions. Some boast extraordinarily high levels of sugars, beta carotene, vitamin C, lycopene, and antioxidants. Chetelat has dedicated his career to finding and preserving these genetic riches. Modern cultivated tomatoes are a frail, inbred lot. They all trace their origins to a single, wild tomato plant that underwent a random mutation sometime in prehistory. Because of this genetic fluke, that plant's fruits were plump, juicy, and many, many times larger than the output of its progenitors. Offspring from that tomato were taken away from the Andes and domesticated in what is present-day Mexico, becoming severed from their wild ancestors and the vast pool of genetic diversity that tomatoes had evolved over the millennia. Botanists call this a “bottleneck.” It leaves subsequent generations susceptible to disease and unable to adjust to rapid climate changes. The stored wild seeds at the Rick Center enable plant breeders to re-incorporate desirable wild traits into new tomato varieties, literally reconnecting them to their ancestral roots, ensuring that this vast reservoir of genetic diversity will be available when it is needed.

  3. Plausibility of inferred ancestral phenotypes and the evaluation of alternative models of limb evolution in scincid lizards.

    PubMed

    Skinner, Adam; Lee, Michael S Y

    2010-06-23

    Phylogenetic approaches to inferring ancestral character states are becoming increasingly sophisticated; however, the potential remains for available methods to yield strongly supported but inaccurate ancestral state estimates. The consistency of ancestral states inferred for two or more characters affords a useful criterion for evaluating ancestral trait reconstructions. Ancestral state estimates for multiple characters that entail plausible phenotypes when considered together may reasonably be assumed to be reliable. However, the accuracy of inferred ancestral states for one or more characters may be questionable where combined reconstructions imply implausible phenotypes for a proportion of internal nodes. This criterion for assessing reconstructed ancestral states is applied here in evaluating inferences of ancestral limb morphology in the scincid lizard clade Lerista. Ancestral numbers of digits for the manus and pes inferred assuming the models that best fit the data entail ancestral digit configurations for many nodes that differ fundamentally from configurations observed among known species. However, when an alternative model is assumed for the pes, inferred ancestral digit configurations are invariably represented among observed phenotypes. This indicates that a suboptimal model for the pes (and not the model providing the best fit to the data) yields accurate ancestral state estimates.

  4. The Genome Sequence of Bacillus cereus ATCC 10987 Reveals Metabolic Adaptations and a Large Plasmid Related to Bacillus anthracis pXO1

    DTIC Science & Technology

    2004-01-01

    R.L. and Waites,K.B. (2003) Bacillus cereus bacteremia in a preterm neonate. J. Clin. Microbiol., 41, 3441–3444. 9. Ginsburg,A.S., Salazar,L.G., True... bacteremia and pneumonia due to Bacillus cereus. J. Clin. Microbiol., 35, 504–507. 12. Okinaka,R., Cloud,K., Hampton,O., Hoffmaster,A., Hill,K., Keim,P... The genome sequence of Bacillus cereus ATCC 10987 reveals metabolic adaptations and a large plasmid related to Bacillus anthracis pXO1 David A. Rasko

  5. Large genomic mutations within the ATM gene detected by MLPA, including a duplication of 41 kb from exon 4 to 20.

    PubMed

    Cavalieri, Simona; Funaro, Ada; Pappi, Patrizia; Migone, Nicola; Gatti, Richard A; Brusco, Alfredo

    2008-01-01

    Mutation detection remains problematic for large genes, primarily because PCR-based methodology fails to detect heterozygous deletions and any duplication. In the ATM gene only a handful of multi-exon deletions have been described to date, and this type of mutation has been considered rare. To address this issue we tested a new MLPA (Multiplex Ligation-dependent Probe Amplification) kit that covers 33 of the 66 ATM exons, using as controls two previously characterized genomic deletions, in addition to three A-T patients, taken from a survey of nine, in whom four mutations remained unidentified after conventional mutation screening. We identified for the first time: 1) an approximately 41 kb genomic duplication spanning exons 4-20 (c.-30_2816dup41kb) (a.k.a. ATM dup 41 kb); 2) a novel genomic deletion including exon 31; and 3) a hemizygous point mutation in the non-deleted exon 31. In this study we extended mutation detection to nine new Italian A-T patients, using a combined approach of haplotype analysis, DHPLC and MLPA. Overall we achieved a mutation detection rate of >97%, and can now define a spectrum of ATM mutations based on twenty-one consecutive Italian families with A-T.

  6. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations

    PubMed Central

    Dereeper, Alexis; Homa, Felix; Andres, Gwendoline; Sempere, Guilhem; Sarah, Gautier; Hueber, Yann; Dufayard, Jean-François; Ruiz, Manuel

    2015-01-01

    SNiPlay is a web-based tool for detection, management and analysis of genetic variants including both single nucleotide polymorphisms (SNPs) and InDels. Version 3 now extends functionalities in order to easily manage and exploit SNPs derived from next generation sequencing technologies, such as GBS (genotyping by sequencing), WGRS (whole-genome resequencing) and RNA-Seq technologies. Based on the standard VCF (variant call format), the application offers an intuitive interface for filtering and comparing polymorphisms using user-defined sets of individuals and then establishing a reliable genotyping data matrix for further analyses. In addition to the various scaled-up analyses allowed by the application (genomic annotation of SNP, diversity analysis, haplotype reconstruction and network, linkage disequilibrium), SNiPlay3 proposes new modules for GWAS (genome-wide association studies), population stratification, distance tree analysis and visualization of SNP density. Additionally, we developed a suite of Galaxy wrappers for each step of the SNiPlay3 process, so that the complete pipeline can also be deployed on a Galaxy instance using the Galaxy ToolShed procedure and then be computed as a Galaxy workflow. SNiPlay is accessible at http://sniplay.southgreen.fr. PMID:26040700
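
    To illustrate the kind of filtering this abstract describes (restricting a VCF to a user-defined set of individuals and building a genotype matrix), here is a minimal standard-library sketch. It is not SNiPlay's code or API. The function name, the decision to drop sites with missing calls and the decision to drop sites that are monomorphic among the selected individuals are all assumptions made for the example. It relies only on the standard VCF column layout (eight fixed columns, then FORMAT, then one column per sample).

```python
def genotype_matrix(vcf_lines, wanted_samples):
    """Return [(chrom, pos, [GT per wanted sample])] for informative sites.

    Hypothetical helper: keeps only sites where every selected individual has
    a called genotype and at least two distinct alleles are observed.
    """
    cols, rows = None, []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            cols = [fields.index(s) for s in wanted_samples]
            continue
        gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
        calls = [fields[c].split(":")[gt_idx] for c in cols]
        if any("." in call for call in calls):
            continue  # missing genotype in a selected individual
        alleles = {a for call in calls for a in call.replace("|", "/").split("/")}
        if len(alleles) < 2:
            continue  # monomorphic among the selected individuals
        rows.append((fields[0], int(fields[1]), calls))
    return rows

# Toy VCF (invented data), three individuals, two of them selected.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          "ind1", "ind2", "ind3"]
toy_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(header),
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT", "0/0", "0/1", "1/1"]),
    "\t".join(["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT", "0/0", "0/0", "1/1"]),
    "\t".join(["chr1", "300", ".", "G", "A", "50", "PASS", ".", "GT:DP", "./.:0", "0/1:7", "1/1:9"]),
]
matrix = genotype_matrix(toy_vcf, ["ind1", "ind2"])
print(matrix)
```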

  7. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations.

    PubMed

    Dereeper, Alexis; Homa, Felix; Andres, Gwendoline; Sempere, Guilhem; Sarah, Gautier; Hueber, Yann; Dufayard, Jean-François; Ruiz, Manuel

    2015-07-01

    SNiPlay is a web-based tool for detection, management and analysis of genetic variants including both single nucleotide polymorphisms (SNPs) and InDels. Version 3 now extends functionalities in order to easily manage and exploit SNPs derived from next generation sequencing technologies, such as GBS (genotyping by sequencing), WGRS (whole-genome resequencing) and RNA-Seq technologies. Based on the standard VCF (variant call format), the application offers an intuitive interface for filtering and comparing polymorphisms using user-defined sets of individuals and then establishing a reliable genotyping data matrix for further analyses. In addition to the various scaled-up analyses allowed by the application (genomic annotation of SNP, diversity analysis, haplotype reconstruction and network, linkage disequilibrium), SNiPlay3 proposes new modules for GWAS (genome-wide association studies), population stratification, distance tree analysis and visualization of SNP density. Additionally, we developed a suite of Galaxy wrappers for each step of the SNiPlay3 process, so that the complete pipeline can also be deployed on a Galaxy instance using the Galaxy ToolShed procedure and then be computed as a Galaxy workflow. SNiPlay is accessible at http://sniplay.southgreen.fr.

  8. Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies.

    PubMed

    Rivadeneira, Fernando; Styrkársdottir, Unnur; Estrada, Karol; Halldórsson, Bjarni V; Hsu, Yi-Hsiang; Richards, J Brent; Zillikens, M Carola; Kavvoura, Fotini K; Amin, Najaf; Aulchenko, Yurii S; Cupples, L Adrienne; Deloukas, Panagiotis; Demissie, Serkalem; Grundberg, Elin; Hofman, Albert; Kong, Augustine; Karasik, David; van Meurs, Joyce B; Oostra, Ben; Pastinen, Tomi; Pols, Huibert A P; Sigurdsson, Gunnar; Soranzo, Nicole; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Williams, Frances M K; Wilson, Scott G; Zhou, Yanhua; Ralston, Stuart H; van Duijn, Cornelia M; Spector, Timothy; Kiel, Douglas P; Stefansson, Kari; Ioannidis, John P A; Uitterlinden, André G

    2009-11-01

    Bone mineral density (BMD) is a heritable complex trait used in the clinical diagnosis of osteoporosis and the assessment of fracture risk. We performed meta-analysis of five genome-wide association studies of femoral neck and lumbar spine BMD in 19,195 subjects of Northern European descent. We identified 20 BMD loci that reached genome-wide significance (GWS; P < 5 × 10^-8), of which 13 map to regions not previously associated with this trait: 1p31.3 (GPR177), 2p21 (SPTBN1), 3p22 (CTNNB1), 4q21.1 (MEPE), 5q14 (MEF2C), 7p14 (STARD3NL), 7q21.3 (FLJ42280), 11p11.2 (LRP4, ARHGAP1, F2), 11p14.1 (DCDC5), 11p15 (SOX6), 16q24 (FOXL1), 17q21 (HDAC5) and 17q12 (CRHR1). The meta-analysis also confirmed at GWS level seven known BMD loci on 1p36 (ZBTB40), 6q25 (ESR1), 8q24 (TNFRSF11B), 11q13.4 (LRP5), 12q13 (SP7), 13q14 (TNFSF11) and 18q21 (TNFRSF11A). The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.

  9. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease.

    PubMed

    Nalls, Mike A; Pankratz, Nathan; Lill, Christina M; Do, Chuong B; Hernandez, Dena G; Saad, Mohamad; DeStefano, Anita L; Kara, Eleanna; Bras, Jose; Sharma, Manu; Schulte, Claudia; Keller, Margaux F; Arepalli, Sampath; Letson, Christopher; Edsall, Connor; Stefansson, Hreinn; Liu, Xinmin; Pliner, Hannah; Lee, Joseph H; Cheng, Rong; Ikram, M Arfan; Ioannidis, John P A; Hadjigeorgiou, Georgios M; Bis, Joshua C; Martinez, Maria; Perlmutter, Joel S; Goate, Alison; Marder, Karen; Fiske, Brian; Sutherland, Margaret; Xiromerisiou, Georgia; Myers, Richard H; Clark, Lorraine N; Stefansson, Kari; Hardy, John A; Heutink, Peter; Chen, Honglei; Wood, Nicholas W; Houlden, Henry; Payami, Haydeh; Brice, Alexis; Scott, William K; Gasser, Thomas; Bertram, Lars; Eriksson, Nicholas; Foroud, Tatiana; Singleton, Andrew B

    2014-09-01

    We conducted a meta-analysis of Parkinson's disease genome-wide association studies using a common set of 7,893,274 variants across 13,708 cases and 95,282 controls. Twenty-six loci were identified as having genome-wide significant association; these and 6 additional previously reported loci were then tested in an independent set of 5,353 cases and 5,551 controls. Of the 32 tested SNPs, 24 replicated, including 6 newly identified loci. Conditional analyses within loci showed that four loci, including GBA, GAK-DGKQ, SNCA and the HLA region, contain a secondary independent risk variant. In total, we identified and replicated 28 independent risk variants for Parkinson's disease across 24 loci. Although the effect of each individual locus was small, risk profile analysis showed substantial cumulative risk in a comparison of the highest and lowest quintiles of genetic risk (odds ratio (OR) = 3.31, 95% confidence interval (CI) = 2.55-4.30; P = 2 × 10^-16). We also show six risk loci associated with proximal gene expression or DNA methylation.
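
    As a reading aid for the odds ratio and confidence interval quoted in this abstract: if the interval is a symmetric (Wald-type) interval on the log-odds scale, which the abstract does not state and is assumed here, then the point estimate is the geometric mean of the two bounds and the standard error of log(OR) can be recovered from the interval width. The function name is illustrative.

```python
import math

def log_scale_summary(ci_low, ci_high, z=1.96):
    """Recover the OR and SE of log(OR) from a 95% CI.

    Assumption: the interval is symmetric on the log scale (Wald interval),
    i.e. exp(log_or +/- z * se).
    """
    log_or = (math.log(ci_low) + math.log(ci_high)) / 2.0  # midpoint on log scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return math.exp(log_or), se

# CI bounds as quoted in the abstract.
or_hat, se_log_or = log_scale_summary(2.55, 4.30)
print(f"OR ~ {or_hat:.2f}, SE(log OR) ~ {se_log_or:.3f}")
```

    The recovered point estimate agrees with the reported OR of 3.31 to two decimal places, which is consistent with, though not proof of, a log-scale interval.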

  10. A large genomic island allows Neisseria meningitidis to utilize propionic acid, with implications for colonization of the human nasopharynx.

    PubMed

    Catenazzi, Maria Chiara E; Jones, Helen; Wallace, Iain; Clifton, Jacqueline; Chong, James P J; Jackson, Matthew A; Macdonald, Sandy; Edwards, James; Moir, James W B

    2014-07-01

    Neisseria meningitidis is an important human pathogen that is capable of killing within hours of infection. Its normal habitat is the nasopharynx of adult humans. Here we identify a genomic island (the prp gene cluster) in N. meningitidis that enables this species to utilize propionic acid as a supplementary carbon source during growth, particularly under nutrient-poor growth conditions. The prp gene cluster encodes enzymes for a methylcitrate cycle. Novel aspects of the methylcitrate cycle in N. meningitidis include a propionate kinase, which was purified and characterized, and a putative propionate transporter. This genomic island is absent from the close relative of N. meningitidis, the commensal Neisseria lactamica, which chiefly colonizes infants rather than adults. We reason that the possession of the prp genes provides a metabolic advantage to N. meningitidis in the adult oral cavity, which is rich in propionic acid-generating bacteria. Data from classical microbiological and sequence-based microbiome studies provide several lines of supporting evidence that N. meningitidis colonization is correlated with propionic acid-generating bacteria, with a strong correlation between prp-containing Neisseria and propionic acid-generating bacteria from the genus Porphyromonas, and that this may explain adolescent/adult colonization by N. meningitidis.

  11. Large scale genomic analysis shows no evidence for pathogen adaptation between the blood and cerebrospinal fluid niches during bacterial meningitis

    PubMed Central

    Lees, John A.; Kremer, Philip H. C.; Manso, Ana S.; Croucher, Nicholas J.; Ferwerda, Bart; Serón, Mercedes Valls; Oggioni, Marco R.; Parkhill, Julian; Brouwer, Matthijs C.; van der Ende, Arie; van de Beek, Diederik

    2017-01-01

    Recent studies have provided evidence for rapid pathogen genome diversification, some of which could potentially affect the course of disease. We have previously described such variation seen between isolates infecting the blood and cerebrospinal fluid (CSF) of a single patient during a case of bacterial meningitis. Here, we performed whole-genome sequencing of paired isolates from the blood and CSF of 869 meningitis patients to determine whether such variation frequently occurs between these two niches in cases of bacterial meningitis. Using a combination of reference-free variant calling approaches, we show that no genetic adaptation occurs in either invaded niche during bacterial meningitis for two major pathogen species, Streptococcus pneumoniae and Neisseria meningitidis. This study therefore shows that the bacteria capable of causing meningitis are already able to do this upon entering the blood, and no further sequence chang