Extreme-Scale De Novo Genome Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Georganas, Evangelos; Hofmeyr, Steven; Egan, Rob
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.
Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade
Boothby, Thomas C.; Tenlen, Jennifer R.; Smith, Frank W.; Wang, Jeremy R.; Patanella, Kiera A.; Osborne Nishimura, Erin; Tintori, Sophia C.; Li, Qing; Jones, Corbin D.; Yandell, Mark; Glasscock, Jarret; Goldstein, Bob
2015-01-01
Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes. PMID:26598659
Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade.
Boothby, Thomas C; Tenlen, Jennifer R; Smith, Frank W; Wang, Jeremy R; Patanella, Kiera A; Nishimura, Erin Osborne; Tintori, Sophia C; Li, Qing; Jones, Corbin D; Yandell, Mark; Messina, David N; Glasscock, Jarret; Goldstein, Bob
2015-12-29
Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.
Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E
2017-03-06
Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
USDA-ARS's Scientific Manuscript database
Eukaryotic genomes vary in size over five orders of magnitude ranging from microsporidia (~2.9Mb) to the lung-fish (~1.2Tb). This extraordinary variation is largely a result of the proliferation of mobile DNA elements also referred to as “genomic parasites.” The constraints on genome size may be imp...
Gusev, Oleg; Suetsugu, Yoshitaka; Cornette, Richard; Kawashima, Takeshi; Logacheva, Maria D.; Kondrashov, Alexey S.; Penin, Aleksey A.; Hatanaka, Rie; Kikuta, Shingo; Shimura, Sachiko; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Shagimardanova, Elena; Alexeev, Dmitry; Govorun, Vadim; Wisecaver, Jennifer; Mikheyev, Alexander; Koyanagi, Ryo; Fujie, Manabu; Nishiyama, Tomoaki; Shigenobu, Shuji; Shibata, Tomoko F.; Golygina, Veronika; Hasebe, Mitsuyasu; Okuda, Takashi; Satoh, Nori; Kikawada, Takahiro
2014-01-01
Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis examining the genomic background of extreme desiccation tolerance, which is exclusively found in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge P. nubifer. We determine that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes with products that act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues contribute to a large majority of the mRNA pool in the desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki. PMID:25216354
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
Interpreting Mammalian Evolution using Fugu Genome Comparisons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stubbs, L; Ovcharenko, I; Loots, G G
2004-04-02
Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.
USDA-ARS's Scientific Manuscript database
Meishan is a famous Chinese indigenous pig breed known for its extremely high fecundity. To explore if Meishan has unique evolutionary process and genome characteristics differing from other pig breeds, we systematically analyzed its genetic divergence, and demographic history by large-scale reseque...
Selective significance of genome size in a plant community with heavy metal pollution.
Vidic, T; Greilhuber, J; Vilhar, B; Dermastia, M
2009-09-01
In eukaryotes, nuclear genome sizes vary by more than five orders of magnitude. This variation is not related to organismal complexity, and its origin and biological significance are still disputed. One of the open questions is whether genome size has an adaptive role. We tested the hypothesis that genome size has selective significance, using five grassland communities occurring on a gradient of metal pollution of the soil as a model. We detected a negative correlation between the concentration of contaminating metals in the soil and the number of vascular plant species. Analysis of genome sizes of 70 herbaceous dicot perennial species occurring on the investigated plots revealed a negative correlation between the concentration of contaminating metals in the soil and the proportion of species with large genomes in plant communities. Consistent with the hypothesis, these results show that species with large genomes are at selective disadvantage in extreme environmental conditions.
Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong
2015-09-03
Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East Asia. The annual production of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in Irys system, the first whole-genome map for large yellow croaker has been constructed and thus highly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by the whole-genome mapping for aquatic organisms. Our study demonstrates a promising application of the whole-genome mapping to genome map construction for other non-model organisms in a fast and reliable manner.
Aswad, Amr; Katzourakis, Aris
2014-01-01
Herpesviridae is a diverse family of large and complex pathogens whose genomes are extremely difficult to sequence. This is particularly true for clinical samples, and if the virus, host, or both genomes are being sequenced for the first time. Although herpesviruses are known to occasionally integrate in host genomes, and can also be inherited in a Mendelian fashion, they are notably absent from the genomic fossil record comprised of endogenous viral elements (EVEs). Here, we combine paleovirological and metagenomic approaches to both explore the constituent viral diversity of mammalian genomes and search for endogenous herpesviruses. We describe the first endogenous herpesvirus from the genome of the Philippine tarsier, belonging to the Roseolovirus genus, and characterize its highly defective genome that is integrated and flanked by unambiguous host DNA. From a draft assembly of the aye-aye genome, we use bioinformatic tools to reveal over 100,000 bp of a novel rhadinovirus that is the first lemur gammaherpesvirus, closely related to Kaposi's sarcoma-associated virus. We also identify 58 genes of Pan paniscus lymphocryptovirus 1, the bonobo equivalent of human Epstein-Barr virus. For each of the viruses, we postulate gene function via comparative analysis to known viral relatives. Most notably, the evidence from gene content and phylogenetics suggests that the aye-aye sequences represent the most basal known rhadinovirus, and indicates that tumorigenic herpesviruses have been infecting primates since their emergence in the late Cretaceous. Overall, these data show that a genomic fossil record of herpesviruses exists despite their extremely large genomes, and expands the known diversity of Herpesviridae, which will aid the characterization of pathogenesis. Our analytical approach illustrates the benefit of intersecting evolutionary approaches with metagenomics, genetics and paleovirology. PMID:24945689
Whole genome sequences of a male and female supercentenarian, ages greater than 114 years.
Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N; Baldwin, Clinton T; Andersen, Stacy; Schork, Nicholas J; Steinberg, Martin H; Perls, Thomas T
2011-01-01
Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals' DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging.
Whole Genome Sequences of a Male and Female Supercentenarian, Ages Greater than 114 Years
Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N.; Baldwin, Clinton T.; Andersen, Stacy; Schork, Nicholas J.; Steinberg, Martin H.; Perls, Thomas T.
2012-01-01
Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals’ DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging. PMID:22303384
Kelly, Laura J; Renny-Byfield, Simon; Pellicer, Jaume; Macas, Jiří; Novák, Petr; Neumann, Pavel; Lysak, Martin A; Day, Peter D; Berger, Madeleine; Fay, Michael F; Nichols, Richard A; Leitch, Andrew R; Leitch, Ilia J
2015-10-01
Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We revealed that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario where amplified repeats continually accumulate due to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus
2014-12-01
Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
Compositional patterns in the genomes of unicellular eukaryotes.
Costantini, Maria; Alvarez-Valin, Fernando; Costantini, Susan; Cammarano, Rosalia; Bernardi, Giorgio
2013-11-05
The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, by different gene concentration (that increase with GC), different chromatin structures, different replication timing in the cell cycle, and other different properties. A question raised by these basic results concerns how far back in evolution the compartmentalized organization of the eukaryotic genomes arose. In the present work we approached this problem by studying the compositional organization of the genomes from the unicellular eukaryotes for which full sequences are available, the sample used being representative. The average GC levels of the genomes from unicellular eukaryotes cover an extremely wide range (19%-60% GC) and the compositional patterns of individual genomes are extremely different but all genomes tested show a compositional compartmentalization. The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes) and individual compositional patterns cover a very broad range from very narrow to very complex. Both features are not surprising for organisms that are very far from each other both in terms of phylogenetic distances and of environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference with prokaryotes.
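The average GC level referred to in this abstract is the fraction of G and C bases among the unambiguous bases of a sequence, and compositional patterns are usually examined by computing that fraction over consecutive windows. The following is a minimal sketch of both calculations, not the authors' method: the function names `gc_content` and `windowed_gc`, the non-overlapping window scheme, and the toy input are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window):
    """GC fraction of consecutive non-overlapping windows of a fixed size.

    A trailing fragment shorter than `window` is ignored. The window size
    is a free parameter here, chosen by the caller (hypothetical helper).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    toy = "AAAATTTTGGGGCCCC"  # toy sequence, not real genomic data
    print(gc_content(toy))
    print(windowed_gc(toy, 4))
```

A genome whose windowed values cluster tightly around one level would count as compositionally narrow under this toy measure, while widely spread values would suggest compartmentalization; real analyses use far larger windows and additional statistics.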
Genomic diversity of the human intestinal parasite Entamoeba histolytica
2012-01-01
Background Entamoeba histolytica is a significant cause of disease worldwide. However, little is known about the genetic diversity of the parasite. We re-sequenced the genomes of ten laboratory cultured lines of the eukaryotic pathogen Entamoeba histolytica in order to develop a picture of genetic diversity across the genome. Results The extreme nucleotide composition bias and repetitiveness of the E. histolytica genome provide a challenge for short-read mapping, yet we were able to define putative single nucleotide polymorphisms in a large portion of the genome. The results suggest a rather low level of single nucleotide diversity, although genes and gene families with putative roles in virulence are among the more polymorphic genes. We did observe large differences in coverage depth among genes, indicating differences in gene copy number between genomes. We found evidence indicating that recombination has occurred in the history of the sequenced genomes, suggesting that E. histolytica may reproduce sexually. Conclusions E. histolytica displays a relatively low level of nucleotide diversity across its genome. However, large differences in gene family content and gene copy number are seen among the sequenced genomes. The pattern of polymorphism indicates that E. histolytica reproduces sexually, or has done so in the past, which has previously been suggested but not proven. PMID:22630046
Compositional patterns in the genomes of unicellular eukaryotes
2013-01-01
Background The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, by different gene concentration (that increase with GC), different chromatin structures, different replication timing in the cell cycle, and other different properties. A question raised by these basic results concerns how far back in evolution the compartmentalized organization of the eukaryotic genomes arose. Results In the present work we approached this problem by studying the compositional organization of the genomes from the unicellular eukaryotes for which full sequences are available, the sample used being representative. The average GC levels of the genomes from unicellular eukaryotes cover an extremely wide range (19%-60% GC) and the compositional patterns of individual genomes are extremely different but all genomes tested show a compositional compartmentalization. Conclusions The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes) and individual compositional patterns cover a very broad range from very narrow to very complex. Both features are not surprising for organisms that are very far from each other both in terms of phylogenetic distances and of environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference with prokaryotes. PMID:24188247
Youssef, Noha H.; Couger, M. B.; Struchtemeyer, Christopher G.; Liggenstoffer, Audra S.; Prade, Rolf A.; Najar, Fares Z.; Atiyeh, Hasan K.; Wilkins, Mark R.
2013-01-01
Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production. PMID:23709508
Youssef, Noha H; Couger, M B; Struchtemeyer, Christopher G; Liggenstoffer, Audra S; Prade, Rolf A; Najar, Fares Z; Atiyeh, Hasan K; Wilkins, Mark R; Elshahed, Mostafa S
2013-08-01
Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.
Genome size variation in deep-sea amphipods
Jamieson, A. J.; Piertney, S. B.
2017-01-01
Genome size varies considerably across taxa, and extensive research effort has gone into understanding whether variation can be explained by differences in key ecological and life-history traits among species. The extreme environmental conditions that characterize the deep sea have been hypothesized to promote large genome sizes in eukaryotes. Here we test this supposition by examining genome sizes among 13 species of deep-sea amphipods from the Mariana, Kermadec and New Hebrides trenches. Genome sizes were estimated using flow cytometry and found to vary nine-fold, ranging from 4.06 pg (4.04 Gb) in Paralicella caperesca to 34.79 pg (34.02 Gb) in Alicella gigantea. Phylogenetic independent contrast analysis identified a relationship between genome size and maximum body size, though this was largely driven by those species that display size gigantism. There was a distinct shift in the genome size trait diversification rate in the supergiant amphipod A. gigantea relative to the rest of the group. The variation in genome size observed is striking and argues against genome size being driven by a common evolutionary history, ecological niche and life-history strategy in deep-sea amphipods. PMID:28989783
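The paired picogram and gigabase figures above can be related by a simple unit conversion. The sketch below assumes the commonly used factor of 0.978 × 10^9 base pairs per picogram of DNA; the abstract does not state which factor the authors applied, and the function name is illustrative only.

```python
# Assumed conversion factor (not stated in the abstract): 1 pg of DNA ~ 0.978e9 bp.
BP_PER_PICOGRAM = 0.978e9


def picograms_to_gigabases(c_value_pg):
    """Convert a flow-cytometry C-value in picograms to gigabases."""
    return c_value_pg * BP_PER_PICOGRAM / 1e9


# The 34.79 pg estimate for Alicella gigantea, converted to Gb.
print(round(picograms_to_gigabases(34.79), 2))
```

Under this assumption, the 34.79 pg value converts to the 34.02 Gb quoted for A. gigantea.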
De novo assembly of human genomes with massively parallel short read sequencing.
Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2010-02-01
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
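The N50 statistic quoted above is the length of the contig (or scaffold) at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that calculation follows; the function name and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths in kb, for illustration only.
example_contigs = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(n50(example_contigs))
```

A larger N50 means that more of the assembly sits in long contiguous pieces, which is why the contig and scaffold values are reported separately.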
Genomic legacy of the African cheetah, Acinonyx jubatus.
Dobrynin, Pavel; Liu, Shiping; Tamazian, Gaik; Xiong, Zijun; Yurchenko, Andrey A; Krasheninnikova, Ksenia; Kliver, Sergey; Schmidt-Küntzel, Anne; Koepfli, Klaus-Peter; Johnson, Warren; Kuderna, Lukas F K; García-Pérez, Raquel; Manuel, Marc de; Godinez, Ricardo; Komissarov, Aleksey; Makunin, Alexey; Brukhin, Vladimir; Qiu, Weilin; Zhou, Long; Li, Fang; Yi, Jian; Driscoll, Carlos; Antunes, Agostinho; Oleksyk, Taras K; Eizirik, Eduardo; Perelman, Polina; Roelke, Melody; Wildt, David; Diekhans, Mark; Marques-Bonet, Tomas; Marker, Laurie; Bhak, Jong; Wang, Jun; Zhang, Guojie; O'Brien, Stephen J
2015-12-10
Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations. Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95 % homozygous compared to the genomes of the outbred domestic cat (24.08 % homozygous), Virunga Mountain Gorilla (78.12 %), inbred Abyssinian cat (62.63 %), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084-12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p<0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah's extremely high (>80 %) pleiomorphic sperm. The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species' natural history, physiological adaptations and unique reproductive disposition.
Extending information retrieval methods to personalized genomic-based studies of disease.
Ye, Shuyun; Dawson, John A; Kendziorski, Christina
2014-01-01
Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
Allying with armored snails: the complete genome of gammaproteobacterial endosymbiont.
Nakagawa, Satoshi; Shimamura, Shigeru; Takaki, Yoshihiro; Suzuki, Yohey; Murakami, Shun-ichi; Watanabe, Tamaki; Fujiyoshi, So; Mino, Sayaka; Sawabe, Tomoo; Maeda, Takahiro; Makita, Hiroko; Nemoto, Suguru; Nishimura, Shin-Ichiro; Watanabe, Hiromi; Watsuji, Tomo-o; Takai, Ken
2014-01-01
Deep-sea vents harbor dense populations of various animals that have their specific symbiotic bacteria. Scaly-foot gastropods, which are snails with mineralized scales covering the sides of their foot, have a gammaproteobacterial endosymbiont in their enlarged esophageal glands and diverse epibionts on the surface of their scales. In this study, we report the complete genome sequencing of the gammaproteobacterial endosymbiont. The endosymbiont genome displays features consistent with ongoing genome reduction such as large proportions of pseudogenes and insertion elements. The genome encodes functions commonly found in deep-sea vent chemoautotrophs such as sulfur oxidation and carbon fixation. Stable carbon isotope (¹³C)-labeling experiments confirmed the endosymbiont chemoautotrophy. The genome also includes an intact hydrogenase gene cluster that potentially has been horizontally transferred from phylogenetically distant bacteria. Notable findings include the presence and transcription of genes for flagellar assembly, through which proteins are potentially exported from the bacterium to the host. Symbionts of snail individuals exhibited extreme genetic homogeneity, showing only two synonymous changes in 19 different genes (13 810 positions in total) determined for 32 individual gastropods collected from a single colony at one time. The extremely low genetic individuality in endosymbionts probably reflects that stringent symbiont selection by the host prevents random genetic drift in the small population of horizontally transmitted symbionts. This study is the first complete genome analysis of a gastropod endosymbiont and offers an opportunity to study genome evolution in a recently evolved endosymbiont.
NASA Astrophysics Data System (ADS)
Zhan, Aibin; Bao, Zhenmin; Hu, Xiaoli; Lu, Wei; Hu, Jingjie
2009-06-01
Microsatellite markers have become one of the most important molecular tools used in many areas of research. A large number of microsatellite markers are required for whole-genome surveys in the fields of molecular ecology, quantitative genetics and genomics. Therefore, it is extremely important to select versatile, low-cost, efficient and time- and labor-saving methods to develop a large panel of microsatellite markers. In this study, we used Zhikong scallop (Chlamys farreri) as the target species to compare the efficiency of five methods derived from three strategies for microsatellite marker development. The results showed that the strategy of constructing a small-insert genomic DNA library resulted in poor efficiency, while the microsatellite-enriched strategy greatly improved the isolation efficiency. Although the strategy of mining public databases is time- and cost-saving, it is difficult to obtain a large number of microsatellite markers this way, mainly due to the limited sequence data of non-model species deposited in public databases. Based on the results in this study, we recommend two methods, the microsatellite-enriched library construction method and the FIASCO-colony hybridization method, for large-scale microsatellite marker development. Both methods were derived from the microsatellite-enriched strategy. The experimental results obtained from Zhikong scallop also provide a reference for microsatellite marker development in other species with large genomes.
Genomes in Turmoil: Frugality Drives Microbial Community Structure in Extremely Acidic Environments
NASA Astrophysics Data System (ADS)
Holmes, D. S.
2016-12-01
Extremely acidic environments (To gain insight into these issues, we have conducted deep bioinformatic analyses, including metabolic reconstruction of key assimilatory pathways, phylogenomics and network scrutiny of >160 genomes of acidophiles, including representatives from Archaea, Bacteria and Eukarya and at least ten metagenomes of acidic environments [Cardenas JP, et al. pp 179-197 in Acidophiles, eds R. Quatrini and D. B. Johnson, Caister Academic Press, UK (2016)]. Results yielded valuable insights into cellular processes, including carbon and nitrogen management and energy production, linking biogeochemical processes to organismal physiology. They also provided insight into the evolutionary forces that shape the genomic structure of members of acidophile communities. Niche partitioning can explain diversity patterns in rapidly changing acidic environments such as bioleaching heaps. However, in spatially and temporally homogeneous acidic environments genome flux appears to provide deeper insight into the composition and evolution of acidic consortia. Acidophiles have undergone genome streamlining by gene loss, promoting mutual coexistence of species that exploit complementary use of scarce resources, consistent with the Black Queen hypothesis [Morris JJ et al. mBio 3: e00036-12 (2012)]. Acidophiles also have a large pool of accessory genes (the microbial super-genome) that can be accessed by horizontal gene transfer. This further promotes dependency relationships as drivers of community structure and the evolution of keystone species. Acknowledgements: Fondecyt 1130683; Basal CCTE PFB16
Šmarda, Petr; Bureš, Petr; Horová, Lucie
2007-01-01
Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is assumed as a model for the study of the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2.25 m²), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1.115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and best fits the extreme (Gumbel) distribution with rare occurrences of extremely large genomes (positive-skewed), as it is similar for the log-normal distribution of the whole Angiosperms. Even plants from the same plot frequently varied considerably in genome size and the spatial distribution of genome sizes was generally random and unautocorrelated (P > 0.05). The observed spatial pattern and the overall lack of correlations of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968
Dermauw, Wannes; Vanholme, Bartel; Tirry, Luc; Van Leeuwen, Thomas
2010-04-01
In this study we sequenced and analysed the complete mitochondrial (mt) genome of the Chilean predatory mite Phytoseiulus persimilis Athias-Henriot (Chelicerata: Acari: Mesostigmata: Phytoseiidae: Amblyseiinae). The 16 199 bp genome (79.8% AT) contains the standard set of 13 protein-coding and 24 RNA genes. Compared with the ancestral arthropod mtDNA pattern, the gene order is extremely reshuffled (35 genes changed position) and represents a novel arrangement within the arthropods. This is probably related to the presence of several large noncoding regions in the genome. In contrast with the mt genome of the closely related species Metaseiulus occidentalis (Phytoseiidae: Typhlodrominae) - which was reported to be unusually large (24 961 bp), to lack nad6 and nad3 protein-coding genes, and to contain 22 tRNAs without T-arms - the genome of P. persimilis has all the features of a standard metazoan mt genome. Consequently, we performed additional experiments on the M. occidentalis mt genome. Our preliminary restriction digests and Southern hybridization data revealed that this genome is smaller than previously reported. In addition, we cloned nad3 in M. occidentalis and positioned this gene between nad4L and 12S-rRNA on the mt genome. Finally, we report that at least 15 of the 22 tRNAs in the M. occidentalis mt genome can be folded into canonical cloverleaf structures similar to their counterparts in P. persimilis.
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
2009-01-01
Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. 
We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
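The fold-coverage figures in this abstract follow the usual relation: depth equals total sequenced bases divided by the length of the target. The sketch below only illustrates that arithmetic. The function names are hypothetical, and using the 172 kbp scaffold length as the target is an assumption made for the example, not the authors' stated calculation.

```python
def fold_coverage(total_bases_sequenced, target_length_bp):
    """Average sequencing depth: sequenced bases per base of target."""
    return total_bases_sequenced / target_length_bp


def bases_needed(target_depth, target_length_bp):
    """Total bases to sequence to reach a given average depth."""
    return target_depth * target_length_bp


# Illustrative only: bases required for 15.6-fold depth over a 172 kbp target.
required = bases_needed(15.6, 172_000)
print(round(required))
print(round(fold_coverage(required, 172_000), 1))
```

The same relation underlies an in silico depth simulation of the kind mentioned above: subsampling reads lowers the total sequenced bases and therefore the average depth available to the assembler.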
Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg
2009-08-06
A Mitogenomic Phylogeny of Living Primates
Finstermeier, Knut; Zinner, Dietmar; Brameier, Markus; Meyer, Matthias; Kreuz, Eva; Hofreiter, Michael; Roos, Christian
2013-01-01
Primates, the mammalian order including our own species, comprise 480 species in 78 genera. Thus, they represent the third largest of the 18 orders of eutherian mammals. Although recent phylogenetic studies on primates are increasingly built on molecular datasets, most of these studies have focused on taxonomic subgroups within the order. Complete mitochondrial (mt) genomes have proven to be extremely useful in deciphering within-order relationships even up to deep nodes. Using 454 sequencing, we sequenced 32 new complete mt genomes adding 20 previously not represented genera to the phylogenetic reconstruction of the primate tree. With 13 new sequences, the number of complete mt genomes within the parvorder Platyrrhini was widely extended, resulting in a largely resolved branching pattern among New World monkey families. We added 10 new Strepsirrhini mt genomes to the 15 previously available ones, thus almost doubling the number of mt genomes within this clade. Our data allow precise date estimates of all nodes and offer new insights into primate evolution. One major result is a relatively young date for the most recent common ancestor of all living primates which was estimated to 66-69 million years ago, suggesting that the divergence of extant primates started close to the K/T-boundary. Although some relationships remain unclear, the large number of mt genomes used allowed us to reconstruct a robust primate phylogeny which is largely in agreement with previous publications. Finally, we show that mt genomes are a useful tool for resolving primate phylogenetic relationships on various taxonomic levels. PMID:23874967
Xu, Jian; Li, Jiong-Tang; Jiang, Yanliang; Peng, Wenzhu; Yao, Zongli; Chen, Baohua; Jiang, Likun; Feng, Jingyan; Ji, Peifeng; Liu, Guiming; Liu, Zhanjiang; Tai, Ruyu; Dong, Chuanju; Sun, Xiaoqing; Zhao, Zi-Xia; Zhang, Yan; Wang, Jian; Li, Shangqi; Zhao, Yunfeng; Yang, Jiuhui; Sun, Xiaowen; Xu, Peng
2017-01-01
The Amur ide (Leuciscus waleckii) is a cyprinid fish that is widely distributed in Northeast Asia. The Lake Dali Nur population inhabits one of the most extreme aquatic environments on Earth, with an alkalinity up to 50 mmol/L (pH 9.6), thus providing an exceptional model with which to characterize the mechanisms of genomic evolution underlying adaptation to extreme environments. Here, we developed the reference genome assembly for L. waleckii from Lake Dali Nur. Intriguingly, we identified unusual expanded long terminal repeats (LTRs) with higher nucleotide substitution rates than in many other teleosts, suggesting their more recent insertion into the L. waleckii genome. We also identified expansions in genes encoding egg coat proteins and natriuretic peptide receptors, possibly underlying the adaptation to extreme environmental stress. We further sequenced the genomes of 10 additional individuals from freshwater and 18 from Lake Dali Nur populations, and we detected a total of 7.6 million SNPs from both populations. In a genome scan and comparison of these two populations, we identified a set of genomic regions under selective sweeps that harbor genes involved in ion homoeostasis, acid-base regulation, unfolded protein response, reactive oxygen species elimination, and urea excretion. Our findings provide comprehensive insight into the genomic mechanisms of teleost fish that underlie their adaptation to extreme alkaline environments. PMID:28007977
Lee, Hae-Won; Kim, Dae-Won; Lee, Mi-Hwa; Kim, Byung-Yong; Cho, Yong-Joon; Yim, Kyung June; Song, Hye Seon; Rhee, Jin-Kyu; Seo, Myung-Ji; Choi, Hak-Jong; Choi, Jong-Soon; Lee, Dong-Gi; Yoon, Changmann; Nam, Young-Do; Roh, Seong Woon
2015-01-01
An extremely halophilic archaeon, Haladaptatus cibarius D43(T), was isolated from traditional Korean salt-rich fermented seafood. Strain D43(T) shows the highest 16S rRNA gene sequence similarity (98.7 %) with Haladaptatus litoreus RO1-28(T), is Gram-negative staining, motile, and extremely halophilic. Despite potential industrial applications of extremely halophilic archaea, their genome characteristics remain obscure. Here, we describe the whole genome sequence and annotated features of strain D43(T). The 3,926,724 bp genome includes 4,092 protein-coding and 57 RNA genes (including 6 rRNA and 49 tRNA genes) with an average G + C content of 57.76 %.
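G + C content, as reported above, is the percentage of unambiguous bases that are guanine or cytosine. A minimal sketch of the calculation follows. The function name and the toy sequence are hypothetical, and ignoring ambiguous symbols such as N is an assumption made here rather than something stated in the abstract.

```python
def gc_content(sequence):
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration; a real genome would be read from a FASTA file.
print(gc_content("GGCCAATT"))
```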
Berndt, Sonja I.; Gustafsson, Stefan; Mägi, Reedik; Ganna, Andrea; Wheeler, Eleanor; Feitosa, Mary F.; Justice, Anne E.; Monda, Keri L.; Croteau-Chonka, Damien C.; Day, Felix R.; Esko, Tõnu; Fall, Tove; Ferreira, Teresa; Gentilini, Davide; Jackson, Anne U.; Luan, Jian’an; Randall, Joshua C.; Vedantam, Sailaja; Willer, Cristen J.; Winkler, Thomas W.; Wood, Andrew R.; Workalemahu, Tsegaselassie; Hu, Yi-Juan; Lee, Sang Hong; Liang, Liming; Lin, Dan-Yu; Min, Josine L.; Neale, Benjamin M.; Thorleifsson, Gudmar; Yang, Jian; Albrecht, Eva; Amin, Najaf; Bragg-Gresham, Jennifer L.; Cadby, Gemma; den Heijer, Martin; Eklund, Niina; Fischer, Krista; Goel, Anuj; Hottenga, Jouke-Jan; Huffman, Jennifer E.; Jarick, Ivonne; Johansson, Åsa; Johnson, Toby; Kanoni, Stavroula; Kleber, Marcus E.; König, Inke R.; Kristiansson, Kati; Kutalik, Zoltán; Lamina, Claudia; Lecoeur, Cecile; Li, Guo; Mangino, Massimo; McArdle, Wendy L.; Medina-Gomez, Carolina; Müller-Nurasyid, Martina; Ngwa, Julius S.; Nolte, Ilja M.; Paternoster, Lavinia; Pechlivanis, Sonali; Perola, Markus; Peters, Marjolein J.; Preuss, Michael; Rose, Lynda M.; Shi, Jianxin; Shungin, Dmitry; Smith, Albert Vernon; Strawbridge, Rona J.; Surakka, Ida; Teumer, Alexander; Trip, Mieke D.; Tyrer, Jonathan; Van Vliet-Ostaptchouk, Jana V.; Vandenput, Liesbeth; Waite, Lindsay L.; Zhao, Jing Hua; Absher, Devin; Asselbergs, Folkert W.; Atalay, Mustafa; Attwood, Antony P.; Balmforth, Anthony J.; Basart, Hanneke; Beilby, John; Bonnycastle, Lori L.; Brambilla, Paolo; Bruinenberg, Marcel; Campbell, Harry; Chasman, Daniel I.; Chines, Peter S.; Collins, Francis S.; Connell, John M.; Cookson, William; de Faire, Ulf; de Vegt, Femmie; Dei, Mariano; Dimitriou, Maria; Edkins, Sarah; Estrada, Karol; Evans, David M.; Farrall, Martin; Ferrario, Marco M.; Ferrières, Jean; Franke, Lude; Frau, Francesca; Gejman, Pablo V.; Grallert, Harald; Grönberg, Henrik; Gudnason, Vilmundur; Hall, Alistair S.; Hall, Per; Hartikainen, Anna-Liisa; Hayward, Caroline; 
Heard-Costa, Nancy L.; Heath, Andrew C.; Hebebrand, Johannes; Homuth, Georg; Hu, Frank B.; Hunt, Sarah E.; Hyppönen, Elina; Iribarren, Carlos; Jacobs, Kevin B.; Jansson, John-Olov; Jula, Antti; Kähönen, Mika; Kathiresan, Sekar; Kee, Frank; Khaw, Kay-Tee; Kivimaki, Mika; Koenig, Wolfgang; Kraja, Aldi T.; Kumari, Meena; Kuulasmaa, Kari; Kuusisto, Johanna; Laitinen, Jaana H.; Lakka, Timo A.; Langenberg, Claudia; Launer, Lenore J.; Lind, Lars; Lindström, Jaana; Liu, Jianjun; Liuzzi, Antonio; Lokki, Marja-Liisa; Lorentzon, Mattias; Madden, Pamela A.; Magnusson, Patrik K.; Manunta, Paolo; Marek, Diana; März, Winfried; Mateo Leach, Irene; McKnight, Barbara; Medland, Sarah E.; Mihailov, Evelin; Milani, Lili; Montgomery, Grant W.; Mooser, Vincent; Mühleisen, Thomas W.; Munroe, Patricia B.; Musk, Arthur W.; Narisu, Narisu; Navis, Gerjan; Nicholson, George; Nohr, Ellen A.; Ong, Ken K.; Oostra, Ben A.; Palmer, Colin N.A.; Palotie, Aarno; Peden, John F.; Pedersen, Nancy; Peters, Annette; Polasek, Ozren; Pouta, Anneli; Pramstaller, Peter P.; Prokopenko, Inga; Pütter, Carolin; Radhakrishnan, Aparna; Raitakari, Olli; Rendon, Augusto; Rivadeneira, Fernando; Rudan, Igor; Saaristo, Timo E.; Sambrook, Jennifer G.; Sanders, Alan R.; Sanna, Serena; Saramies, Jouko; Schipf, Sabine; Schreiber, Stefan; Schunkert, Heribert; Shin, So-Youn; Signorini, Stefano; Sinisalo, Juha; Skrobek, Boris; Soranzo, Nicole; Stančáková, Alena; Stark, Klaus; Stephens, Jonathan C.; Stirrups, Kathleen; Stolk, Ronald P.; Stumvoll, Michael; Swift, Amy J.; Theodoraki, Eirini V.; Thorand, Barbara; Tregouet, David-Alexandre; Tremoli, Elena; Van der Klauw, Melanie M.; van Meurs, Joyce B.J.; Vermeulen, Sita H.; Viikari, Jorma; Virtamo, Jarmo; Vitart, Veronique; Waeber, Gérard; Wang, Zhaoming; Widén, Elisabeth; Wild, Sarah H.; Willemsen, Gonneke; Winkelmann, Bernhard R.; Witteman, Jacqueline C.M.; Wolffenbuttel, Bruce H.R.; Wong, Andrew; Wright, Alan F.; Zillikens, M. 
Carola; Amouyel, Philippe; Boehm, Bernhard O.; Boerwinkle, Eric; Boomsma, Dorret I.; Caulfield, Mark J.; Chanock, Stephen J.; Cupples, L. Adrienne; Cusi, Daniele; Dedoussis, George V.; Erdmann, Jeanette; Eriksson, Johan G.; Franks, Paul W.; Froguel, Philippe; Gieger, Christian; Gyllensten, Ulf; Hamsten, Anders; Harris, Tamara B.; Hengstenberg, Christian; Hicks, Andrew A.; Hingorani, Aroon; Hinney, Anke; Hofman, Albert; Hovingh, Kees G.; Hveem, Kristian; Illig, Thomas; Jarvelin, Marjo-Riitta; Jöckel, Karl-Heinz; Keinanen-Kiukaanniemi, Sirkka M.; Kiemeney, Lambertus A.; Kuh, Diana; Laakso, Markku; Lehtimäki, Terho; Levinson, Douglas F.; Martin, Nicholas G.; Metspalu, Andres; Morris, Andrew D.; Nieminen, Markku S.; Njølstad, Inger; Ohlsson, Claes; Oldehinkel, Albertine J.; Ouwehand, Willem H.; Palmer, Lyle J.; Penninx, Brenda; Power, Chris; Province, Michael A.; Psaty, Bruce M.; Qi, Lu; Rauramaa, Rainer; Ridker, Paul M.; Ripatti, Samuli; Salomaa, Veikko; Samani, Nilesh J.; Snieder, Harold; Sørensen, Thorkild I.A.; Spector, Timothy D.; Stefansson, Kari; Tönjes, Anke; Tuomilehto, Jaakko; Uitterlinden, André G.; Uusitupa, Matti; van der Harst, Pim; Vollenweider, Peter; Wallaschofski, Henri; Wareham, Nicholas J.; Watkins, Hugh; Wichmann, H.-Erich; Wilson, James F.; Abecasis, Goncalo R.; Assimes, Themistocles L.; Barroso, Inês; Boehnke, Michael; Borecki, Ingrid B.; Deloukas, Panos; Fox, Caroline S.; Frayling, Timothy; Groop, Leif C.; Haritunian, Talin; Heid, Iris M.; Hunter, David; Kaplan, Robert C.; Karpe, Fredrik; Moffatt, Miriam; Mohlke, Karen L.; O’Connell, Jeffrey R.; Pawitan, Yudi; Schadt, Eric E.; Schlessinger, David; Steinthorsdottir, Valgerdur; Strachan, David P.; Thorsteinsdottir, Unnur; van Duijn, Cornelia M.; Visscher, Peter M.; Di Blasio, Anna Maria; Hirschhorn, Joel N.; Lindgren, Cecilia M.; Morris, Andrew P.; Meyre, David; Scherag, André; McCarthy, Mark I.; Speliotes, Elizabeth K.; North, Kari E.; Loos, Ruth J.F.; Ingelsson, Erik
2014-01-01
Approaches exploiting extremes of the trait distribution may reveal novel loci for common traits, but it is unknown whether such loci are generalizable to the general population. In a genome-wide search for loci associated with upper vs. lower 5th percentiles of body mass index, height and waist-hip ratio, as well as clinical classes of obesity including up to 263,407 European individuals, we identified four new loci (IGFBP4, H6PD, RSRC1, PPP2R2A) influencing height detected in the tails and seven new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3, ZZZ3) for clinical classes of obesity. Further, we show that there is large overlap in terms of genetic structure and distribution of variants between traits based on extremes and the general population and little etiologic heterogeneity between obesity subgroups. PMID:23563607
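Selecting the upper and lower 5th percentiles of a trait, as described above, amounts to cutting the sample at two quantiles and keeping only the tails. The sketch below assumes a plain list of trait values and uses Python's standard `statistics.quantiles`; the function name and the toy data are hypothetical, and it does not reproduce the study's actual phenotype adjustment or case-control design.

```python
from statistics import quantiles


def split_tails(values):
    """Return (lower_tail, upper_tail): values at or beyond the 5th and 95th percentiles."""
    # n=20 yields 19 cut points; the first is the 5th percentile
    # and the last is the 95th percentile.
    cuts = quantiles(values, n=20)
    low_cut, high_cut = cuts[0], cuts[-1]
    lower = sorted(v for v in values if v <= low_cut)
    upper = sorted(v for v in values if v >= high_cut)
    return lower, upper


# Toy trait values 1..100, for illustration only.
lower_tail, upper_tail = split_tails(list(range(1, 101)))
print(lower_tail, upper_tail)
```

In an extremes design, the two tails would then be compared against each other as the two groups of a case-control analysis.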
Wilson, Thomas E; Arlt, Martin F; Park, So Hae; Rajendran, Sountharia; Paulsen, Michelle; Ljungman, Mats; Glover, Thomas W
2015-02-01
Copy number variants (CNVs) resulting from genomic deletions and duplications and common fragile sites (CFSs) seen as breaks on metaphase chromosomes are distinct forms of structural chromosome instability precipitated by replication inhibition. Although they share a common induction mechanism, it is not known how CNVs and CFSs are related or why some genomic loci are much more prone to their occurrence. Here we compare large sets of de novo CNVs and CFSs in several experimental cell systems to each other and to overlapping genomic features. We first show that CNV hotspots and CFSs occurred at the same human loci within a given cultured cell line. Bru-seq nascent RNA sequencing further demonstrated that although genomic regions with low CNV frequencies were enriched in transcribed genes, the CNV hotspots that matched CFSs specifically corresponded to the largest active transcription units in both human and mouse cells. Consistently, active transcription units >1 Mb were robust cell-type-specific predictors of induced CNV hotspots and CFS loci. Unlike most transcribed genes, these very large transcription units replicated late and organized deletion and duplication CNVs into their transcribed and flanking regions, respectively, supporting a role for transcription in replication-dependent lesion formation. These results indicate that active large transcription units drive extreme locus- and cell-type-specific genomic instability under replication stress, resulting in both CNVs and CFSs as different manifestations of perturbed replication dynamics. © 2015 Wilson et al.; Published by Cold Spring Harbor Laboratory Press.
Park, So Hae; Rajendran, Sountharia; Paulsen, Michelle; Ljungman, Mats; Glover, Thomas W.
2015-01-01
Copy number variants (CNVs) resulting from genomic deletions and duplications and common fragile sites (CFSs) seen as breaks on metaphase chromosomes are distinct forms of structural chromosome instability precipitated by replication inhibition. Although they share a common induction mechanism, it is not known how CNVs and CFSs are related or why some genomic loci are much more prone to their occurrence. Here we compare large sets of de novo CNVs and CFSs in several experimental cell systems to each other and to overlapping genomic features. We first show that CNV hotspots and CFSs occurred at the same human loci within a given cultured cell line. Bru-seq nascent RNA sequencing further demonstrated that although genomic regions with low CNV frequencies were enriched in transcribed genes, the CNV hotspots that matched CFSs specifically corresponded to the largest active transcription units in both human and mouse cells. Consistently, active transcription units >1 Mb were robust cell-type-specific predictors of induced CNV hotspots and CFS loci. Unlike most transcribed genes, these very large transcription units replicated late and organized deletion and duplication CNVs into their transcribed and flanking regions, respectively, supporting a role for transcription in replication-dependent lesion formation. These results indicate that active large transcription units drive extreme locus- and cell-type-specific genomic instability under replication stress, resulting in both CNVs and CFSs as different manifestations of perturbed replication dynamics. PMID:25373142
Diversity and function of prevalent symbiotic marine bacteria in the genus Endozoicomonas.
Neave, Matthew J; Apprill, Amy; Ferrier-Pagès, Christine; Voolstra, Christian R
2016-10-01
Endozoicomonas bacteria are emerging as extremely diverse and flexible symbionts of numerous marine hosts inhabiting oceans worldwide. Their hosts range from simple invertebrate species, such as sponges and corals, to complex vertebrates, such as fish. Although widely distributed, the functional role of Endozoicomonas within their host microenvironment is not well understood. In this review, we provide a summary of the currently recognized hosts of Endozoicomonas and their global distribution. Next, the potential functional roles of Endozoicomonas, particularly in light of recent microscopic, genomic, and genetic analyses, are discussed. These analyses suggest that Endozoicomonas typically reside in aggregates within host tissues, have a free-living stage due to their large genome sizes, show signs of host and local adaptation, participate in host-associated protein and carbohydrate transport and cycling, and harbour a high degree of genomic plasticity due to the large proportion of transposable elements residing in their genomes. This review will finish with a discussion on the methodological tools currently employed to study Endozoicomonas and host interactions and review future avenues for studying complex host-microbial symbioses.
Molecular hyperdiversity and evolution in very large populations.
Cutter, Asher D; Jovelin, Richard; Dey, Alivia
2013-04-01
The genomic density of sequence polymorphisms critically affects the sensitivity of inferences about ongoing sequence evolution, function and demographic history. Most animal and plant genomes have relatively low densities of polymorphisms, but some species are hyperdiverse with neutral nucleotide heterozygosity exceeding 5%. Eukaryotes with extremely large populations, mimicking bacterial and viral populations, present novel opportunities for studying molecular evolution in sexually reproducing taxa with complex development. In particular, hyperdiverse species can help answer controversial questions about the evolution of genome complexity, the limits of natural selection, modes of adaptation and subtleties of the mutation process. However, such systems have some inherent complications and here we identify topics in need of theoretical developments. Close relatives of the model organisms Caenorhabditis elegans and Drosophila melanogaster provide known examples of hyperdiverse eukaryotes, encouraging functional dissection of resulting molecular evolutionary patterns. We recommend how best to exploit hyperdiverse populations for analysis, for example, in quantifying the impact of noncrossover recombination in genomes and for determining the identity and micro-evolutionary selective pressures on noncoding regulatory elements. © 2013 Blackwell Publishing Ltd.
Evolutionary genomics of the cold-adapted diatom Fragilariopsis cylindrus
Mock, Thomas; Otillar, Robert P.; Strauss, Jan; ...
2017-01-26
The Southern Ocean houses a diverse and productive community of organisms. Unicellular eukaryotic diatoms are the main primary producers in this environment, where photosynthesis is limited by low concentrations of dissolved iron and large seasonal fluctuations in light, temperature and the extent of sea ice. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome consists of genetic loci with alleles that are highly divergent (15.1 megabases of the total genome size of 61.1 megabases). These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.
Chelikani, Venkata; Ranjan, Tushar; Zade, Amrutraj; Shukla, Avi; Kondabagil, Kiran
2014-06-01
Genome packaging is a critical step in the virion assembly process. The putative ATP-driven genome packaging motor of Acanthamoeba polyphaga mimivirus (APMV) and other nucleocytoplasmic large DNA viruses (NCLDVs) is a distant ortholog of prokaryotic chromosome segregation motors, such as FtsK and HerA, rather than other viral packaging motors, such as large terminase. Intriguingly, APMV also encodes other components, i.e., three putative serine recombinases and a putative type II topoisomerase, all of which are essential for chromosome segregation in prokaryotes. Based on our analyses of these components and taking the limited available literature into account, here we propose for the first time a model for genome segregation and packaging in APMV that can possibly be extended to NCLDV subfamilies, except perhaps Poxviridae and Ascoviridae. This model might represent a unique variation of the prokaryotic system acquired and contrived by the large DNA viruses of eukaryotes. It is also consistent with previous observations that unicellular eukaryotes, such as amoebae, are melting pots for the advent of chimeric organisms with novel mechanisms. Extremely large viruses with DNA genomes infect a wide range of eukaryotes, from human beings to amoebae and from crocodiles to algae. These large DNA viruses, unlike their much smaller cousins, have the capability of making most of the protein components required for their multiplication. Once they infect the cell, these viruses set up viral replication centers, known as viral factories, to carry out their multiplication with very little help from the host. 
Our sequence analyses show that there is remarkable similarity between prokaryotes (bacteria and archaea) and large DNA viruses, such as mimivirus, vaccinia virus, and pandoravirus, in the way that they process their newly synthesized genetic material to make sure that only one copy of the complete genome is generated and is meticulously placed inside the newly synthesized viral particle. These findings have important evolutionary implications about the origin and evolution of large viruses.
Chelikani, Venkata; Ranjan, Tushar; Zade, Amrutraj; Shukla, Avi
2014-01-01
ABSTRACT Genome packaging is a critical step in the virion assembly process. The putative ATP-driven genome packaging motor of Acanthamoeba polyphaga mimivirus (APMV) and other nucleocytoplasmic large DNA viruses (NCLDVs) is a distant ortholog of prokaryotic chromosome segregation motors, such as FtsK and HerA, rather than other viral packaging motors, such as large terminase. Intriguingly, APMV also encodes other components, i.e., three putative serine recombinases and a putative type II topoisomerase, all of which are essential for chromosome segregation in prokaryotes. Based on our analyses of these components and taking the limited available literature into account, here we propose for the first time a model for genome segregation and packaging in APMV that can possibly be extended to NCLDV subfamilies, except perhaps Poxviridae and Ascoviridae. This model might represent a unique variation of the prokaryotic system acquired and contrived by the large DNA viruses of eukaryotes. It is also consistent with previous observations that unicellular eukaryotes, such as amoebae, are melting pots for the advent of chimeric organisms with novel mechanisms. IMPORTANCE Extremely large viruses with DNA genomes infect a wide range of eukaryotes, from human beings to amoebae and from crocodiles to algae. These large DNA viruses, unlike their much smaller cousins, have the capability of making most of the protein components required for their multiplication. Once they infect the cell, these viruses set up viral replication centers, known as viral factories, to carry out their multiplication with very little help from the host. 
Our sequence analyses show that there is remarkable similarity between prokaryotes (bacteria and archaea) and large DNA viruses, such as mimivirus, vaccinia virus, and pandoravirus, in the way that they process their newly synthesized genetic material to make sure that only one copy of the complete genome is generated and is meticulously placed inside the newly synthesized viral particle. These findings have important evolutionary implications about the origin and evolution of large viruses. PMID:24623441
Miklós, István
2009-01-01
Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746
Pathology and genomics of pediatric melanoma: A critical reexamination and new insights.
Bahrami, Armita; Barnhill, Raymond L
2018-02-01
The clinicopathologic features of pediatric melanoma are distinct from those of the adult counterpart. For example, most childhood melanomas exhibit a uniquely favorable biologic behavior, save for those arising in large/giant congenital nevi. Recent studies suggest that the characteristically favorable biologic behavior of childhood melanoma may be related to extreme telomere shortening and dysfunction in the cancer cells. Herein, we review the genomic profiles that have been defined for the different subtypes of pediatric melanoma and particularly emphasize the potential prognostic value of telomerase reverse transcriptase alterations for these tumors. © 2017 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poliakov, Alexander; Couronne, Olivier
2002-11-04
Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates, which are then globally aligned using the AVID global alignment program. In the last step conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.
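The auto-reconnect behaviour mentioned at the end of this abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl of GenomeVISTA, and it is not the authors' code: the names with_reconnect and TransientDBError, and the connect/operation callables, are hypothetical stand-ins for whatever the real library uses.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a 'connection lost' database error."""


def with_reconnect(connect, operation, retries=3, delay=0.0):
    """Run operation(conn); on a transient failure, reconnect and retry.

    connect   -- callable returning a fresh connection object (assumed)
    operation -- callable taking a connection and returning a result (assumed)
    retries   -- number of reconnect attempts allowed after the first try
    delay     -- seconds to wait before each reconnect
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except TransientDBError:
            if attempt == retries:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay)
            conn = connect()  # drop the old handle and open a new one
```

Keeping the retry logic in one wrapper means callers never handle dropped connections themselves, which matches the abstract's point that a separate library layer isolates the programs from database details.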
Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane
2006-01-01
Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allows comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057
Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide
2014-01-01
The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using a next-generation sequencer SOLiD™ system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tag from the microminipig to conventional pig genomic sequences was greater than 96% and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as Single Nucleotide Polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
2010-01-01
Background The family Tetranychidae (Chelicerata: Acari) includes ~1200 species, many of which are of agronomic importance. To date, mitochondrial genomes of only two Tetranychidae species have been sequenced, and it has been found that these two mitochondrial genomes are characterized by many unusual features in genome organization and structure such as gene order and nucleotide frequency. The scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). Information on Tetranychidae mitochondrial genomes is quite important for phylogenetic evaluation and population genetics, as well as the molecular evolution of functional genes such as acaricide-resistance genes. In this study, we sequenced the complete mitochondrial genome of Panonychus citri (Family Tetranychidae), a worldwide citrus pest, and provide a comparison to other Acari. Results The mitochondrial genome of P. citri is a typical circular molecule of 13,077 bp, and contains the complete set of 37 genes that are usually found in metazoans. This is the smallest mitochondrial genome within all sequenced Acari and other Chelicerata, primarily due to the significant size reduction of protein coding genes (PCGs), a large rRNA gene, and the A + T-rich region. The mitochondrial gene order for P. citri is the same as those for P. ulmi and Tetranychus urticae, but distinctly different from other Acari by a series of gene translocations and/or inversions. The majority of the P. citri mitochondrial genome has a high A + T content (85.28%), which is also reflected by AT-rich codons being used more frequently, but exhibits a positive GC-skew (0.03). The Acari mitochondrial nad1 exhibits a faster amino acid substitution rate than other genes, and the variation of nucleotide substitution patterns of PCGs is significantly correlated with the G + C content. Most tRNA genes of P. citri are extremely truncated and atypical (44-65, 54.1 ± 4.1 bp), lacking either the T- or D-arm, as found in P. ulmi, T. urticae, and other Acariform mites. Conclusions The P. citri mitochondrial gene order is markedly different from those of other chelicerates, but is conserved within the family Tetranychidae indicating that high rearrangements have occurred after Tetranychidae diverged from other Acari. Comparative analyses suggest that the genome size, gene order, gene content, codon usage, and base composition are strongly variable among Acari mitochondrial genomes. While extremely small and unusual tRNA genes seem to be common for Acariform mites, further experimental evidence is needed. PMID:20969792
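The two composition statistics quoted in this abstract, A + T content and GC-skew, are commonly defined as (A + T) / (A + C + G + T) and (G - C) / (G + C). The following Python sketch computes both under those assumed definitions; the abstract does not state which strand or exact formula the authors used, and the function name base_composition and the toy sequence are illustrative only.

```python
from collections import Counter


def base_composition(seq):
    """Return (A+T fraction, GC-skew) for a DNA sequence.

    Assumes the common definitions AT = (A+T)/(A+C+G+T) and
    GC-skew = (G-C)/(G+C); ambiguous bases such as N are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    at_content = (a + t) / total if total else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_content, gc_skew


# Toy sequence (not real data): 4 A, 4 T, 3 G, 1 C
at, skew = base_composition("ATATATATGGGC")
```

A positive skew, as reported for P. citri, means G outnumbers C on the strand being measured; the sign flips on the complementary strand.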
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Language continuity despite population replacement in Remote Oceania.
Posth, Cosimo; Nägele, Kathrin; Colleran, Heidi; Valentin, Frédérique; Bedford, Stuart; Kami, Kaitip W; Shing, Richard; Buckley, Hallie; Kinaston, Rebecca; Walworth, Mary; Clark, Geoffrey R; Reepmeyer, Christian; Flexner, James; Maric, Tamara; Moser, Johannes; Gresky, Julia; Kiko, Lawrence; Robson, Kathryn J; Auckland, Kathryn; Oppenheimer, Stephen J; Hill, Adrian V S; Mentzer, Alexander J; Zech, Jana; Petchey, Fiona; Roberts, Patrick; Jeong, Choongwon; Gray, Russell D; Krause, Johannes; Powell, Adam
2018-04-01
Recent genomic analyses show that the earliest peoples reaching Remote Oceania, associated with the Austronesian-speaking Lapita culture, were almost completely East Asian, without detectable Papuan ancestry. However, Papuan-related genetic ancestry is found across present-day Pacific populations, indicating that peoples from Near Oceania have played a significant, but largely unknown, ancestral role. Here, new genome-wide data from 19 ancient South Pacific individuals provide direct evidence of a so-far undescribed Papuan expansion into Remote Oceania starting ~2,500 yr BP, far earlier than previously estimated and supporting a model from historical linguistics. New genome-wide data from 27 contemporary ni-Vanuatu demonstrate a subsequent and almost complete replacement of Lapita-Austronesian by Near Oceanian ancestry. Despite this massive demographic change, incoming Papuan languages did not replace Austronesian languages. Population replacement with language continuity is extremely rare, if not unprecedented, in human history. Our analyses show that rather than one large-scale event, the process was incremental and complex, with repeated migrations and sex-biased admixture with peoples from the Bismarck Archipelago.
Genome-wide association studies of obesity and metabolic syndrome.
Fall, Tove; Ingelsson, Erik
2014-01-25
Until just a few years ago, the genetic determinants of obesity and metabolic syndrome were largely unknown, with the exception of a few forms of monogenic extreme obesity. Since genome-wide association studies (GWAS) became available, large advances have been made. The first single nucleotide polymorphism robustly associated with increased body mass index (BMI) was mapped in 2007 to a gene whose function was unknown at the time. This gene, now known as fat mass and obesity associated (FTO), has been replicated repeatedly across several ethnicities and affects obesity by regulating appetite. Since the first report from a GWAS of obesity, an increasing number of markers have been shown to be associated with BMI, other measures of obesity or fat distribution, and metabolic syndrome. This systematic review of obesity GWAS summarizes genome-wide significant findings for obesity and metabolic syndrome and briefly suggests what can be expected in the next few years. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Enabling large-scale next-generation sequence assembly with Blacklight
Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.
2014-01-01
Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu
2013-05-10
Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
A world of opportunities with nanopore sequencing.
Leggett, Richard M; Clark, Matthew D
2017-11-28
Oxford Nanopore Technologies' MinION sequencer was launched in pre-release form in 2014 and represents an exciting new sequencing paradigm. The device offers multi-kilobase reads and a streamed mode of operation that allows processing of reads as they are generated. Crucially, it is an extremely compact device powered from the USB port of a laptop computer, so it can be taken out of the lab for in-field sequencing experiments that were previously impossible. Many of the initial publications concerning the platform focused on providing tools to access and analyse the new sequence formats and then on demonstrating the assembly of microbial genomes. More recently, as throughput and accuracy have increased, it has been possible to begin work involving more complex genomes and metagenomes. With the release of the high-throughput GridION X5 and PromethION platforms, the sequencing of large genomes will become more cost-efficient, and extremely long (>100 kb) reads can be leveraged to resolve complex genomic structures. This review provides a brief overview of nanopore sequencing technology, describes the growing range of nanopore bioinformatics tools, and highlights some of the most influential publications that have emerged over the last 2 years. Finally, we look to the future and the potential the platform has to disrupt work in human, microbiome, and plant genomics. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Remnants of an Ancient Deltaretrovirus in the Genomes of Horseshoe Bats (Rhinolophidae).
Hron, Tomáš; Farkašová, Helena; Gifford, Robert J; Benda, Petr; Hulva, Pavel; Görföl, Tamás; Pačes, Jan; Elleder, Daniel
2018-04-10
Endogenous retrovirus (ERV) sequences provide a rich source of information about the long-term interactions between retroviruses and their hosts. However, most ERVs are derived from a subset of retrovirus groups, while ERVs derived from certain other groups remain extremely rare. In particular, only a single ERV sequence has been identified that shows evidence of being related to an ancient Deltaretrovirus, despite the large number of vertebrate genome sequences now available. In this report, we identify a second example of an ERV sequence putatively derived from a past deltaretroviral infection, in the genomes of several species of horseshoe bats (Rhinolophidae). This sequence represents a fragment of viral genome derived from a single integration. The time of the integration was estimated to be 11-19 million years ago. This finding, together with the previously identified endogenous Deltaretrovirus in long-fingered bats (Miniopteridae), suggests a close association of bats with ancient deltaretroviruses.
Bioenergetic Constraints on the Evolution of Complex Life
Lane, Nick
2014-01-01
All morphologically complex life on Earth, beyond the level of cyanobacteria, is eukaryotic. All eukaryotes share a common ancestor that was already a complex cell. Despite their biochemical virtuosity, prokaryotes show little tendency to evolve eukaryotic traits or large genomes. Here I argue that prokaryotes are constrained by their membrane bioenergetics, for fundamental reasons relating to the origin of life. Eukaryotes arose in a rare endosymbiosis between two prokaryotes, which broke the energetic constraints on prokaryotes and gave rise to mitochondria. Loss of almost all mitochondrial genes produced an extreme genomic asymmetry, in which tiny mitochondrial genomes support, energetically, a massive nuclear genome, giving eukaryotes three to five orders of magnitude more energy per gene than prokaryotes. The requirement for endosymbiosis radically altered selection on eukaryotes, potentially explaining the evolution of unique traits, including the nucleus, sex, two sexes, speciation, and aging. PMID:24789818
Yang, Jinliang; Jiang, Haiying; Yeh, Cheng-Ting; Yu, Jianming; Jeddeloh, Jeffrey A; Nettleton, Dan; Schnable, Patrick S
2015-11-01
Although approaches for performing genome-wide association studies (GWAS) are well developed, conventional GWAS requires high-density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP-GWAS (extreme-phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well-characterized kernel row number trait, which was selected to enable comparisons between the results of XP-GWAS and conventional GWAS. An exome-sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait-associated variants were significantly enriched in regions identified by conventional GWAS. XP-GWAS was able to resolve several linked QTL and detect trait-associated variants within a single gene under a QTL peak. XP-GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest. © 2015 The Authors The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
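The core idea of the pooled design described above, comparing allele frequencies between extreme-phenotype pools, can be sketched as follows. This is an illustration under stated assumptions: the read counts are invented, the function names are hypothetical, and the plain Pearson chi-square on a 2x2 table is a stand-in that is not necessarily the statistical test used by the authors.

```python
def allele_freq(ref_reads: int, alt_reads: int) -> float:
    """Alternate-allele frequency estimated from pooled read counts."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else float("nan")


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Invented read counts for one variant (ref, alt) in the low and high pools.
low_pool = (80, 20)
high_pool = (40, 60)

print(allele_freq(*low_pool), allele_freq(*high_pool))  # 0.2 0.6
stat = chi_square_2x2(*low_pool, *high_pool)
print(round(stat, 2))  # 33.33
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05 for a single test. A scan over hundreds of thousands of variants would need a much stricter, multiple-testing-aware threshold, which this sketch does not attempt to set.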
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences
2010-01-01
Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal. PMID:20609256
Draft Genome Sequence of Brevibacterium linens AE038-8, an Extremely Arsenic-Resistant Bacterium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maizel, Daniela; Utturkar, Sagar M.; Brown, Steven D.
To understand the arsenic biogeocycles in the groundwaters at Tucumán, Argentina, we isolated Brevibacterium linens sp. strain AE38-8, obtained from arsenic-contaminated well water. This strain is extremely resistant to arsenicals and has arsenic resistance (ars) genes in its genome. Here, we report the draft genome sequence of B. linens AE38-8.
Draft Genome Sequence of Brevibacterium linens AE038-8, an Extremely Arsenic-Resistant Bacterium
Maizel, Daniela; Utturkar, Sagar M.; Brown, Steven D.; ...
2015-04-16
To understand the arsenic biogeocycles in the groundwaters at Tucumán, Argentina, we isolated Brevibacterium linens sp. strain AE38-8, obtained from arsenic-contaminated well water. This strain is extremely resistant to arsenicals and has arsenic resistance (ars) genes in its genome. Here, we report the draft genome sequence of B. linens AE38-8.
Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis
Patch, Ann-Marie; Bailey, Peter; Newell, Felicity; Holmes, Oliver; Fink, J. Lynn; Quinn, Michael C.J.; Tang, Yue Hang; Lampe, Guy; Quek, Kelly; Loffler, Kelly A.; Manning, Suzanne; Idrisoglu, Senel; Miller, David; Xu, Qinying; Waddell, Nick; Wilson, Peter J.; Bruxner, Timothy J.C.; Christ, Angelika N.; Harliwong, Ivon; Nourse, Craig; Nourbakhsh, Ehsan; Anderson, Matthew; Kazakoff, Stephen; Leonard, Conrad; Wood, Scott; Simpson, Peter T.; Reid, Lynne E.; Krause, Lutz; Hussey, Damian J.; Watson, David I.; Lord, Reginald V.; Nancarrow, Derek; Phillips, Wayne A.; Gotley, David; Smithers, B. Mark; Whiteman, David C.; Hayward, Nicholas K.; Campbell, Peter J.; Pearson, John V.; Grimmond, Sean M.; Barbour, Andrew P.
2015-01-01
Oesophageal adenocarcinoma (EAC) incidence is rapidly increasing in Western countries. A better understanding of EAC underpins efforts to improve early detection and treatment outcomes. While large EAC exome sequencing efforts to date have found recurrent loss-of-function mutations, oncogenic driving events have been underrepresented. Here we use a combination of whole-genome sequencing (WGS) and single-nucleotide polymorphism-array profiling to show that genomic catastrophes are frequent in EAC, with almost a third (32%, n = 40/123) undergoing chromothriptic events. WGS of 22 EAC cases show that catastrophes may lead to oncogene amplification through chromothripsis-derived double-minute chromosome formation (MYC and MDM2) or breakage-fusion-bridge (KRAS, MDM2 and RFC3). Telomere shortening is more prominent in EACs bearing localized complex rearrangements. Mutational signature analysis also confirms that extreme genomic instability in EAC can be driven by somatic BRCA2 mutations. These findings suggest that genomic catastrophes have a significant role in the malignant transformation of EAC. PMID:25351503
McNeal, Joel R; Arumugunathan, Kathiravetpilla; Kuehl, Jennifer V; Boore, Jeffrey L; Depamphilis, Claude W
2007-12-13
The genus Cuscuta L. (Convolvulaceae), commonly known as dodders, are epiphytic vines that invade the stems of their host with haustorial feeding structures at the points of contact. Although they lack expanded leaves, some species are noticeably chlorophyllous, especially as seedlings and in maturing fruits. Some species are reported as crop pests of worldwide distribution, whereas others are extremely rare and have local distributions and apparent niche specificity. A strong phylogenetic framework for this large genus is essential to understand the interesting ecological, morphological and molecular phenomena that occur within these parasites in an evolutionary context. Here we present a well-supported phylogeny of Cuscuta using sequences of the nuclear ribosomal internal transcribed spacer and plastid rps2, rbcL and matK from representatives across most of the taxonomic diversity of the genus. We use the phylogeny to interpret morphological and plastid genome evolution within the genus. At least three currently recognized taxonomic sections are not monophyletic and subgenus Cuscuta is unequivocally paraphyletic. Plastid genes are extremely variable with regard to evolutionary constraint, with rbcL exhibiting even higher levels of purifying selection in Cuscuta than in photosynthetic relatives. Nuclear genome size is highly variable within Cuscuta, particularly within subgenus Grammica, and in some cases may indicate the existence of cryptic species in this large clade of morphologically similar species. Some morphological characters traditionally used to define major taxonomic splits within Cuscuta are homoplastic and are of limited use in defining true evolutionary groups. Chloroplast genome evolution appears to have proceeded in a punctuated fashion, with episodes of loss involving suites of genes or tRNAs followed by stabilization of gene content in major clades. Nearly all species of Cuscuta retain some photosynthetic ability, most likely for nutrient apportionment to their seeds, while complete loss of photosynthesis and possible loss of the entire chloroplast genome is limited to a single small clade of outcrossing species found primarily in western South America.
McNeal, Joel R; Arumugunathan, Kathiravetpilla; Kuehl, Jennifer V; Boore, Jeffrey L; dePamphilis, Claude W
2007-01-01
Background: The genus Cuscuta L. (Convolvulaceae), commonly known as dodders, are epiphytic vines that invade the stems of their host with haustorial feeding structures at the points of contact. Although they lack expanded leaves, some species are noticeably chlorophyllous, especially as seedlings and in maturing fruits. Some species are reported as crop pests of worldwide distribution, whereas others are extremely rare and have local distributions and apparent niche specificity. A strong phylogenetic framework for this large genus is essential to understand the interesting ecological, morphological and molecular phenomena that occur within these parasites in an evolutionary context. Results: Here we present a well-supported phylogeny of Cuscuta using sequences of the nuclear ribosomal internal transcribed spacer and plastid rps2, rbcL and matK from representatives across most of the taxonomic diversity of the genus. We use the phylogeny to interpret morphological and plastid genome evolution within the genus. At least three currently recognized taxonomic sections are not monophyletic and subgenus Cuscuta is unequivocally paraphyletic. Plastid genes are extremely variable with regard to evolutionary constraint, with rbcL exhibiting even higher levels of purifying selection in Cuscuta than in photosynthetic relatives. Nuclear genome size is highly variable within Cuscuta, particularly within subgenus Grammica, and in some cases may indicate the existence of cryptic species in this large clade of morphologically similar species. Conclusion: Some morphological characters traditionally used to define major taxonomic splits within Cuscuta are homoplastic and are of limited use in defining true evolutionary groups. Chloroplast genome evolution appears to have proceeded in a punctuated fashion, with episodes of loss involving suites of genes or tRNAs followed by stabilization of gene content in major clades. Nearly all species of Cuscuta retain some photosynthetic ability, most likely for nutrient apportionment to their seeds, while complete loss of photosynthesis and possible loss of the entire chloroplast genome is limited to a single small clade of outcrossing species found primarily in western South America. PMID:18078516
The Small Nuclear Genomes of Selaginella Are Associated with a Low Rate of Genome Size Evolution.
Baniaga, Anthony E; Arrigo, Nils; Barker, Michael S
2016-06-03
The haploid nuclear genome size (1C DNA) of vascular land plants varies over several orders of magnitude. Much of this observed diversity in genome size is due to the proliferation and deletion of transposable elements. To date, all vascular land plant lineages with extremely small nuclear genomes represent recently derived states, having ancestors with much larger genome sizes. The Selaginellaceae represent an ancient lineage with extremely small genomes. It is unclear how small nuclear genomes evolved in Selaginella. We compared the rates of nuclear genome size evolution in Selaginella and major vascular plant clades in a comparative phylogenetic framework. For the analyses, we collected 29 new flow cytometry estimates of haploid genome size in Selaginella to augment publicly available data. Selaginella possess some of the smallest known haploid nuclear genome sizes, as well as the lowest rate of genome size evolution observed across all vascular land plants included in our analyses. Additionally, our analyses provide strong support for a history of haploid nuclear genome size stasis in Selaginella. Our results indicate that Selaginella, similar to other early diverging lineages of vascular land plants, has relatively low rates of genome size evolution. Further, our analyses highlight that a rapid transition to a small genome size is only one route to an extremely small genome. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Valdes, Jorge; Ossandon, Francisco; Quatrini, Raquel; Dopson, Mark; Holmes, David S
2011-12-01
Acidithiobacillus thiooxidans is a mesophilic, extremely acidophilic, chemolithoautotrophic gammaproteobacterium that derives energy from the oxidation of sulfur and inorganic sulfur compounds. Here we present the draft genome sequence of A. thiooxidans ATCC 19377, which has allowed the identification of genes for survival and colonization of extremely acidic environments.
Degenerative minimalism in the genome of a psyllid endosymbiont.
Clark, M A; Baumann, L; Thao, M L; Moran, N A; Baumann, P
2001-03-01
Psyllids, like aphids, feed on plant phloem sap and are obligately associated with prokaryotic endosymbionts acquired through vertical transmission from an ancestral infection. We have sequenced 37 kb of DNA of the genome of Carsonella ruddii, the endosymbiont of psyllids, and found that it has a number of unusual properties revealing a more extreme case of degeneration than was previously reported from studies of eubacterial genomes, including that of the aphid endosymbiont Buchnera aphidicola. Among the unusual properties are an exceptionally low guanine-plus-cytosine content (19.9%), almost complete absence of intergenic spaces, operon fusion, and lack of the usual promoter sequences upstream of 16S rDNA. These features suggest the synthesis of long mRNAs and translational coupling. The most extreme instances of base compositional bias occur in the genes encoding proteins that have less highly conserved amino acid sequences; the guanine-plus-cytosine content of some protein-coding sequences is as low as 10%. The shift in base composition has a large effect on proteins: in polypeptides of C. ruddii, half of the residues consist of five amino acids with codons low in guanine plus cytosine. Furthermore, the proteins of C. ruddii are reduced in size, with an average of about 9% fewer amino acids than in homologous proteins of related bacteria. These observations suggest that the C. ruddii genome is not subject to constraints that limit the evolution of other known eubacteria.
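The guanine-plus-cytosine (GC) content figures quoted above (19.9% genome-wide, as low as 10% in some coding sequences) are simple base-composition fractions. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions and are not C. ruddii data.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); 0.0 if none."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


# Hypothetical AT-rich toy sequence: 2 of 10 bases are G or C.
toy = "ATATATGCAT"
print(f"{gc_content(toy):.1%}")  # prints 20.0%
```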
Berndt, Sonja I; Gustafsson, Stefan; Mägi, Reedik; Ganna, Andrea; Wheeler, Eleanor; Feitosa, Mary F; Justice, Anne E; Monda, Keri L; Croteau-Chonka, Damien C; Day, Felix R; Esko, Tõnu; Fall, Tove; Ferreira, Teresa; Gentilini, Davide; Jackson, Anne U; Luan, Jian'an; Randall, Joshua C; Vedantam, Sailaja; Willer, Cristen J; Winkler, Thomas W; Wood, Andrew R; Workalemahu, Tsegaselassie; Hu, Yi-Juan; Lee, Sang Hong; Liang, Liming; Lin, Dan-Yu; Min, Josine L; Neale, Benjamin M; Thorleifsson, Gudmar; Yang, Jian; Albrecht, Eva; Amin, Najaf; Bragg-Gresham, Jennifer L; Cadby, Gemma; den Heijer, Martin; Eklund, Niina; Fischer, Krista; Goel, Anuj; Hottenga, Jouke-Jan; Huffman, Jennifer E; Jarick, Ivonne; Johansson, Åsa; Johnson, Toby; Kanoni, Stavroula; Kleber, Marcus E; König, Inke R; Kristiansson, Kati; Kutalik, Zoltán; Lamina, Claudia; Lecoeur, Cecile; Li, Guo; Mangino, Massimo; McArdle, Wendy L; Medina-Gomez, Carolina; Müller-Nurasyid, Martina; Ngwa, Julius S; Nolte, Ilja M; Paternoster, Lavinia; Pechlivanis, Sonali; Perola, Markus; Peters, Marjolein J; Preuss, Michael; Rose, Lynda M; Shi, Jianxin; Shungin, Dmitry; Smith, Albert Vernon; Strawbridge, Rona J; Surakka, Ida; Teumer, Alexander; Trip, Mieke D; Tyrer, Jonathan; Van Vliet-Ostaptchouk, Jana V; Vandenput, Liesbeth; Waite, Lindsay L; Zhao, Jing Hua; Absher, Devin; Asselbergs, Folkert W; Atalay, Mustafa; Attwood, Antony P; Balmforth, Anthony J; Basart, Hanneke; Beilby, John; Bonnycastle, Lori L; Brambilla, Paolo; Bruinenberg, Marcel; Campbell, Harry; Chasman, Daniel I; Chines, Peter S; Collins, Francis S; Connell, John M; Cookson, William O; de Faire, Ulf; de Vegt, Femmie; Dei, Mariano; Dimitriou, Maria; Edkins, Sarah; Estrada, Karol; Evans, David M; Farrall, Martin; Ferrario, Marco M; Ferrières, Jean; Franke, Lude; Frau, Francesca; Gejman, Pablo V; Grallert, Harald; Grönberg, Henrik; Gudnason, Vilmundur; Hall, Alistair S; Hall, Per; Hartikainen, Anna-Liisa; Hayward, Caroline; Heard-Costa, Nancy L; Heath, Andrew C; 
Hebebrand, Johannes; Homuth, Georg; Hu, Frank B; Hunt, Sarah E; Hyppönen, Elina; Iribarren, Carlos; Jacobs, Kevin B; Jansson, John-Olov; Jula, Antti; Kähönen, Mika; Kathiresan, Sekar; Kee, Frank; Khaw, Kay-Tee; Kivimäki, Mika; Koenig, Wolfgang; Kraja, Aldi T; Kumari, Meena; Kuulasmaa, Kari; Kuusisto, Johanna; Laitinen, Jaana H; Lakka, Timo A; Langenberg, Claudia; Launer, Lenore J; Lind, Lars; Lindström, Jaana; Liu, Jianjun; Liuzzi, Antonio; Lokki, Marja-Liisa; Lorentzon, Mattias; Madden, Pamela A; Magnusson, Patrik K; Manunta, Paolo; Marek, Diana; März, Winfried; Mateo Leach, Irene; McKnight, Barbara; Medland, Sarah E; Mihailov, Evelin; Milani, Lili; Montgomery, Grant W; Mooser, Vincent; Mühleisen, Thomas W; Munroe, Patricia B; Musk, Arthur W; Narisu, Narisu; Navis, Gerjan; Nicholson, George; Nohr, Ellen A; Ong, Ken K; Oostra, Ben A; Palmer, Colin N A; Palotie, Aarno; Peden, John F; Pedersen, Nancy; Peters, Annette; Polasek, Ozren; Pouta, Anneli; Pramstaller, Peter P; Prokopenko, Inga; Pütter, Carolin; Radhakrishnan, Aparna; Raitakari, Olli; Rendon, Augusto; Rivadeneira, Fernando; Rudan, Igor; Saaristo, Timo E; Sambrook, Jennifer G; Sanders, Alan R; Sanna, Serena; Saramies, Jouko; Schipf, Sabine; Schreiber, Stefan; Schunkert, Heribert; Shin, So-Youn; Signorini, Stefano; Sinisalo, Juha; Skrobek, Boris; Soranzo, Nicole; Stančáková, Alena; Stark, Klaus; Stephens, Jonathan C; Stirrups, Kathleen; Stolk, Ronald P; Stumvoll, Michael; Swift, Amy J; Theodoraki, Eirini V; Thorand, Barbara; Tregouet, David-Alexandre; Tremoli, Elena; Van der Klauw, Melanie M; van Meurs, Joyce B J; Vermeulen, Sita H; Viikari, Jorma; Virtamo, Jarmo; Vitart, Veronique; Waeber, Gérard; Wang, Zhaoming; Widén, Elisabeth; Wild, Sarah H; Willemsen, Gonneke; Winkelmann, Bernhard R; Witteman, Jacqueline C M; Wolffenbuttel, Bruce H R; Wong, Andrew; Wright, Alan F; Zillikens, M Carola; Amouyel, Philippe; Boehm, Bernhard O; Boerwinkle, Eric; Boomsma, Dorret I; Caulfield, Mark J; Chanock, Stephen J; 
Cupples, L Adrienne; Cusi, Daniele; Dedoussis, George V; Erdmann, Jeanette; Eriksson, Johan G; Franks, Paul W; Froguel, Philippe; Gieger, Christian; Gyllensten, Ulf; Hamsten, Anders; Harris, Tamara B; Hengstenberg, Christian; Hicks, Andrew A; Hingorani, Aroon; Hinney, Anke; Hofman, Albert; Hovingh, Kees G; Hveem, Kristian; Illig, Thomas; Jarvelin, Marjo-Riitta; Jöckel, Karl-Heinz; Keinanen-Kiukaanniemi, Sirkka M; Kiemeney, Lambertus A; Kuh, Diana; Laakso, Markku; Lehtimäki, Terho; Levinson, Douglas F; Martin, Nicholas G; Metspalu, Andres; Morris, Andrew D; Nieminen, Markku S; Njølstad, Inger; Ohlsson, Claes; Oldehinkel, Albertine J; Ouwehand, Willem H; Palmer, Lyle J; Penninx, Brenda; Power, Chris; Province, Michael A; Psaty, Bruce M; Qi, Lu; Rauramaa, Rainer; Ridker, Paul M; Ripatti, Samuli; Salomaa, Veikko; Samani, Nilesh J; Snieder, Harold; Sørensen, Thorkild I A; Spector, Timothy D; Stefansson, Kari; Tönjes, Anke; Tuomilehto, Jaakko; Uitterlinden, André G; Uusitupa, Matti; van der Harst, Pim; Vollenweider, Peter; Wallaschofski, Henri; Wareham, Nicholas J; Watkins, Hugh; Wichmann, H-Erich; Wilson, James F; Abecasis, Goncalo R; Assimes, Themistocles L; Barroso, Inês; Boehnke, Michael; Borecki, Ingrid B; Deloukas, Panos; Fox, Caroline S; Frayling, Timothy; Groop, Leif C; Haritunian, Talin; Heid, Iris M; Hunter, David; Kaplan, Robert C; Karpe, Fredrik; Moffatt, Miriam F; Mohlke, Karen L; O'Connell, Jeffrey R; Pawitan, Yudi; Schadt, Eric E; Schlessinger, David; Steinthorsdottir, Valgerdur; Strachan, David P; Thorsteinsdottir, Unnur; van Duijn, Cornelia M; Visscher, Peter M; Di Blasio, Anna Maria; Hirschhorn, Joel N; Lindgren, Cecilia M; Morris, Andrew P; Meyre, David; Scherag, André; McCarthy, Mark I; Speliotes, Elizabeth K; North, Kari E; Loos, Ruth J F; Ingelsson, Erik
2013-05-01
Approaches exploiting trait distribution extremes may be used to identify loci associated with common traits, but it is unknown whether these loci are generalizable to the broader population. In a genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio, as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry, we identified 4 new loci (IGFBP4, H6PD, RSRC1 and PPP2R2A) influencing height detected in the distribution tails and 7 new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3 and ZZZ3) for clinical classes of obesity. Further, we find a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.
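The tail-based design described above contrasts individuals in the upper and lower 5th percentiles of a trait. As a rough sketch of how such tails could be selected, assuming a simple rank-based cut on a single trait (the function name and the toy values are hypothetical, and the study's actual definitions, covariate adjustments and cohort handling are not reproduced here):

```python
def extreme_tails(values, fraction=0.05):
    """Return (lowest, highest) slices holding roughly `fraction` of the values each."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * fraction))  # at least one value per tail
    return ordered[:k], ordered[-k:]


# Toy trait values 1..100; each 5% tail then holds five values.
low, high = extreme_tails(range(1, 101))
print(low)   # [1, 2, 3, 4, 5]
print(high)  # [96, 97, 98, 99, 100]
```

In a case-control style comparison, the two returned groups would play the roles of "controls" and "cases"; everyone between the tails is simply left out of the analysis.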
Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.
Kelley, Joanna L; Peyton, Justin T; Fiston-Lavier, Anna-Sophie; Teets, Nicholas M; Yee, Muh-Ching; Johnston, J Spencer; Bustamante, Carlos D; Lee, Richard E; Denlinger, David L
2014-08-12
The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.
Compact genome of the Antarctic midge is likely an adaptation to an extreme environment
Kelley, Joanna L.; Peyton, Justin T.; Fiston-Lavier, Anna-Sophie; Teets, Nicholas M.; Yee, Muh-Ching; Johnston, J. Spencer; Bustamante, Carlos D.; Lee, Richard E.; Denlinger, David L.
2014-01-01
The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment. PMID:25118180
Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A
2004-10-01
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii), with a total size of 24,499 bp, has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs (representatives of all four phyla are included) revealed some characteristic common features, such as uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes, namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting
2013-01-01
We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.
Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi
Dupont, Pierre-Yves; Cox, Murray P.
2017-01-01
Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827
LTR Retrotransposons Contribute to Genomic Gigantism in Plethodontid Salamanders
Sun, Cheng; Shepard, Donald B.; Chong, Rebecca A.; López Arriaza, José; Hall, Kathryn; Castoe, Todd A.; Feschotte, Cédric; Pollock, David D.; Mueller, Rachel Lockridge
2012-01-01
Among vertebrates, most of the largest genomes are found within the salamanders, a clade of amphibians that includes 613 species. Salamander genome sizes range from ∼14 to ∼120 Gb. Because genome size is correlated with nucleus and cell sizes, as well as other traits, morphological evolution in salamanders has been profoundly affected by genomic gigantism. However, the molecular mechanisms driving genomic expansion in this clade remain largely unknown. Here, we present the first comparative analysis of transposable element (TE) content in salamanders. Using high-throughput sequencing, we generated genomic shotgun data for six species from the Plethodontidae, the largest family of salamanders. We then developed a pipeline to mine TE sequences from shotgun data in taxa with limited genomic resources, such as salamanders. Our summaries of overall TE abundance and diversity for each species demonstrate that TEs make up a substantial portion of salamander genomes, and that all of the major known types of TEs are represented in salamanders. The most abundant TE superfamilies found in the genomes of our six focal species are similar, despite substantial variation in genome size. However, our results demonstrate a major difference between salamanders and other vertebrates: salamander genomes contain much larger amounts of long terminal repeat (LTR) retrotransposons, primarily Ty3/gypsy elements. Thus, the extreme increase in genome size that occurred in salamanders was likely accompanied by a shift in TE landscape. These results suggest that increased proliferation of LTR retrotransposons was a major molecular mechanism contributing to genomic expansion in salamanders. PMID:22200636
Determinants of host species range in plant viruses.
Moury, Benoît; Fabre, Frédéric; Hébrard, Eugénie; Froissart, Rémy
2017-04-01
Prediction of pathogen emergence is an important field of research, both in human health and in agronomy. Most studies of pathogen emergence have focused on the ecological or anthropic factors involved rather than on the role of intrinsic pathogen properties. The capacity of pathogens to infect a large set of host species, i.e. to possess a large host range breadth (HRB), is tightly linked to their emergence propensity. Using an extensive plant virus database, we found that four traits related to virus genome or transmission properties were strongly and robustly linked to virus HRB. Broader host ranges were observed for viruses with single-stranded genomes, those with three genome segments and nematode-transmitted viruses. In addition, two contrasting groups of seed-transmitted viruses were identified: those with a single-stranded genome had larger HRB than non-seed-transmitted viruses, whereas those with a double-stranded genome (almost exclusively RNA) had an extremely small HRB. From the plant side, the family taxonomic rank appeared as a critical threshold for virus host range, with a highly significant increase in barriers to infection between plant families. Accordingly, the plant-virus infectivity matrix shows a dual structure pattern: a modular pattern mainly due to viruses specialized to infect plants of a given family and a nested pattern due to generalist viruses. These results contribute to a better prediction of virus host jumps and emergence risks.
Exploiting induced variation to dissect quantitative traits in barley.
Druka, Arnis; Franckowiak, Jerome; Lundqvist, Udda; Bonar, Nicola; Alexander, Jill; Guzy-Wrobelska, Justyna; Ramsay, Luke; Druka, Ilze; Grant, Iain; Macaulay, Malcolm; Vendramin, Vera; Shahinnia, Fahimeh; Radovic, Slobodanka; Houston, Kelly; Harrap, David; Cardle, Linda; Marshall, David; Morgante, Michele; Stein, Nils; Waugh, Robbie
2010-04-01
The identification of genes underlying complex quantitative traits such as grain yield by means of conventional genetic analysis (positional cloning) requires the development of several large mapping populations. However, it is possible that phenotypically related, but more extreme, allelic variants generated by mutational studies could provide a means for more efficient cloning of QTLs (quantitative trait loci). In barley (Hordeum vulgare), with the development of high-throughput genome analysis tools, efficient genome-wide identification of genetic loci harbouring mutant alleles has recently become possible. Genotypic data from NILs (near-isogenic lines) that carry induced or natural variants of genes that control aspects of plant development can be compared with the location of QTLs to potentially identify candidate genes for development-related traits such as grain yield. As yield itself can be divided into a number of allometric component traits such as tillers per plant, kernels per spike and kernel size, mutant alleles that both affect these traits and are located within the confidence intervals for major yield QTLs may represent extreme variants of the underlying genes. In addition, the development of detailed comparative genomic models based on the alignment of a high-density barley gene map with the rice and sorghum physical maps has enabled an informed prioritization of 'known function' genes as candidates for both QTLs and induced mutant genes.
Scherag, André; Dina, Christian; Hinney, Anke; Vatin, Vincent; Scherag, Susann; Vogel, Carla I. G.; Müller, Timo D.; Grallert, Harald; Wichmann, H.-Erich; Balkau, Beverley; Heude, Barbara; Jarvelin, Marjo-Riitta; Hartikainen, Anna-Liisa; Levy-Marchal, Claire; Weill, Jacques; Delplanque, Jérôme; Körner, Antje; Kiess, Wieland; Kovacs, Peter; Rayner, Nigel W.; Prokopenko, Inga; McCarthy, Mark I.; Schäfer, Helmut; Jarick, Ivonne; Boeing, Heiner; Fisher, Eva; Reinehr, Thomas; Heinrich, Joachim; Rzehak, Peter; Berdel, Dietrich; Borte, Michael; Biebermann, Heike; Krude, Heiko; Rosskopf, Dieter; Rimmbach, Christian; Rief, Winfried; Fromme, Tobias; Klingenspor, Martin; Schürmann, Annette; Schulz, Nadja; Nöthen, Markus M.; Mühleisen, Thomas W.; Erbel, Raimund; Jöckel, Karl-Heinz; Moebus, Susanne; Boes, Tanja; Illig, Thomas; Froguel, Philippe; Hebebrand, Johannes; Meyre, David
2010-01-01
Meta-analyses of population-based genome-wide association studies (GWAS) in adults have recently led to the detection of new genetic loci for obesity. Here we aimed to discover additional obesity loci in extremely obese children and adolescents. We also investigated whether these results generalize by estimating the effects of these obesity loci in adults and in population-based samples including both children and adults. We jointly analysed two GWAS of 2,258 individuals and followed up the 44 single nucleotide polymorphisms (SNPs) with the lowest p-values, from 21 genomic regions, in 3,141 individuals. After this DISCOVERY step, we explored whether the findings derived from the extremely obese children and adolescents (10 SNPs from 5 genomic regions) generalized (i) to the population level and (ii) to adults by genotyping another 31,182 individuals (GENERALIZATION step). Apart from previously identified FTO, MC4R, and TMEM18, we detected two new loci for obesity: one in SDCCAG8 (serologically defined colon cancer antigen 8 gene; p = 1.85 × 10⁻⁸ in the DISCOVERY step) and one between TNKS (tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase gene) and MSRA (methionine sulfoxide reductase A gene; p = 4.84 × 10⁻⁷), the latter finding being limited to children and adolescents as demonstrated in the GENERALIZATION step. The odds ratios for early-onset obesity were estimated at ∼1.10 per risk allele for both loci. Interestingly, the TNKS/MSRA locus has recently been found to be associated with adult waist circumference. In summary, we have completed a meta-analysis of two GWAS which both focus on extremely obese children and adolescents and replicated our findings in a large follow-up data set. We observed that genetic variants in or near FTO, MC4R, TMEM18, SDCCAG8, and TNKS/MSRA were robustly associated with early-onset obesity. We conclude that the currently known major common variants related to obesity overlap to a substantial degree between children and adults. PMID:20421936
Xiang, Yang; Lu, Kewei; James, Stephen L.; Borlawsky, Tara B.; Huang, Kun; Payne, Philip R.O.
2011-01-01
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications. PMID:22154838
Xiang, Yang; Lu, Kewei; James, Stephen L; Borlawsky, Tara B; Huang, Kun; Payne, Philip R O
2012-04-01
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications.
Ma, Sanyuan; Shi, Run; Wang, Xiaogang; Liu, Yuanyuan; Chang, Jiasong; Gao, Jie; Lu, Wei; Zhang, Jianduo; Zhao, Ping; Xia, Qingyou
2014-01-01
Evolution has produced some remarkable organs, among them the silk gland, which exists in a variety of insects and almost half of the 34,000 spider species. Its impressive ability to secrete huge amounts of pure silk protein and to store proteins at an extremely high concentration (up to 25%) gives the silk gland of Bombyx mori great promise as a cost-effective platform for the production of recombinant proteins. However, the extremely low production yields of the numerous reported expression systems have greatly hindered the exploration and application of silk gland bioreactors. Using customized zinc finger nucleases (ZFN), we successfully performed genome editing of the Bmfib-H gene, which encodes the largest and most abundant silk protein, in B. mori with an efficiency higher than any previously reported. The resulting Bmfib-H knockout B. mori showed a smaller and empty silk gland, abnormally developed posterior silk gland cells, an extremely thin cocoon that contained only sericin proteins, and slightly heavier pupae. We also showed that removal of endogenous Bmfib-H protein could significantly increase the expression level of exogenous protein. Furthermore, we demonstrated that the bioreactor is suitable for large scale production of protein-based materials. PMID:25359576
Genomics and Metagenomics of Extreme Acidophiles in Biomining Environments
NASA Astrophysics Data System (ADS)
Holmes, D. S.
2015-12-01
Over 160 draft or complete genomes of extreme acidophiles (pH < 3) have been published, many of which are from bioleaching and other biomining environments, or are closely related to such microorganisms. In addition, there are over 20 metagenomic studies of such environments. This provides a rich source of latent data that can be exploited for understanding the biology of biomining environments and for advancing biotechnological applications. Genomic and metagenomic data are already yielding valuable insights into cellular processes, including carbon and nitrogen management, heavy metal and acid resistance, iron and sulfur oxido-reduction, linking biogeochemical processes to organismal physiology. The data also allow the construction of useful models of the ecophysiology of biomining environments and provide insight into the gene and genome evolution of extreme acidophiles. Additionally, since most of these acidophiles are also chemoautolithotrophs that use minerals as energy sources or electron sinks, their genomes can be plundered for clues about the evolution of cellular metabolism and bioenergetic pathways during the Archaean abiotic/biotic transition on early Earth. Acknowledgements: Fondecyt 1130683.
Degenerative Minimalism in the Genome of a Psyllid Endosymbiont
Clark, Marta A.; Baumann, Linda; Thao, MyLo Ly; Moran, Nancy A.; Baumann, Paul
2001-01-01
Psyllids, like aphids, feed on plant phloem sap and are obligately associated with prokaryotic endosymbionts acquired through vertical transmission from an ancestral infection. We have sequenced 37 kb of DNA of the genome of Carsonella ruddii, the endosymbiont of psyllids, and found that it has a number of unusual properties revealing a more extreme case of degeneration than was previously reported from studies of eubacterial genomes, including that of the aphid endosymbiont Buchnera aphidicola. Among the unusual properties are an exceptionally low guanine-plus-cytosine content (19.9%), almost complete absence of intergenic spaces, operon fusion, and lack of the usual promoter sequences upstream of 16S rDNA. These features suggest the synthesis of long mRNAs and translational coupling. The most extreme instances of base compositional bias occur in the genes encoding proteins that have less highly conserved amino acid sequences; the guanine-plus-cytosine content of some protein-coding sequences is as low as 10%. The shift in base composition has a large effect on proteins: in polypeptides of C. ruddii, half of the residues consist of five amino acids with codons low in guanine plus cytosine. Furthermore, the proteins of C. ruddii are reduced in size, with an average of about 9% fewer amino acids than in homologous proteins of related bacteria. These observations suggest that the C. ruddii genome is not subject to constraints that limit the evolution of other known eubacteria. PMID:11222582
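The guanine-plus-cytosine figures quoted in the abstract above (19.9% overall, as low as 10% for some coding sequences) are simple base-composition ratios. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not taken from the paper, and ambiguous bases such as N are simply ignored here.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


# Toy sequence (hypothetical, not real C. ruddii data): 1 G in 10 bases.
toy = "AAATTTAAGT"
print(f"GC content: {gc_content(toy):.1%}")
```

Applied to a whole genome or to individual coding sequences, the same ratio yields the kind of percentages reported in the abstract.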
New Markers for Predicting Fertility of the Male Gametes in the Post Genomic Age.
Dipresa, Savina; De Toni, Luca; Foresta, Carlo; Garolla, Andrea
2018-04-18
A number of tests have been proposed to assess male fertility potential, ranging from routine light-microscopic evaluation of semen samples to screening tests for DNA integrity that look for sperm chromatin abnormalities. Spermatozoa are extremely differentiated cells with critical functions in embryo development and heredity, in addition to delivering a haploid paternal genome to the oocyte. To achieve this, certain requirements must always be met. The ability of spermatozoa to perform their reproductive function is established during spermatogenesis, a highly specialized process that depends on multiple factors affecting male fertility. In the past 30 years, large-scale analyses of transcriptomic and genome expression in mammals have generated a large amount of information on numerous biomolecules involved in spermatogenesis and male germ cell reproductive function. The sperm proteome represents the protein content that spermatozoa need to survive and function correctly, and modifications of the sperm proteome play a role in functional changes that reduce the reproductive competence of affected spermatozoa. The post-genomic approach combines different methodologies, bringing together testicular transcriptome studies, protein compositional analysis, and metabolomic findings on human spermatozoa.
Marín, C; Dollet, M; Pagès, M; Bastien, P
2009-03-01
All currently known plant trypanosomes have been grouped in the genus Phytomonas, although they can differ greatly in terms of both their biological properties and effects upon the host. Those parasitizing the phloem sap are specifically associated with lethal syndromes in Latin America, such as phloem necrosis of coffee, 'Hartrot' of coconut and 'Marchitez sorpresiva' of oil palm, which inflict considerable economic losses in endemic countries. The genomic organization of one group of Phytomonas (D) considered as representative of the genus has been published previously. The present work presents the genomic structure of two representative isolates from the pathogenic phloem-restricted group (H) of Phytomonas, analyzed by pulsed field gel electrophoresis followed by hybridization with chromosome-specific DNA markers. It came as a surprise to observe an extremely different genomic organization in this group as compared with that of group D. Most notably, the chromosome number is 7 in this group (with a genome size of 10 Mb) versus 21 in group D (totalling 25 Mb). These data reveal an unsuspected genomic diversity within plant trypanosomatids that may justify a further debate about their division into different genera.
Wernegreen, Jennifer J
2017-09-15
Ancient associations between insects and bacteria provide models to study intimate host-microbe interactions. Currently, a wealth of genome sequence data for long-term, obligately intracellular (primary) endosymbionts of insects reveals profound genomic consequences of this specialized bacterial lifestyle. Those consequences include severe genome reduction and extreme base compositions. This minireview highlights the utility of genome sequence data to understand how, and why, endosymbionts have been pushed to such extremes, and to illuminate the functional consequences of such extensive genome change. While the static snapshots provided by individual endosymbiont genomes are valuable, comparative analyses of multiple genomes have shed light on evolutionary mechanisms. Namely, genome comparisons have told us that selection is important in fine-tuning gene content, but at the same time, mutational pressure and genetic drift contribute to genome degradation. Examples from Blochmannia, the primary endosymbiont of the ant tribe Camponotini, illustrate the value and constraints of genome sequence data, and exemplify how genomes can serve as a springboard for further comparative and experimental inquiry.
HITS-CLIP yields genome-wide insights into brain alternative RNA processing
NASA Astrophysics Data System (ADS)
Licatalosi, Donny D.; Mele, Aldo; Fak, John J.; Ule, Jernej; Kayikci, Melis; Chi, Sung Wook; Clark, Tyson A.; Schweitzer, Anthony C.; Blume, John E.; Wang, Xuning; Darnell, Jennifer C.; Darnell, Robert B.
2008-11-01
Protein-RNA interactions have critical roles in all aspects of gene expression. However, applying biochemical methods to understand such interactions in living tissues has been challenging. Here we develop a genome-wide means of mapping protein-RNA binding sites in vivo, by high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP). HITS-CLIP analysis of the neuron-specific splicing factor Nova revealed extremely reproducible RNA-binding maps in multiple mouse brains. These maps provide genome-wide in vivo biochemical footprints confirming the previous prediction that the position of Nova binding determines the outcome of alternative splicing; moreover, they are sufficiently powerful to predict Nova action de novo. HITS-CLIP revealed a large number of Nova-RNA interactions in 3' untranslated regions, leading to the discovery that Nova regulates alternative polyadenylation in the brain. HITS-CLIP, therefore, provides a robust, unbiased means to identify functional protein-RNA interactions in vivo.
Extraction of genomic DNA from yeasts for PCR-based applications.
Lõoke, Marko; Kristjuhan, Kersti; Kristjuhan, Arnold
2011-05-01
We have developed a quick and low-cost genomic DNA extraction protocol from yeast cells for PCR-based applications. This method does not require any enzymes, hazardous chemicals, or extreme temperatures, and is especially powerful for simultaneous analysis of a large number of samples. DNA can be efficiently extracted from different yeast species (Kluyveromyces lactis, Hansenula polymorpha, Schizosaccharomyces pombe, Candida albicans, Pichia pastoris, and Saccharomyces cerevisiae). The protocol involves lysis of yeast colonies or cells from liquid culture in a lithium acetate (LiOAc)-SDS solution and subsequent precipitation of DNA with ethanol. Approximately 100 nanograms of total genomic DNA can be extracted from 1 × 10⁷ cells. DNA extracted by this method is suitable for a variety of PCR-based applications (including colony PCR, real-time qPCR, and DNA sequencing) for amplification of DNA fragments of ≤ 3500 bp.
Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome.
Nicholson, Matthew J; Theodorou, Michael K; Brookman, Jayne L
2005-01-01
The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. 
The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens and lampirin. This gene was present as a single copy in Orpinomyces, was expressed during vegetative growth and was also detected in genomes from another gut fungal genus, Neocallimastix.
Lefébure, Tristan; Stanhope, Michael J
2007-01-01
Background: The genus Streptococcus is one of the most diverse and important human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 Streptococcus genomes to yield an improved understanding of the relative roles of recombination and positive selection in pathogen adaptation to their hosts. Results: Streptococcus genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. S. agalactiae has a large pan-genome, with little recombination in its core-genome, while S. pyogenes has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater habitat, and gene pool, diversity for S. agalactiae compared to S. pyogenes. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. S. suis was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss. Conclusion: Recombination is an important evolutionary force in shaping Streptococcus genomes, not only in the acquisition of significant portions of the genome as lineage specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different Streptococcus species to different hosts. PMID:17475002
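The abstract above contrasts pan-genome and core-genome sizes between species. In the usual sense of these terms, the pan-genome is the union of the gene families found in any strain and the core-genome is the intersection shared by all strains. The sketch below illustrates only that set logic, assuming ortholog clustering has already been done; the function name `pan_and_core`, the strain labels, and the gene family names are hypothetical and do not come from the study.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core-genome) for a mapping of strain -> gene families."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*gene_sets)         # families present in any strain
    core = set.intersection(*gene_sets)   # families present in every strain
    return pan, core


# Hypothetical toy input: three strains with made-up gene family labels.
strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5", "fam6"},
}
pan, core = pan_and_core(strains)
print(len(pan), "pan-genome families;", len(core), "core-genome families")
```

A species with high gene gain and loss shows a pan-genome that keeps growing as strains are added, while the core-genome shrinks toward a stable set.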
Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C; Hiramatsu, Keiichi
2005-11-01
Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the "oriC environ," likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance.
Machado, Lilian de Oliveira; Vieira, Leila do Nascimento; Stefenon, Valdir Marcos; Oliveira Pedrosa, Fábio de; Souza, Emanuel Maltempi de; Guerra, Miguel Pedro; Nodari, Rubens Onofre
2017-04-01
Given their distribution, importance, and richness, Myrtaceae species comprise a model system for studying the evolution of tropical plant diversity. In addition, chloroplast (cp) genome sequencing is an efficient tool for phylogenetic relationship studies. Feijoa [Acca sellowiana (O. Berg) Burret; CN: pineapple-guava] is a Myrtaceae species that occurs naturally in southern Brazil and northern Uruguay. Feijoa is known for its exquisite perfume and flavorful fruits, pharmacological properties, ornamental value and increasing economic relevance. In the present work, we report the complete cp genome of feijoa. The feijoa cp genome is a circular molecule of 159,370 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC 88,028 bp) and a Small Single Copy region (SSC 18,598 bp) separated by Inverted Repeat regions (IRs 26,372 bp). The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. When compared to other cp genome sequences of Myrtaceae, feijoa showed the closest relationship to pitanga (Eugenia uniflora L.). Furthermore, a comparison of pitanga synonymous (Ks) and nonsynonymous (Ka) substitution rates revealed extremely low values. Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees identical in topology. These trees supported monophyly of three Myrtoideae clades.
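The region sizes in the abstract above can be checked against the reported total. A quadripartite chloroplast genome consists of one LSC, one SSC, and two IR copies, so the total length should equal LSC + SSC + 2 × IR. The short sketch below performs that check, assuming the quoted 26,372 bp is the length of a single IR copy; the function name is illustrative.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as quoted in the abstract.
LSC, SSC, IR = 88_028, 18_598, 26_372
total = quadripartite_length(LSC, SSC, IR)
print(f"{total:,} bp")
```

The sum reproduces the 159,370 bp reported for the circular molecule, which supports reading the IR figure as a per-copy length.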
Chen, Fang; He, Jing; Zhang, Jianqi; Chen, Gary K.; Thomas, Venetta; Ambrosone, Christine B.; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Cai, Qiuyin; Carpten, John; Casey, Graham; Chanock, Stephen J.; Cheng, Iona; Chu, Lisa; Deming, Sandra L.; Driver, W. Ryan; Goodman, Phyllis; Hayes, Richard B.; Hennis, Anselm J. M.; Hsing, Ann W.; Hu, Jennifer J.; Ingles, Sue A.; John, Esther M.; Kittles, Rick A.; Kolb, Suzanne; Leske, M. Cristina; Monroe, Kristine R.; Murphy, Adam; Nemesure, Barbara; Neslund-Dudas, Christine; Nyante, Sarah; Ostrander, Elaine A; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Ben A.; Schumacher, Fredrick; Stanford, Janet L.; Signorello, Lisa B.; Strom, Sara S.; Stevens, Victoria; Van Den Berg, David; Wang, Zhaoming; Witte, John S.; Wu, Suh-Yuh; Yamamura, Yuko; Zheng, Wei; Ziegler, Regina G.; Stram, Alexander H.; Kolonel, Laurence N.; Marchand, Loïc Le; Henderson, Brian E.; Haiman, Christopher A.; Stram, Daniel O.
2015-01-01
Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al., Nat Genet 2010), and estimated an additive heritability of 44.7% (se: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al. in their analyses, we remain concerned about two related issues: (1) whether in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R² as a function of the number of SNPs used in the analysis. 
These simulations help to address the important question about whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height, and by implication other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however there is evidence in our simulation that a very large number of causal variants (many thousands) each with very small effect on phenotypic variance will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious. PMID:26125186
A Pooled Sequencing Approach Identifies a Candidate Meiotic Driver in Drosophila
Wei, Kevin H.-C.; Reddy, Hemakumar M.; Rathnam, Chandramouli; Lee, Jimin; Lin, Deanna; Ji, Shuqing; Mason, James M.; Clark, Andrew G.; Barbash, Daniel A.
2017-01-01
Meiotic drive occurs when a selfish element increases its transmission frequency above the Mendelian ratio by hijacking the asymmetric divisions of female meiosis. Meiotic drive causes genomic conflict and potentially has a major impact on genome evolution, but only a few drive loci of large effect have been described. New methods to reliably detect meiotic drive are therefore needed, particularly for discovering moderate-strength drivers that are likely to be more prevalent in natural populations than strong drivers. Here, we report an efficient method that uses sequencing of large pools of backcross (BC1) progeny to test for deviations from Mendelian segregation genome-wide with single-nucleotide polymorphisms (SNPs) that distinguish the parental strains. We show that meiotic drive can be detected by a characteristic pattern of decay in distortion of SNP frequencies, caused by recombination unlinking the driver from distal loci. We further show that control crosses allow allele-frequency distortion caused by meiotic drive to be distinguished from distortion resulting from developmental effects. We used this approach to test whether chromosomes with extreme telomere-length differences segregate at Mendelian ratios, as telomeric regions are a potential hotspot for meiotic drive due to their roles in meiotic segregation and multiple observations of high rates of telomere sequence evolution. Using four different pairings of long and short telomere strains, we find no evidence that extreme telomere-length variation causes meiotic drive in Drosophila. However, we identify one candidate meiotic driver in a centromere-linked region that shows an ∼8% increase in transmission frequency, corresponding to a ∼54:46 segregation ratio. Our results show that candidate meiotic drivers of moderate strength can be readily detected and localized in pools of BC1 progeny. PMID:28258181
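The arithmetic behind the reported segregation ratio, and the decay of distortion with recombination described above, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, and the decay formula assumes a single driver and a marker at recombination fraction r from it.

```python
def transmission_from_increase(relative_increase, mendelian=0.5):
    """Transmission frequency after a relative increase over the Mendelian expectation."""
    return mendelian * (1.0 + relative_increase)


def marker_frequency(driver_freq, r):
    """Expected transmission of a marker allele at recombination fraction r from the driver.

    The marker travels with the driver unless recombination separates them:
    driver_freq * (1 - r) + (1 - driver_freq) * r
    """
    return driver_freq * (1.0 - r) + (1.0 - driver_freq) * r


# An ~8% relative increase over 0.5 gives ~0.54, i.e. a ~54:46 ratio.
p = transmission_from_increase(0.08)
ratio = (round(p * 100), round((1.0 - p) * 100))
print("segregation ratio:", ratio)

# Distortion decays toward 0.5 as the marker becomes unlinked (r -> 0.5).
for r in (0.0, 0.1, 0.25, 0.5):
    print(f"r = {r:.2f}  expected marker frequency = {marker_frequency(p, r):.3f}")
```

The deviation from 0.5 at a marker is (p - 0.5)(1 - 2r), so it is largest at the driver and vanishes for unlinked loci, which is the characteristic pattern the abstract refers to.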
Gostinčar, Cene; Ohm, Robin A; Kogej, Tina; Sonjak, Silva; Turk, Martina; Zajc, Janja; Zalar, Polona; Grube, Martin; Sun, Hui; Han, James; Sharma, Aditi; Chiniquy, Jennifer; Ngan, Chew Yee; Lipzen, Anna; Barry, Kerrie; Grigoriev, Igor V; Gunde-Cimerman, Nina
2014-07-01
Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de-novo genome sequencing of the four varieties of A. pullulans. The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. The redundancy observed in several gene families can be linked to the nutritional versatility of these species and their particular stress tolerance. The availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.
Insertion sequences enrichment in extreme Red sea brine pool vent.
Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania
2017-03-01
Mobile genetic elements are major agents of genome diversification and evolution. Few studies have addressed their characteristics, including abundance, and their role in extreme habitats. The Red Sea brine pools are among the rare natural habitats exposed to multiple extreme conditions, including high temperature, salinity and concentration of heavy metals. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments suggest that insertion sequences predominantly contribute to polyextremophile genome plasticity.
Diversity and Evolution in the Genome of Clostridium difficile
Knight, Daniel R.; Elliott, Briony; Chang, Barbara J.; Perkins, Timothy T.
2015-01-01
SUMMARY Clostridium difficile infection (CDI) is the leading cause of antimicrobial and health care-associated diarrhea in humans, presenting a significant burden to global health care systems. In the last 2 decades, PCR- and sequence-based techniques, particularly whole-genome sequencing (WGS), have significantly furthered our knowledge of the genetic diversity, evolution, epidemiology, and pathogenicity of this once enigmatic pathogen. C. difficile is taxonomically distinct from many other well-known clostridia, with a diverse population structure comprising hundreds of strain types spread across at least 6 phylogenetic clades. The C. difficile species is defined by a large diverse pangenome with extreme levels of evolutionary plasticity that has been shaped over long time periods by gene flux and recombination, often between divergent lineages. These evolutionary events are in response to environmental and anthropogenic activities and have led to the rapid emergence and worldwide dissemination of virulent clonal lineages. Moreover, genome analysis of large clinically relevant data sets has improved our understanding of CDI outbreaks, transmission, and recurrence. The epidemiology of CDI has changed dramatically over the last 15 years, and CDI may have a foodborne or zoonotic etiology. The WGS era promises to continue to redefine our view of this significant pathogen. PMID:26085550
Phylotranscriptomic consolidation of the jawed vertebrate timetree.
Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé
2017-09-01
Phylogenomics is extremely powerful but introduces new challenges as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10 Gbp allows recovering more genes, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun
2016-01-01
Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8–9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (<100 m), high-altitude region (>1500 m) versus low-altitude region (<1300 m), desert (<10 mm average annual precipitation) versus highly humid region (>600 mm), and arid zone (<400 mm) versus humid zone (>400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change. PMID:27401233
The impact of rare variation on gene expression across tissues.
Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B
2017-10-11
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
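The idea of an expression outlier, and the enrichment implied by the percentages above, can be illustrated with a small sketch. It is a simplification and not the study's method: it uses a single gene in a single tissue, the |Z| >= 2 cutoff is an assumed value chosen for illustration, and the expression values are made up.

```python
import statistics


def outlier_indices(values, z_cutoff=2.0):
    """Indices of individuals whose expression Z-score magnitude meets the cutoff.

    z_cutoff is a hypothetical threshold chosen for illustration only.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) >= z_cutoff]


def fold_enrichment(fraction_in_outliers, fraction_in_non_outliers):
    """How many times more often outliers carry a nearby rare variant."""
    return fraction_in_outliers / fraction_in_non_outliers


# Toy expression values for one gene across ten individuals (made-up numbers).
expression = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 9.0]
print("outlier individuals:", outlier_indices(expression))

# Percentages quoted in the abstract: 58% and 28% of outliers versus 8% of non-outliers.
print("underexpression enrichment:", fold_enrichment(0.58, 0.08))
print("overexpression enrichment:", fold_enrichment(0.28, 0.08))
```

On the quoted figures, underexpression outliers are roughly seven times, and overexpression outliers roughly three and a half times, as likely as non-outliers to have a nearby conserved rare variant.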
Genome Wide Association Study of Sepsis in Extremely Premature Infants
Srinivasan, Lakshmi; Page, Grier; Kirpalani, Haresh; Murray, Jeffrey C.; Das, Abhik; Higgins, Rosemary D.; Carlo, Waldemar A.; Bell, Edward F.; Goldberg, Ronald N.; Schibler, Kurt; Sood, Beena G.; Stevenson, David K.; Stoll, Barbara J.; Van Meurs, Krisa P.; Johnson, Karen J.; Levy, Joshua; McDonald, Scott A.; Zaterka-Baxter, Kristin M.; Kennedy, Kathleen A.; Sánchez, Pablo J.; Duara, Shahnaz; Walsh, Michele C.; Shankaran, Seetha; Wynn, James L.; Cotten, C. Michael
2017-01-01
Objective: To identify genetic variants associated with sepsis (early and late-onset) using a genome wide association (GWA) analysis in a cohort of extremely premature infants. Study Design: Previously generated GWA data from the Neonatal Research Network’s anonymized genomic database biorepository of extremely premature infants were used for this study. Sepsis was defined as culture-positive early-onset or late-onset sepsis or culture-proven meningitis. Genomic and whole genome amplified DNA was genotyped for 1.2 million single nucleotide polymorphisms (SNPs); 91% of SNPs were successfully genotyped. We imputed 7.2 million additional SNPs. P values and false discovery rates were calculated from multivariate logistic regression analysis adjusting for gender, gestational age and ancestry. Target statistical value was p<10⁻⁵. Secondary analyses assessed associations of SNPs with pathogen type. Pathway analyses were also run on primary and secondary end points. Results: Data from 757 extremely premature infants were included: 351 infants with sepsis and 406 infants without sepsis. No SNPs reached genome-wide significance levels (5×10⁻⁸); two SNPs in proximity to FOXC2 and FOXL1 genes achieved target levels of significance. In secondary analyses, SNPs for ELMO1, IRAK2 (Gram positive sepsis), RALA, IMMP2L (Gram negative sepsis) and PIEZO2 (fungal sepsis) met target significance levels. Pathways associated with sepsis and Gram negative sepsis included gap junctions, fibroblast growth factor receptors, regulators of cell division and Interleukin-1 associated receptor kinase 2 (p values<0.001 and FDR<20%). Conclusions: No SNPs met genome-wide significance in this cohort of ELBW infants; however, areas of potential association and pathways meriting further study were identified. PMID:28283553
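The two significance levels in this abstract can be made concrete with a short sketch. The genome-wide level of 5×10⁻⁸ is the value conventionally obtained from a Bonferroni correction of α = 0.05 over roughly one million independent tests; the function names and the example p-values below are hypothetical and are not taken from the study.

```python
import math

GENOME_WIDE = 5e-8  # genome-wide significance level quoted in the abstract
TARGET = 1e-5       # the study's target statistical value


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def classify(p_value):
    """Label a SNP's p-value against the two thresholds."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < TARGET:
        return "target level"
    return "not significant"


# 0.05 spread over about one million independent tests gives 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))

# Made-up p-values to show the three outcomes.
for p in (3e-9, 4e-6, 0.02):
    print(p, "->", classify(p))
```

A SNP that reaches the target level but not the genome-wide level, as reported for the two SNPs near FOXC2 and FOXL1, falls in the middle category.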
A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety
Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto
2007-01-01
Background: Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings: We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions: Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749
Penz, Thomas; Horn, Matthias; Schmitz-Esser, Stephan
2010-01-01
The recently sequenced genome of the obligate intracellular amoeba symbiont 'Candidatus Amoebophilus asiaticus' is unique among prokaryotic genomes due to its extremely large fraction of genes encoding proteins harboring eukaryotic domains such as ankyrin-repeats, TPR/SEL1 repeats, leucine-rich repeats, as well as F- and U-box domains, most of which likely serve in the interaction with the amoeba host. Here we provide evidence for the presence of additional proteins which are presumably presented extracellularly and should thus also be important for host cell interaction. Surprisingly, we did not find homologues of any of the well-known protein secretion systems required to translocate effector proteins into the host cell in the A. asiaticus genome, and the type six secretion system seems to be incomplete. Here we describe the presence of a putative prophage in the A. asiaticus genome, which shows similarity to the antifeeding prophage from the insect pathogen Serratia entomophila. In S. entomophila this system is used to deliver toxins into insect hosts. This putative antifeeding-like prophage might thus represent the missing protein secretion apparatus in A. asiaticus.
NASA Astrophysics Data System (ADS)
Gusev, Oleg; Sugimoto, Manabu; Novikova, Nataliya; Sychev, Vladimir; Okuda, Takashi; Kikawada, Takahiro
2012-07-01
Anhydrobiotic chironomid larvae of Polypedilum vanderplanki (Diptera) can withstand prolonged complete desiccation as well as other external stresses including ionizing radiation. Recent experiments showed that this insect is able to survive long-term exposure to real outer space. At the same time, we found that dehydration causes alterations in chromatin structure and a severe fragmentation of nuclear DNA in the cells of the larvae despite successful anhydrobiosis. Analysis of several remote populations of the chironomid in Africa suggests that desiccation-related DNA damage might be a driving genetic force for rapid radiation within the species. First results of the ongoing genome project suggest that the origin and evolution of anhydrobiosis in this single insect species are related to rapid duplication of the genes coding for late embryogenesis abundant (LEA) proteins and other molecular agents directly involved in desiccation resistance in the cells. Analysis of genome-wide mRNA expression profiles in larvae subjected to desiccation shows that the joint activity of large multi-gene coding regions in the genome is involved in the control of anhydrobiosis-related molecular adaptations in the chironomid.
Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian
2009-11-01
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to analyze CRISPR genome-wide in the extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were highly conserved, and that two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and that the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
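Two of the sequence-level measures mentioned in this abstract, GC content and palindromic structure, are simple enough to sketch directly. The helper names below are hypothetical and the example sequences are toy strings, not repeats or motifs from the haloarchaeal genomes; "palindromic" is taken here in the usual DNA sense of a sequence equal to its own reverse complement.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True when the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


# Toy sequences for illustration only.
for example in ("GAATTC", "GTTTC"):
    print(example,
          f"GC = {gc_content(example):.2f}",
          "palindromic" if is_palindromic(example) else "not palindromic")
```

A perfect palindrome of this kind can pair with itself to form a stem-loop, which is why palindromic motifs and predicted RNA secondary structure are analyzed together.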
Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon; Covington, Kyle R; Kandoth, Cyriac; Stewart, Chip; Hess, Julian; Ma, Singer; Chiotti, Kami E; McLellan, Michael; Sofia, Heidi J; Hutter, Carolyn; Getz, Gad; Wheeler, David; Ding, Li
2018-03-28
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Winter storms drive rapid phenotypic, regulatory, and genomic shifts in the green anole lizard.
Campbell-Staton, Shane C; Cheviron, Zachary A; Rochette, Nicholas; Catchen, Julian; Losos, Jonathan B; Edwards, Scott V
2017-08-04
Extreme environmental perturbations offer opportunities to observe the effects of natural selection in wild populations. During the winter of 2013-2014, the southeastern United States endured an extreme cold event. We used thermal performance, transcriptomics, and genome scans to measure responses of lizard populations to storm-induced selection. We found significant increases in cold tolerance at the species' southern limit. Gene expression in southern survivors shifted toward patterns characteristic of northern populations. Comparing samples before and after the extreme winter, 14 genomic regions were differentiated in the surviving southern population; four also exhibited signatures of local adaptation across the latitudinal gradient and implicate genes involved in nervous system function. Together, our results suggest that extreme winter events can rapidly produce strong selection on natural populations at multiple biological levels that recapitulate geographic patterns of local adaptation. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
The Burmese python genome reveals the molecular basis for extreme adaptation in snakes
Castoe, Todd A.; de Koning, A. P. Jason; Hall, Kathryn T.; Card, Daren C.; Schield, Drew R.; Fujita, Matthew K.; Ruggiero, Robert P.; Degner, Jack F.; Daza, Juan M.; Gu, Wanjun; Reyes-Velasco, Jacobo; Shaney, Kyle J.; Castoe, Jill M.; Fox, Samuel E.; Poole, Alex W.; Polanco, Daniel; Dobry, Jason; Vandewege, Michael W.; Li, Qing; Schott, Ryan K.; Kapusta, Aurélie; Minx, Patrick; Feschotte, Cédric; Uetz, Peter; Ray, David A.; Hoffmann, Federico G.; Bogden, Robert; Smith, Eric N.; Chang, Belinda S. W.; Vonk, Freek J.; Casewell, Nicholas R.; Henkel, Christiaan V.; Richardson, Michael K.; Mackessy, Stephen P.; Bronikowski, Anne M.; Yandell, Mark; Warren, Wesley C.; Secor, Stephen M.; Pollock, David D.
2013-01-01
Snakes possess many extreme morphological and physiological adaptations. Identification of the molecular basis of these traits can provide novel understanding for vertebrate biology and medicine. Here, we study snake biology using the genome sequence of the Burmese python (Python molurus bivittatus), a model of extreme physiological and metabolic adaptation. We compare the python and king cobra genomes along with genomic samples from other snakes and perform transcriptome analysis to gain insights into the extreme phenotypes of the python. We discovered rapid and massive transcriptional responses in multiple organ systems that occur on feeding and coordinate major changes in organ size and function. Intriguingly, the homologs of these genes in humans are associated with metabolism, development, and pathology. We also found that many snake metabolic genes have undergone positive selection, which together with the rapid evolution of mitochondrial proteins, provides evidence for extensive adaptive redesign of snake metabolic pathways. Additional evidence for molecular adaptation and gene family expansions and contractions is associated with major physiological and phenotypic adaptations in snakes; genes involved are related to cell cycle, development, lungs, eyes, heart, intestine, and skeletal structure, including GRB2-associated binding protein 1, SSH, WNT16, and bone morphogenetic protein 7. Finally, changes in repetitive DNA content, guanine-cytosine isochore structure, and nucleotide substitution rates indicate major shifts in the structure and evolution of snake genomes compared with other amniotes. Phenotypic and physiological novelty in snakes seems to be driven by system-wide coordination of protein adaptation, gene expression, and changes in the structure of the genome. PMID:24297902
The Burmese python genome reveals the molecular basis for extreme adaptation in snakes.
Castoe, Todd A; de Koning, A P Jason; Hall, Kathryn T; Card, Daren C; Schield, Drew R; Fujita, Matthew K; Ruggiero, Robert P; Degner, Jack F; Daza, Juan M; Gu, Wanjun; Reyes-Velasco, Jacobo; Shaney, Kyle J; Castoe, Jill M; Fox, Samuel E; Poole, Alex W; Polanco, Daniel; Dobry, Jason; Vandewege, Michael W; Li, Qing; Schott, Ryan K; Kapusta, Aurélie; Minx, Patrick; Feschotte, Cédric; Uetz, Peter; Ray, David A; Hoffmann, Federico G; Bogden, Robert; Smith, Eric N; Chang, Belinda S W; Vonk, Freek J; Casewell, Nicholas R; Henkel, Christiaan V; Richardson, Michael K; Mackessy, Stephen P; Bronikowski, Anne M; Yandell, Mark; Warren, Wesley C; Secor, Stephen M; Pollock, David D
2013-12-17
Snakes possess many extreme morphological and physiological adaptations. Identification of the molecular basis of these traits can provide novel understanding for vertebrate biology and medicine. Here, we study snake biology using the genome sequence of the Burmese python (Python molurus bivittatus), a model of extreme physiological and metabolic adaptation. We compare the python and king cobra genomes along with genomic samples from other snakes and perform transcriptome analysis to gain insights into the extreme phenotypes of the python. We discovered rapid and massive transcriptional responses in multiple organ systems that occur on feeding and coordinate major changes in organ size and function. Intriguingly, the homologs of these genes in humans are associated with metabolism, development, and pathology. We also found that many snake metabolic genes have undergone positive selection, which together with the rapid evolution of mitochondrial proteins, provides evidence for extensive adaptive redesign of snake metabolic pathways. Additional evidence for molecular adaptation and gene family expansions and contractions is associated with major physiological and phenotypic adaptations in snakes; genes involved are related to cell cycle, development, lungs, eyes, heart, intestine, and skeletal structure, including GRB2-associated binding protein 1, SSH, WNT16, and bone morphogenetic protein 7. Finally, changes in repetitive DNA content, guanine-cytosine isochore structure, and nucleotide substitution rates indicate major shifts in the structure and evolution of snake genomes compared with other amniotes. Phenotypic and physiological novelty in snakes seems to be driven by system-wide coordination of protein adaptation, gene expression, and changes in the structure of the genome.
Das, Priyanka; Maharana, Jitendra; Paria, Prasenjit; Mandal, Shambhu Nath; Meena, Dharmendra Kumar; Sharma, Anil Prakash; Jayarajan, Rijith; Dixit, Vishal; Verma, Ankit; Vellarikkal, Shamsudheen Karuthedath; Scaria, Vinod; Sivasubbu, Sridhar; Rao, Atmakuri Ramakrishna; Mohapatra, Trilochan
2015-01-01
Halomonas salina strain CIFRI1 is an extremely salt-stress-tolerant bacterium isolated from the salt crystals of the east coast of India. Here we report the annotated 3.45-Mb draft genome sequence of strain CIFRI1 having 86 contigs with 3,139 protein coding loci, including 62 RNA genes. PMID:25573926
Burguener, Germán F; Maldonado, Marcos J; Revale, Santiago; Fernández Do Porto, Darío; Rascován, Nicolás; Vázquez, Martín; Farías, María Eugenia; Marti, Marcelo A; Turjanski, Adrián Gustavo
2014-02-06
Halorubrum sp. strain AJ67, an extreme halophilic UV-resistant archaeon, was isolated from Laguna Antofalla in the Argentinian Puna. The draft genome sequence suggests the presence of potent enzyme candidates that are essential for survival under multiple environmental extreme conditions, such as high UV radiation, elevated salinity, and the presence of critical arsenic concentrations.
Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun
2016-10-01
Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8-9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (<100 m), high-altitude region (>1500 m) versus low-altitude region (<1300 m), desert (<10 mm average annual precipitation) versus highly humid region (>600 mm), and arid zone (<400 mm) versus humid zone (>400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change.
Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C.; Hiramatsu, Keiichi
2005-01-01
Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the “oriC environ,” likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance. PMID:16237012
The Most Developmentally Truncated Fishes Show Extensive Hox Gene Loss and Miniaturized Genomes
Malmstrøm, Martin; Britz, Ralf; Matschiner, Michael; Tørresen, Ole K; Hadiaty, Renny Kurnia; Yaakob, Norsham; Tan, Heok Hui; Jakobsen, Kjetill Sigurd; Salzburger, Walter; Rüber, Lukas
2018-01-01
The world’s smallest fishes belong to the genus Paedocypris. These miniature fishes are endemic to an extreme habitat: the peat swamp forests in Southeast Asia, characterized by highly acidic blackwater. This threatened habitat is home to a large array of fishes, including a number of miniaturized but also developmentally truncated species. The genus Paedocypris in particular is characterized by profound, organism-wide developmental truncation, resulting in sexually mature individuals of <8 mm in length with a larval phenotype. Here, we report on evolutionary simplification in the genomes of two species of the dwarf minnow genus Paedocypris using whole-genome sequencing. The two species feature unprecedented Hox gene loss and genome reduction in association with their massive developmental truncation. We also show how other genes involved in the development of musculature, nervous system, and skeleton have been lost in Paedocypris, mirroring its highly progenetic phenotype. Further, our analyses suggest two mechanisms responsible for the genome streamlining in Paedocypris in relation to other Cypriniformes: severe intron shortening and reduced repeat content. As the first report on the genomic sequence of a vertebrate species with organism-wide developmental truncation, the results of our work enhance our understanding of genome evolution and how genotypes are translated to phenotypes. In addition, as a naturally simplified system closely related to zebrafish, Paedocypris provides novel insights into vertebrate development. PMID:29684203
Kis-Papo, Tamar; Kirzhner, Valery; Wasser, Solomon P.; Nevo, Eviatar
2003-01-01
We have found that genomic diversity is generally positively correlated with abiotic and biotic stress levels (1–3). However, beyond a high-threshold level of stress, the diversity declines to a few adapted genotypes. The Dead Sea is the harshest planetary hypersaline environment (340 g·liter⁻¹ total dissolved salts, ≈10 times sea water). Hence, the Dead Sea is an excellent natural laboratory for testing the “rise and fall” pattern of genetic diversity with stress proposed in this article. Here, we examined genomic diversity of the ascomycete fungus Aspergillus versicolor from saline, nonsaline, and hypersaline Dead Sea environments. We screened the coding and noncoding genomes of A. versicolor isolates by using >600 AFLP (amplified fragment length polymorphism) markers (equal to loci). Genomic diversity was positively correlated with stress, culminating in the Dead Sea surface but dropped drastically in 50- to 280-m-deep seawater. The genomic diversity pattern paralleled the pattern of sexual reproduction of fungal species across the same southward gradient of increasing stress in Israel. This parallel may suggest that diversity and sex are intertwined intimately according to the rise and fall pattern and adaptively selected by natural selection in fungal genome evolution. Future large-scale verification in micromycetes will define further the trajectories of diversity and sex in the rise and fall pattern. PMID:14645702
Gostinčar, Cene; Ohm, Robin A.; Kogej, Tina; ...
2014-07-01
Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de novo genome sequencing of the four varieties of A. pullulans. The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. We observed redundancy in several gene families that can be linked to the nutritional versatility of these species and their particular stress tolerance. In conclusion, the availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.
Lin, Douglas I; Chudnovsky, Yakov; Duggan, Bridget; Zajchowski, Deborah; Greenbowe, Joel; Ross, Jeffrey S; Gay, Laurie M; Ali, Siraj M; Elvin, Julia A
2017-12-01
Small cell carcinoma of the ovary, hypercalcemic-type (SCCOHT) is a rare, extremely aggressive neoplasm that usually occurs in young women and is characterized by deleterious germline or somatic SMARCA4 mutations. We performed comprehensive genomic profiling (CGP) to potentially identify additional clinically and pathophysiologically relevant genomic alterations in SCCOHT. CGP assessment of all classes of coding alterations in up to 406 genes commonly altered in cancer and intronic regions for up to 31 genes commonly rearranged in cancer was performed on 18 SCCOHT cases (16 exhibiting classic morphology and 2 cases exhibiting exclusively large cell variant morphology). In addition, a retrospective database search for clinically advanced ovarian tumors with genomic profiles similar to SCCOHT yielded 3 additional cases originally diagnosed as non-SCCOHT. CGP revealed inactivating SMARCA4 alterations and low tumor mutational burden (TMB) (<6 mutations/Mb) in 94% (15/16) of SCCOHT with classic morphology. In contrast, both (2/2) cases exhibiting only large cell variant morphology were hypermutated (TMB scores of 90 and 360 mutations/Mb) and were wildtype for SMARCA4. In our retrospective search, an index ovarian cancer patient harboring inactivating SMARCA4 alterations, initially diagnosed as endometrioid carcinoma, was re-classified as SCCOHT and responded to an SCCOHT chemotherapy regimen. The vast majority of SCCOHT demonstrate genomic SMARCA4 loss with only rare co-occurring alterations. Our data support a role for CGP in the diagnosis and management of SCCOHT and of other lesions with overlapping histological and clinical features, since identifying the former by genomic profile suggests benefit from an appropriate regimen and treatment decisions, as illustrated by an index patient.
Resolving the tips of the tree of life: How much mitochondrial data do we need?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonett, Ronald M.; Macey, J. Robert; Boore, Jeffrey L.
2005-04-29
Mitochondrial (mt) DNA sequences are used extensively to reconstruct evolutionary relationships among recently diverged animals, and have constituted the most widely used markers for species- and generic-level relationships for the last decade or more. However, most studies to date have employed relatively small portions of the mt-genome. In contrast, complete mt-genomes primarily have been used to investigate deep divergences, including several studies of the amount of mt sequence necessary to recover ancient relationships. We sequenced and analyzed 24 complete mt-genomes from a group of salamander species exhibiting divergences typical of those in many species-level studies. We present the first comprehensive investigation of the amount of mt sequence data necessary to consistently recover the mt-genome tree at this level, using parsimony and Bayesian methods. Both methods of phylogenetic analysis revealed extremely similar results. A surprising number of well supported, yet conflicting, relationships were found in trees based on fragments less than ~2000 nucleotides (nt), typical of the vast majority of the thousands of mt-based studies published to date. Large amounts of data (11,500+ nt) were necessary to consistently recover the whole mt-genome tree. Some relationships consistently were recovered with fragments of all sizes, but many nodes required the majority of the mt-genome to stabilize, particularly those associated with short internal branches. Although moderate amounts of data (2000-3000 nt) were adequate to recover mt-based relationships for which most nodes were congruent with the whole mt-genome tree, many thousands of nucleotides were necessary to resolve rapid bursts of evolution.
Recent advances in genomics are making collection of large amounts of sequence data highly feasible, and our results provide the basis for comparative studies of other closely related groups to optimize mt sequence sampling and phylogenetic resolution at the "tips" of the Tree of Life.
Farias, Maria Eugenia; Revale, Santiago; Mancini, Estefania; Ordoñez, Omar; Turjanski, Adrian; Cortez, Néstor; Vazquez, Martin P.
2011-01-01
The high-altitude Andean lakes (HAAL) in the Argentinean Puna-high Andes region represent an almost unexplored ecosystem exposed to extreme conditions (high UV irradiation, hypersalinity, drastic temperature changes, desiccation, and high pH). Here we present the first genome sequence, a Sphingomonas sp., isolated from this extreme environment. PMID:21602338
Burguener, Germán F.; Maldonado, Marcos J.; Revale, Santiago; Fernández Do Porto, Darío; Rascován, Nicolás; Vázquez, Martín; Farías, María Eugenia; Marti, Marcelo A.
2014-01-01
Halorubrum sp. strain AJ67, an extreme halophilic UV-resistant archaeon, was isolated from Laguna Antofalla in the Argentinian Puna. The draft genome sequence suggests the presence of potent enzyme candidates that are essential for survival under multiple environmental extreme conditions, such as high UV radiation, elevated salinity, and the presence of critical arsenic concentrations. PMID:24503991
Liu, Changqing; Bai, Chunyu; Guo, Yu; Liu, Dan; Lu, Taofeng; Li, Xiangchen; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2014-01-01
Bacterial artificial chromosome (BAC) libraries are extremely valuable for the genome-wide genetic dissection of complex organisms. The Siberian tiger, one of the most well-known wild primitive carnivores in China, is an endangered animal. In order to promote research on its genome, a high-redundancy BAC library of the Siberian tiger was constructed and characterized. The library is divided into two sub-libraries prepared from blood cells and two sub-libraries prepared from fibroblasts. This BAC library contains 153,600 individually archived clones; for PCR-based screening of the library, BACs were placed into 40 superpools of 10 × 384-deep well microplates. The average insert size of BAC clones was estimated to be 116.5 kb, representing approximately 6.46 genome equivalents of the haploid genome and affording a 98.86% statistical probability of obtaining at least one clone containing a unique DNA sequence. Screening the library with 19 microsatellite markers and a SRY sequence revealed that each of these markers was present in the library; the average number of positive clones per marker was 6.74 (range 2 to 12), consistent with 6.46-fold coverage of the tiger genome. Additionally, we identified 72 microsatellite markers that could potentially be used as genetic markers. This BAC library will serve as a valuable resource for physical mapping, comparative genomic study and large-scale genome sequencing in the tiger. PMID:24608928
Mutations in MITF and PAX3 cause "splashed white" and other white spotting phenotypes in horses.
Hauswirth, Regula; Haase, Bianca; Blatter, Marlis; Brooks, Samantha A; Burger, Dominik; Drögemüller, Cord; Gerber, Vincent; Henke, Diana; Janda, Jozef; Jude, Rony; Magdesian, K Gary; Matthews, Jacqueline M; Poncet, Pierre-André; Svansson, Vilhjálmur; Tozaki, Teruaki; Wilkinson-White, Lorna; Penedo, M Cecilia T; Rieder, Stefan; Leeb, Tosso
2012-01-01
During fetal development neural-crest-derived melanoblasts migrate across the entire body surface and differentiate into melanocytes, the pigment-producing cells. Alterations in this precisely regulated process can lead to white spotting patterns. White spotting patterns in horses are a complex trait with a large phenotypic variance ranging from minimal white markings up to completely white horses. The "splashed white" pattern is primarily characterized by an extremely large blaze, often accompanied by extended white markings at the distal limbs and blue eyes. Some, but not all, splashed white horses are deaf. We analyzed a Quarter Horse family segregating for the splashed white coat color. Genome-wide linkage analysis in 31 horses gave a positive LOD score of 1.6 in a region on chromosome 6 containing the PAX3 gene. However, the linkage data were not in agreement with a monogenic inheritance of a single fully penetrant mutation. We sequenced the PAX3 gene and identified a missense mutation in some, but not all, splashed white Quarter Horses. Genome-wide association analysis indicated a potential second signal near MITF. We therefore sequenced the MITF gene and found a 10 bp insertion in the melanocyte-specific promoter. The MITF promoter variant was present in some splashed white Quarter Horses from the studied family, but also in splashed white horses from other horse breeds. Finally, we identified two additional non-synonymous mutations in the MITF gene in unrelated horses with white spotting phenotypes. Thus, several independent mutations in MITF and PAX3 together with known variants in the EDNRB and KIT genes explain a large proportion of horses with the more extreme white spotting phenotypes.
Mutations in MITF and PAX3 Cause “Splashed White” and Other White Spotting Phenotypes in Horses
Blatter, Marlis; Brooks, Samantha A.; Burger, Dominik; Drögemüller, Cord; Gerber, Vincent; Henke, Diana; Janda, Jozef; Jude, Rony; Magdesian, K. Gary; Matthews, Jacqueline M.; Poncet, Pierre-André; Svansson, Vilhjálmur; Tozaki, Teruaki; Wilkinson-White, Lorna; Penedo, M. Cecilia T.; Rieder, Stefan; Leeb, Tosso
2012-01-01
During fetal development neural-crest-derived melanoblasts migrate across the entire body surface and differentiate into melanocytes, the pigment-producing cells. Alterations in this precisely regulated process can lead to white spotting patterns. White spotting patterns in horses are a complex trait with a large phenotypic variance ranging from minimal white markings up to completely white horses. The “splashed white” pattern is primarily characterized by an extremely large blaze, often accompanied by extended white markings at the distal limbs and blue eyes. Some, but not all, splashed white horses are deaf. We analyzed a Quarter Horse family segregating for the splashed white coat color. Genome-wide linkage analysis in 31 horses gave a positive LOD score of 1.6 in a region on chromosome 6 containing the PAX3 gene. However, the linkage data were not in agreement with a monogenic inheritance of a single fully penetrant mutation. We sequenced the PAX3 gene and identified a missense mutation in some, but not all, splashed white Quarter Horses. Genome-wide association analysis indicated a potential second signal near MITF. We therefore sequenced the MITF gene and found a 10 bp insertion in the melanocyte-specific promoter. The MITF promoter variant was present in some splashed white Quarter Horses from the studied family, but also in splashed white horses from other horse breeds. Finally, we identified two additional non-synonymous mutations in the MITF gene in unrelated horses with white spotting phenotypes. Thus, several independent mutations in MITF and PAX3 together with known variants in the EDNRB and KIT genes explain a large proportion of horses with the more extreme white spotting phenotypes. PMID:22511888
Janssen, Paul J; Van Houdt, Rob; Moors, Hugo; Monsieurs, Pieter; Morin, Nicolas; Michaux, Arlette; Benotmane, Mohammed A; Leys, Natalie; Vallaeys, Tatiana; Lapidus, Alla; Monchy, Sébastien; Médigue, Claudine; Taghavi, Safiyh; McCorkle, Sean; Dunn, John; van der Lelie, Daniël; Mergeay, Max
2010-05-05
Many bacteria in the environment have adapted to the presence of toxic heavy metals. Over the last 30 years, this heavy metal tolerance was the subject of extensive research. The bacterium Cupriavidus metallidurans strain CH34, originally isolated by us in 1976 from a metal processing factory, is considered a major model organism in this field because it withstands milli-molar range concentrations of over 20 different heavy metal ions. This tolerance is mostly achieved by rapid ion efflux but also by metal-complexation and -reduction. We present here the full genome sequence of strain CH34 and the manual annotation of all its genes. The genome of C. metallidurans CH34 is composed of two large circular chromosomes CHR1 and CHR2 of, respectively, 3,928,089 bp and 2,580,084 bp, and two megaplasmids pMOL28 and pMOL30 of, respectively, 171,459 bp and 233,720 bp in size. At least 25 loci for heavy-metal resistance (HMR) are distributed over the four replicons. Approximately 67% of the 6,717 coding sequences (CDSs) present in the CH34 genome could be assigned a putative function, and 9.1% (611 genes) appear to be unique to this strain. One out of five proteins is associated with either transport or transcription while the relay of environmental stimuli is governed by more than 600 signal transduction systems. The CH34 genome is most similar to the genomes of other Cupriavidus strains by correspondence between the respective CHR1 replicons but also displays similarity to the genomes of more distantly related species as a result of gene transfer and through the presence of large genomic islands. The presence of at least 57 IS elements and 19 transposons and the ability to take in and express foreign genes indicates a very dynamic and complex genome shaped by evolutionary forces. The genome data show that C. metallidurans CH34 is particularly well equipped to live in extreme conditions and anthropogenic environments that are rich in metals.
Janssen, Paul J.; Van Houdt, Rob; Moors, Hugo; Monsieurs, Pieter; Morin, Nicolas; Michaux, Arlette; Benotmane, Mohammed A.; Leys, Natalie; Vallaeys, Tatiana; Lapidus, Alla; Monchy, Sébastien; Médigue, Claudine; Taghavi, Safiyh; McCorkle, Sean; Dunn, John; van der Lelie, Daniël; Mergeay, Max
2010-01-01
Many bacteria in the environment have adapted to the presence of toxic heavy metals. Over the last 30 years, this heavy metal tolerance was the subject of extensive research. The bacterium Cupriavidus metallidurans strain CH34, originally isolated by us in 1976 from a metal processing factory, is considered a major model organism in this field because it withstands milli-molar range concentrations of over 20 different heavy metal ions. This tolerance is mostly achieved by rapid ion efflux but also by metal-complexation and -reduction. We present here the full genome sequence of strain CH34 and the manual annotation of all its genes. The genome of C. metallidurans CH34 is composed of two large circular chromosomes CHR1 and CHR2 of, respectively, 3,928,089 bp and 2,580,084 bp, and two megaplasmids pMOL28 and pMOL30 of, respectively, 171,459 bp and 233,720 bp in size. At least 25 loci for heavy-metal resistance (HMR) are distributed over the four replicons. Approximately 67% of the 6,717 coding sequences (CDSs) present in the CH34 genome could be assigned a putative function, and 9.1% (611 genes) appear to be unique to this strain. One out of five proteins is associated with either transport or transcription while the relay of environmental stimuli is governed by more than 600 signal transduction systems. The CH34 genome is most similar to the genomes of other Cupriavidus strains by correspondence between the respective CHR1 replicons but also displays similarity to the genomes of more distantly related species as a result of gene transfer and through the presence of large genomic islands. The presence of at least 57 IS elements and 19 transposons and the ability to take in and express foreign genes indicates a very dynamic and complex genome shaped by evolutionary forces. The genome data show that C. metallidurans CH34 is particularly well equipped to live in extreme conditions and anthropogenic environments that are rich in metals. PMID:20463976
Genomics of an extreme psychrophile, Psychromonas ingrahamii
Riley, Monica; Staley, James T; Danchin, Antoine; Wang, Ting Zhang; Brettin, Thomas S; Hauser, Loren J; Land, Miriam L; Thompson, Linda S
2008-01-01
Background: The genome sequence of the sea-ice bacterium Psychromonas ingrahamii 37, which grows exponentially at −12 °C, may reveal features that help to explain how this extreme psychrophile is able to grow at such low temperatures. Determination of the whole genome sequence allows comparison with genes of other psychrophiles and mesophiles. Results: Correspondence analysis of the composition of all P. ingrahamii proteins showed that (1) there are 6 classes of proteins, at least one more than other bacteria; (2) integral inner membrane proteins are not sharply separated from bulk proteins, suggesting that, overall, they may have a lower hydrophobic character; (3) there is strong opposition between asparagine and the oxygen-sensitive amino acids methionine, arginine, cysteine and histidine; and (4) one of the previously unseen clusters of proteins has a high proportion of "orphan" hypothetical proteins, raising the possibility these are cold-specific proteins. Based on annotation of proteins by sequence similarity, (1) P. ingrahamii has a large number (61) of regulators of cyclic GDP, suggesting that this bacterium produces an extracellular polysaccharide that may help sequester water or lower the freezing point in the vicinity of the cell. (2) P. ingrahamii has genes for production of the osmolyte, betaine choline, which may balance the osmotic pressure as sea ice freezes. (3) P. ingrahamii has a large number (11) of three-subunit TRAP systems that may play an important role in the transport of nutrients into the cell at low temperatures. (4) Chaperones and stress proteins may play a critical role in transforming nascent polypeptides into 3-dimensional configurations that permit low temperature growth. (5) Metabolic properties of P. ingrahamii were deduced. Finally, a few small sets of proteins of unknown function which may play a role in psychrophily have been singled out as worthy of future study.
Conclusion: The results of this genomic analysis provide a springboard for further investigations into mechanisms of psychrophily. Focus on the role of asparagine excess in proteins, targeted phenotypic characterizations and gene expression investigations are needed to ascertain if and how the organism regulates various proteins in response to growth at lower temperatures. PMID:18460197
Prasher, Bhavana; Negi, Sapna; Aggarwal, Shilpi; Mandal, Amit K; Sethi, Tav P; Deshmukh, Shailaja R; Purohit, Sudha G; Sengupta, Shantanu; Khanna, Sangeeta; Mohammad, Farhan; Garg, Gaurav; Brahmachari, Samir K; Mukerji, Mitali
2008-09-09
Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system an individual's basic constitution to a large extent determines predisposition and prognosis to diseases as well as therapy and life-style regime. Ayurveda describes seven broad constitution types (Prakritis) each with a varying degree of predisposition to different diseases. Amongst these, three most contrasting types, Vata, Pitta, Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision for successful identification of markers for prospective disease conditions. In this study, we explore whether the different constitution types as described in Ayurveda have molecular correlates. Normal individuals of the three most contrasting constitutional types were identified following phenotyping criteria described in Ayurveda in an Indian population of Indo-European origin. The peripheral blood samples of these individuals were analysed for genome wide expression levels, biochemical and hematological parameters. Gene Ontology (GO) and pathway based analysis was carried out on differentially expressed genes to explore if there were significant enrichments of functional categories among Prakriti types. Individuals from the three most contrasting constitutional types exhibit striking differences with respect to biochemical and hematological parameters and at genome wide expression levels. Biochemical profiles like liver function tests, lipid profiles, and hematological parameters like haemoglobin exhibited differences between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes like transport, regulation of cyclin dependent protein kinase activity, immune response and regulation of blood coagulation.
A significant enrichment of housekeeping, disease related and hub genes was observed in these extreme constitution types. Ayurveda based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system level differences in normal individuals which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine.
Prasher, Bhavana; Negi, Sapna; Aggarwal, Shilpi; Mandal, Amit K; Sethi, Tav P; Deshmukh, Shailaja R; Purohit, Sudha G; Sengupta, Shantanu; Khanna, Sangeeta; Mohammad, Farhan; Garg, Gaurav; Brahmachari, Samir K; Mukerji, Mitali
2008-01-01
Background Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system an individual's basic constitution to a large extent determines predisposition and prognosis to diseases as well as therapy and life-style regime. Ayurveda describes seven broad constitution types (Prakritis) each with a varying degree of predisposition to different diseases. Amongst these, the three most contrasting types, Vata, Pitta, Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision for successful identification of markers for prospective disease conditions. In this study, we explore whether the different constitution types as described in Ayurveda have molecular correlates. Methods Normal individuals of the three most contrasting constitutional types were identified following phenotyping criteria described in Ayurveda in an Indian population of Indo-European origin. The peripheral blood samples of these individuals were analysed for genome wide expression levels, biochemical and hematological parameters. Gene Ontology (GO) and pathway based analysis was carried out on differentially expressed genes to explore if there were significant enrichments of functional categories among Prakriti types. Results Individuals from the three most contrasting constitutional types exhibit striking differences with respect to biochemical and hematological parameters and at genome wide expression levels. Biochemical profiles like liver function tests, lipid profiles, and hematological parameters like haemoglobin exhibited differences between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes like transport, regulation of cyclin dependent protein kinase activity, immune response and regulation of blood coagulation.
A significant enrichment of housekeeping, disease related and hub genes was observed in these extreme constitution types. Conclusion The Ayurveda based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system level differences in normal individuals which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine. PMID:18782426
Choi, Dong Han; Ahn, Chisang; Jang, Gwang Il; ...
2015-11-11
Gracilimonas tropica Choi et al. 2009 is a member of order Sphingobacteriales, class Sphingobacteriia. Three species of the genus Gracilimonas have been isolated from marine seawater or a salt mine and showed extremely halotolerant and mesophilic features, although close relatives are extremely halophilic or thermophilic. The type strain of the type species of Gracilimonas, G. tropica DSM19535 T, was isolated from a Synechococcus culture which was established from the tropical sea-surface water of the Pacific Ocean. The genome of the strain DSM19535 T was sequenced through the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project. Here, we describe the genomic features of the strain. The 3,831,242 bp long draft genome consists of 48 contigs with 3373 protein-coding and 53 RNA genes. Finally, the strain seems to adapt to phosphate limitation and requires amino acids from the external environment. In addition, genomic analyses and a pasteurization experiment suggested that G. tropica DSM19535 T did not form spores.
Duncan, Emma L; Danoy, Patrick; Kemp, John P; Leo, Paul J; McCloskey, Eugene; Nicholson, Geoffrey C; Eastell, Richard; Prince, Richard L; Eisman, John A; Jones, Graeme; Sambrook, Philip N; Reid, Ian R; Dennison, Elaine M; Wark, John; Richards, J Brent; Uitterlinden, Andre G; Spector, Tim D; Esapa, Chris; Cox, Roger D; Brown, Steve D M; Thakker, Rajesh V; Addison, Kathryn A; Bradbury, Linda A; Center, Jacqueline R; Cooper, Cyrus; Cremin, Catherine; Estrada, Karol; Felsenberg, Dieter; Glüer, Claus-C; Hadler, Johanna; Henry, Margaret J; Hofman, Albert; Kotowicz, Mark A; Makovey, Joanna; Nguyen, Sing C; Nguyen, Tuan V; Pasco, Julie A; Pryce, Karena; Reid, David M; Rivadeneira, Fernando; Roux, Christian; Stefansson, Kari; Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Tichawangana, Rumbidzai; Evans, David M; Brown, Matthew A
2011-04-01
Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55-85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or -4.0 to -1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD-associated genes. Additionally, we report suggestive association of a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies.
Duncan, Emma L.; Danoy, Patrick; Kemp, John P.; Leo, Paul J.; McCloskey, Eugene; Nicholson, Geoffrey C.; Eastell, Richard; Prince, Richard L.; Eisman, John A.; Jones, Graeme; Sambrook, Philip N.; Reid, Ian R.; Dennison, Elaine M.; Wark, John; Richards, J. Brent; Uitterlinden, Andre G.; Spector, Tim D.; Esapa, Chris; Cox, Roger D.; Brown, Steve D. M.; Thakker, Rajesh V.; Addison, Kathryn A.; Bradbury, Linda A.; Center, Jacqueline R.; Cooper, Cyrus; Cremin, Catherine; Estrada, Karol; Felsenberg, Dieter; Glüer, Claus-C.; Hadler, Johanna; Henry, Margaret J.; Hofman, Albert; Kotowicz, Mark A.; Makovey, Joanna; Nguyen, Sing C.; Nguyen, Tuan V.; Pasco, Julie A.; Pryce, Karena; Reid, David M.; Rivadeneira, Fernando; Roux, Christian; Stefansson, Kari; Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Tichawangana, Rumbidzai; Evans, David M.; Brown, Matthew A.
2011-01-01
Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55–85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or −4.0 to −1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD–associated genes. Additionally, we report suggestive association of a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies. PMID:21533022
Hoy, Marjorie A.; Waterhouse, Robert M.; Wu, Ke; Estep, Alden S.; Ioannidis, Panagiotis; Palmer, William J.; Pomerantz, Aaron F.; Simão, Felipe A.; Thomas, Jainy; Jiggins, Francis M.; Murphy, Terence D.; Pritham, Ellen J.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Gibbs, Richard A.; Richards, Stephen
2016-01-01
Metaseiulus occidentalis is an eyeless phytoseiid predatory mite employed for the biological control of agricultural pests including spider mites. Despite appearances, these predator and prey mites are separated by some 400 Myr of evolution and radically different lifestyles. We present a 152-Mb draft assembly of the M. occidentalis genome: Larger than that of its favored prey, Tetranychus urticae, but considerably smaller than those of many other chelicerates, enabling an extremely contiguous and complete assembly to be built—the best arachnid to date. Aided by transcriptome data, genome annotation cataloged 18,338 protein-coding genes and identified large numbers of Helitron transposable elements. Comparisons with other arthropods revealed a particularly dynamic and turbulent genomic evolutionary history. Its genes exhibit elevated molecular evolution, with strikingly high numbers of intron gains and losses, in stark contrast to the deer tick Ixodes scapularis. Uniquely among examined arthropods, this predatory mite’s Hox genes are completely atomized, dispersed across the genome, and it encodes five copies of the normally single-copy RNA processing Dicer-2 gene. Examining gene families linked to characteristic biological traits of this tiny predator provides initial insights into processes of sex determination, development, immune defense, and how it detects, disables, and digests its prey. As the first reference genome for the Phytoseiidae, and for any species with the rare sex determination system of parahaploidy, the genome of the western orchard predatory mite improves genomic sampling of chelicerates and provides invaluable new resources for functional genomic analyses of this family of agriculturally important mites. PMID:26951779
Superior cross-species reference genes: a blueberry case study
USDA-ARS's Scientific Manuscript database
The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the applic...
Crits-Christoph, Alexander; Robinson, Courtney K.; Ma, Bing; Ravel, Jacques; Wierzchos, Jacek; Ascaso, Carmen; Artieda, Octavio; Souza-Egipsy, Virginia; Casero, M. Cristina; DiRuggiero, Jocelyne
2016-01-01
Under extreme water deficit, endolithic (inside rock) microbial ecosystems are considered environmental refuges for life in cold and hot deserts, yet their diversity and functional adaptations remain vastly unexplored. The metagenomic analyses of the communities from two rock substrates, calcite and ignimbrite, revealed that they were dominated by Cyanobacteria, Actinobacteria, and Chloroflexi. The relative distribution of major phyla was significantly different between the two substrates and biodiversity estimates, from 16S rRNA gene sequences and from the metagenomic data, all pointed to a higher taxonomic diversity in the calcite community. While both endolithic communities showed adaptations to extreme aridity and to the rock habitat, their functional capabilities revealed significant differences. ABC transporters and pathways for osmoregulation were more diverse in the calcite chasmoendolithic community. In contrast, the ignimbrite cryptoendolithic community was enriched in pathways for secondary metabolites, such as non-ribosomal peptides (NRP) and polyketides (PK). Assemblies of the metagenome data produced population genomes for the major phyla found in both communities and revealed a greater diversity of Cyanobacteria population genomes for the calcite substrate. Draft genomes of the dominant Cyanobacteria in each community were constructed with more than 93% estimated completeness. The two annotated proteomes shared 64% amino acid identity, and a significantly higher number of genes involved in iron uptake, and of NRPS gene clusters, was found in the draft genome from the ignimbrite. Both the community-wide and genome-specific differences may be related to higher water availability and the colonization of large fissures and cracks in the calcite in contrast to a harsh competition for colonization space and nutrient resources in the narrow pores of the ignimbrite.
Together, these results indicated that the habitable architecture of both lithic substrates (chasmoendolithic versus cryptoendolithic) might be an essential element in determining the colonization and the diversity of the microbial communities in endolithic substrates at the dry limit for life. PMID:27014224
Hidalgo, Oriane; Pellicer, Jaume; Percy, Diana; Leitch, Ilia J.
2016-01-01
Abstract Background The common stinging nettle, Urtica dioica L. sensu lato, is an invertebrate "superhost", its clonal patches maintaining large populations of insects and molluscs. It is extremely widespread in Europe and highly variable, and two ploidy levels (diploid and tetraploid) are known. However, geographical patterns in cytotype variation require further study. New information We assembled a collection of nettles in conjunction with a transect of Europe from the Aegean to Arctic Norway (primarily conducted to examine the diversity of Salix and Salix-associated insects). Using flow cytometry to measure genome size, our sample of 29 plants reveals 5 diploids and 24 tetraploids. Two diploids were found in SE Europe (Bulgaria and Romania) and three diploids in S. Finland. More detailed cytotype surveys in these regions are suggested. The tetraploid genome size (2C value) varied between accessions from 2.36 to 2.59 pg. The diploids varied from 1.31 to 1.35 pg per 2C nucleus, equivalent to a haploid genome size of c. 650 Mbp. Within the tetraploids, we find that the most northerly samples (from N. Finland and arctic Norway) have a generally higher genome size. This is possibly indicative of a distinct population in this region. PMID:27932918
Cronk, Quentin; Hidalgo, Oriane; Pellicer, Jaume; Percy, Diana; Leitch, Ilia J
2016-01-01
The common stinging nettle, Urtica dioica L. sensu lato, is an invertebrate "superhost", its clonal patches maintaining large populations of insects and molluscs. It is extremely widespread in Europe and highly variable, and two ploidy levels (diploid and tetraploid) are known. However, geographical patterns in cytotype variation require further study. We assembled a collection of nettles in conjunction with a transect of Europe from the Aegean to Arctic Norway (primarily conducted to examine the diversity of Salix and Salix-associated insects). Using flow cytometry to measure genome size, our sample of 29 plants reveals 5 diploids and 24 tetraploids. Two diploids were found in SE Europe (Bulgaria and Romania) and three diploids in S. Finland. More detailed cytotype surveys in these regions are suggested. The tetraploid genome size (2C value) varied between accessions from 2.36 to 2.59 pg. The diploids varied from 1.31 to 1.35 pg per 2C nucleus, equivalent to a haploid genome size of c. 650 Mbp. Within the tetraploids, we find that the most northerly samples (from N. Finland and arctic Norway) have a generally higher genome size. This is possibly indicative of a distinct population in this region.
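The conversion from the diploid 2C values (1.31 to 1.35 pg) to a haploid genome size of c. 650 Mbp can be checked with a short calculation. The sketch below assumes the commonly used factor of 1 pg of DNA ≈ 978 Mbp, which the abstract does not state; the function name `haploid_size_mbp` is hypothetical and chosen only for illustration.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to roughly 978 Mbp.
PG_TO_MBP = 978.0


def haploid_size_mbp(two_c_pg: float) -> float:
    """Convert a 2C value (pg per diploid nucleus) to a haploid size in Mbp."""
    one_c_pg = two_c_pg / 2.0  # a 2C nucleus of a diploid holds two haploid genomes
    return one_c_pg * PG_TO_MBP


if __name__ == "__main__":
    for two_c in (1.31, 1.35):
        print(f"2C = {two_c:.2f} pg -> {haploid_size_mbp(two_c):.1f} Mbp")
```

Under that assumption the two reported diploid extremes bracket 650 Mbp, which agrees with the rounded figure in the abstract.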
Patel, Seema
2016-11-01
Despite the advent of next-generation sequencing (NGS) technologies, sophisticated data analysis and drug development efforts, bacterial drug resistance persists and is escalating in magnitude. To better control the pathogens, a thorough understanding of their genomic architecture and dynamics is vital. The bacterial genome is extremely complex, a mosaic of numerous co-operating and antagonizing components, altruistic and self-interested entities, the behavior of which is predictable and conserved to some extent, yet largely dictated by an array of variables. In this regard, mobile genetic elements (MGE), DNA repair systems, post-segregation killing systems, toxin-antitoxin (TA) systems and restriction-modification (RM) systems, among others, are dominant agents, and processes such as horizontal gene transfer (HGT), gene redundancy, epigenetics, and phase and antigenic variation shape the genome. Through illegitimate recombinations, deletions, insertions, duplications, amplifications, inversions, conversions, translocations, modification of intergenic regions and other alterations, the bacterial genome is modified to tackle stressors such as drugs and host immune effectors. Over the years, thousands of studies have investigated this aspect and a mammoth amount of insight has been accumulated. This review strives to distill the existing information, formulate hypotheses and suggest directions that might contribute towards improved mitigation of the vicious pathogens. Copyright © 2016 Elsevier B.V. All rights reserved.
Functional interactions of archaea, bacteria and viruses in a hypersaline endolithic community.
Crits-Christoph, Alexander; Gelsinger, Diego R; Ma, Bing; Wierzchos, Jacek; Ravel, Jacques; Davila, Alfonso; Casero, M Cristina; DiRuggiero, Jocelyne
2016-06-01
Halite endoliths in the Atacama Desert represent one of the most extreme ecosystems on Earth. Cultivation-independent methods were used to examine the functional adaptations of the microbial consortia inhabiting halite nodules. The community was dominated by haloarchaea and functional analysis attributed most of the autotrophic CO2 fixation to one unique cyanobacterium. The assembled 1.1 Mbp genome of a novel nanohaloarchaeon, Candidatus Nanopetramus SG9, revealed a photoheterotrophic life style and a low median isoelectric point (pI) for all predicted proteins, suggesting a 'salt-in' strategy for osmotic balance. Predicted proteins of the algae identified in the community also had pI distributions similar to 'salt-in' strategists. The Nanopetramus genome contained a unique CRISPR/Cas system with a spacer that matched a partial viral genome from the metagenome. A combination of reference-independent methods identified over 30 complete or near complete viral or proviral genomes with diverse genome structure, genome size, gene content and hosts. Putative hosts included Halobacteriaceae, Nanohaloarchaea and Cyanobacteria. Despite the dependence of the halite community on deliquescence for liquid water availability, this study exposed an ecosystem spanning three phylogenetic domains, containing a large diversity of viruses and predominance of a 'salt-in' strategy to balance the high osmotic pressure of the environment. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
The future of genomics in polar and alpine cyanobacteria
Anesio, Alexandre M; Sánchez-Baracaldo, Patricia
2018-01-01
Abstract In recent years, genomic analyses have arisen as an exciting way of investigating the functional capacity and environmental adaptations of numerous micro-organisms of global relevance, including cyanobacteria. In the extreme cold of Arctic, Antarctic and alpine environments, cyanobacteria are of fundamental ecological importance as primary producers and ecosystem engineers. While their role in biogeochemical cycles is well appreciated, little is known about the genomic makeup of polar and alpine cyanobacteria. In this article, we present ways that genomic techniques might be used to further our understanding of cyanobacteria in cold environments in terms of their evolution and ecology. Existing examples from other environments (e.g. marine/hot springs) are used to discuss how methods developed there might be used to investigate specific questions in the cryosphere. Phylogenomics, comparative genomics and population genomics are identified as methods for understanding the evolution and biogeography of polar and alpine cyanobacteria. Transcriptomics will allow us to investigate gene expression under extreme environmental conditions, and metagenomics can be used to complement traditional amplicon-based methods of community profiling. Finally, new techniques such as single cell genomics and metagenome assembled genomes will also help to expand our understanding of polar and alpine cyanobacteria that cannot readily be cultured. PMID:29506259
Evidence for a lineage of virulent bacteriophages that target Campylobacter.
Timms, Andrew R; Cambray-Young, Joanna; Scott, Andrew E; Petty, Nicola K; Connerton, Phillippa L; Clarke, Louise; Seeger, Kathy; Quail, Mike; Cummings, Nicola; Maskell, Duncan J; Thomson, Nicholas R; Connerton, Ian F
2010-03-30
Our understanding of the dynamics of genome stability versus gene flux within bacteriophage lineages is limited. Recently, there has been a renewed interest in the use of bacteriophages as 'therapeutic' agents; a prerequisite for their use in such therapies is a thorough understanding of their genetic complement, genome stability and their ecology to avoid the dissemination or mobilisation of phage or bacterial virulence and toxin genes. Campylobacter, a food-borne pathogen, is one of the organisms for which the use of bacteriophage is being considered to reduce human exposure to this organism. Sequencing and genome analysis were performed for two Campylobacter bacteriophages. The genomes were extremely similar at the nucleotide level (≥ 96%) with most differences accounted for by novel insertion sequences, DNA methylases and an approximately 10 kb contiguous region of metabolic genes that were dissimilar at the sequence level but similar in gene function between the two phages. Both bacteriophages contained a large number of radical S-adenosylmethionine (SAM) genes, presumably involved in boosting host metabolism during infection, as well as evidence that many genes had been acquired from a wide range of bacterial species. Further bacteriophages, from the UK Campylobacter typing set, were screened for the presence of bacteriophage structural genes, DNA methylases, mobile genetic elements and regulatory genes identified from the genome sequences. The results indicate that many of these bacteriophages are related, with 10 out of 15 showing some relationship to the sequenced genomes. Two large virulent Campylobacter bacteriophages were found to show very high levels of sequence conservation despite separation in time and place of isolation. The bacteriophages show adaptations to their host and possess genes that may enhance Campylobacter metabolism, potentially advantaging both the bacteriophage and its host.
Genetic conservation has been shown to extend to other Campylobacter bacteriophages, forming a highly conserved lineage of bacteriophages that predate upon campylobacters and indicating that highly adapted bacteriophage genomes can be stable over prolonged periods of time.
Microbial minimalism: genome reduction in bacterial pathogens.
Moran, Nancy A
2002-03-08
When bacterial lineages make the transition from free-living or facultatively parasitic life cycles to permanent associations with hosts, they undergo a major loss of genes and DNA. Complete genome sequences are providing an understanding of how extreme genome reduction affects evolutionary directions and metabolic capabilities of obligate pathogens and symbionts.
Castoe, Todd A; de Koning, Jason A P; Hall, Kathryn T; Yokoyama, Ken D; Gu, Wanjun; Smith, Eric N; Feschotte, Cédric; Uetz, Peter; Ray, David A; Dobry, Jason; Bogden, Robert; Mackessy, Stephen P; Bronikowski, Anne M; Warren, Wesley C; Secor, Stephen M; Pollock, David D
2011-07-28
The Consortium for Snake Genomics is in the process of sequencing the genome and creating transcriptomic resources for the Burmese python. Here, we describe how this will be done, what analyses this work will include, and provide a timeline.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-03-01
Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
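To make the "rigorous dynamic programming approach" concrete, the following is a minimal sketch of Smith-Waterman local alignment scoring applied all-against-all to a small set of sequences. It is not the ProteinWorldDB implementation: the scoring scheme (match +2, mismatch -1, linear gap -2) is an assumption chosen for brevity, whereas a protein comparison of this kind would normally use a substitution matrix and affine gap penalties, and the names `smith_waterman_score` and `all_against_all` are hypothetical.

```python
from itertools import combinations


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (score only, no traceback)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs: dict) -> dict:
    """Score every unordered pair of named sequences."""
    return {
        (x, y): smith_waterman_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    demo = {"p1": "ACGT", "p2": "ACGT", "p3": "TTTT"}
    for pair, score in all_against_all(demo).items():
        print(pair, score)
```

Because each pair costs time proportional to the product of the two sequence lengths and the number of pairs grows quadratically with the number of sequences, exhaustive comparison of millions of proteins is computationally intensive, which is consistent with the abstract's reliance on a volunteer computing grid.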
Dermauw, Wannes; Van Leeuwen, Thomas; Vanholme, Bartel; Tirry, Luc
2009-01-01
Background The apparent scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). This subclass encompasses over 48,000 species and forms the largest group within the Arachnida. Although mitochondrial genomes are widely utilised for phylogenetic and population genetic studies, only 20 mitochondrial genomes of Acari have been determined, of which only one belongs to the diverse order of the Sarcoptiformes. In this study, we describe the mitochondrial genome of the European house dust mite Dermatophagoides pteronyssinus, the most important member of this largely neglected group. Results The mitochondrial genome of D. pteronyssinus is a circular DNA molecule of 14,203 bp. It contains the complete set of 37 genes (13 protein coding genes, 2 rRNA genes and 22 tRNA genes), usually present in metazoan mitochondrial genomes. The mitochondrial gene order differs considerably from that of other Acari mitochondrial genomes. Compared to the mitochondrial genome of Limulus polyphemus, considered as the ancestral arthropod pattern, only 11 of the 38 gene boundaries are conserved. The majority strand has a 72.6% AT-content but a GC-skew of 0.194. This skew is the reverse of that normally observed for typical animal mitochondrial genomes. A microsatellite was detected in a large non-coding region (286 bp), which probably functions as the control region. Almost all tRNA genes lack a T-arm, provoking the formation of canonical cloverleaf tRNA-structures, and both rRNA genes are considerably reduced in size. Finally, the genomic sequence was used to perform a phylogenetic study. Both maximum likelihood and Bayesian inference analysis clustered D. pteronyssinus with Steganacarus magnus, forming a sistergroup of the Trombidiformes. Conclusion Although the mitochondrial genome of D. pteronyssinus shares different features with previously characterised Acari mitochondrial genomes, it is unique in many ways. 
Gene order is extremely rearranged and represents a new pattern within the Acari. Both tRNAs and rRNAs are truncated, corroborating the theory of the functional co-evolution of these molecules. Furthermore, the strong and reversed GC- and AT-skews suggest the inversion of the control region as an evolutionary event. Finally, phylogenetic analysis using concatenated mt gene sequences succeeded in recovering Acari relationships concordant with traditional views of phylogeny of Acari. PMID:19284646
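The AT-content and skew statistics quoted in this abstract follow from simple base counts. The sketch below uses the conventional definitions, GC-skew = (G - C)/(G + C) and AT-skew = (A - T)/(A + T), computed on a single strand; the function name `composition_stats` and the toy sequence are illustrative assumptions, not data from the study.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return AT-content, GC-skew and AT-skew for one DNA strand."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t  # ambiguous bases such as N are ignored
    return {
        "at_content": (a + t) / total if total else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
    }


if __name__ == "__main__":
    print(composition_stats("GGGCAATT"))
```

A positive GC-skew on the majority strand, as reported here, means G outnumbers C on that strand.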
Genomic instability in cancer: Teetering on the limit of tolerance
Andor, Noemi; Maley, Carlo C.; Ji, Hanlee P.
2017-01-01
Cancer genomic instability contributes to the phenomenon of intratumoral genetic heterogeneity, provides the genetic diversity required for natural selection and enables the extensive phenotypic diversity that is frequently observed among patients. Genomic instability has previously been associated with poor prognosis. However, we have evidence that for solid tumors of epithelial origin, extreme levels of genomic instability, where more than 75% of the genome is subject to somatic copy number alterations, are associated with a potentially better prognosis compared to intermediate levels under this threshold. This has been observed in clonal subpopulations of larger size, especially when genomic instability is shared among a limited number of clones. We hypothesize that cancers with extreme levels of genomic instability may be teetering on the brink of a threshold where so much of their genome is adversely altered that cells rarely replicate successfully. Another possibility is that tumors with high levels of genomic instability are more immunogenic than other cancers with a less extensive burden of genetic aberrations. Regardless of the exact mechanism, but hinging on our ability to quantify how a tumor’s burden of genetic aberrations is distributed among coexisting clones, genomic instability has important therapeutic implications. Herein, we explore the possibility that high genomic instability could be the basis for a tumor’s sensitivity to DNA damaging therapies. We primarily focus on studies of epithelial-derived solid tumors. PMID:28432052
Draft genome of tule elk Cervus canadensis nannodes.
Mizzi, Jessica E; Lounsberry, Zachary T; Brown, C Titus; Sacks, Benjamin N
2017-01-01
This paper presents the first draft genome of the tule elk (Cervus elaphus nannodes), a subspecies native to California that underwent an extreme genetic bottleneck in the late 1800s. The genome was generated from Illumina HiSeq 3000 whole genome sequencing of four individuals, resulting in an assembly of 2.395 billion base pairs (Gbp) across 602,862 contigs longer than 500 bp, with N50 = 6,885 bp. This genome provides a resource to facilitate future genomic research on elk and other cervids.
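For readers unfamiliar with the N50 statistic quoted above, it is the contig length L such that contigs of length at least L together contain at least half of the assembled bases. A minimal sketch follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are unrelated to the tule elk data.

```python
def n50(contig_lengths: list) -> int:
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

An N50 of a few kilobases, as reported here, indicates a fragmented draft in which many contigs are short.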
Symonová, Radka; Ocalewicz, Konrad; Kirtiklis, Lech; Delmastro, Giovanni Battista; Pelikánová, Šárka; Garcia, Sonia; Kovařík, Aleš
2017-05-18
Pikes represent an important genus (Esox) harbouring a pre-duplication karyotype (2n = 2x = 50) of economically important salmonid pseudopolyploids. Here, we have characterized the 5S ribosomal RNA genes (rDNA) in Esox lucius and its closely related E. cisalpinus using cytogenetic, molecular and genomic approaches. Intragenomic homogeneity and copy number were estimated using Illumina reads. The higher-order structure of rDNA arrays was investigated by the analysis of long PacBio reads. The position of loci on chromosomes was determined by FISH. DNA methylation was analysed by methylation-sensitive restriction enzymes. The 5S rDNA loci occupy exclusively (peri)centromeric regions on 30-38 acrocentric chromosomes in both E. lucius and E. cisalpinus. The large number of loci is accompanied by extreme amplification of genes (>20,000 copies), which is to the best of our knowledge one of the highest copy numbers of rRNA genes ever reported in animals. Conserved secondary structures of predicted 5S rRNAs indicate that most of the amplified genes are potentially functional. Only a few SNPs were found in genic regions, indicating their high homogeneity, while intergenic spacers were more heterogeneous and several families were identified. Analysis of 10-30 kb-long molecules sequenced by the PacBio technology (containing about 40% of total 5S rDNA) revealed that the vast majority (96%) of genes are organised in large several kilobase-long blocks. Dispersed genes or short tandems were less common (4%). The adjacent 5S blocks were directly linked, separated by intervening DNA and even inverted. The 5S units differing in the intergenic spacers formed both homogeneous and heterogeneous (mixed) blocks, indicating a variable degree of homogenisation between the loci. Both E. lucius and E. cisalpinus 5S rDNA was heavily methylated at CG dinucleotides.
Extreme amplification of 5S rRNA genes in the Esox genome occurred in the absence of significant pseudogenisation suggesting its recent origin and/or intensive homogenisation processes. The dense methylation of units indicates that powerful epigenetic mechanisms have evolved in this group of fish to silence amplified genes. We discuss how the higher-order repeat structures impact on homogenisation of 5S rDNA in the genome.
Static and Dynamic Properties of DNA Confined in Nanochannels
NASA Astrophysics Data System (ADS)
Gupta, Damini
Next-generation sequencing (NGS) techniques have considerably reduced the cost of high-throughput DNA sequencing. However, it is challenging to detect large-scale genomic variations by NGS due to short read lengths. Genome mapping can easily detect large-scale structural variations because it operates on extremely large intact molecules of DNA with adequate resolution. One of the promising methods of genome mapping is based on confining large DNA molecules inside a nanochannel whose cross-sectional dimensions are approximately 50 nm. Even though this genome mapping technology has been commercialized, the current understanding of the polymer physics of DNA in nanochannel confinement is based on theories and lacks much needed experimental support. The results of this dissertation are aimed at providing a detailed experimental understanding of equilibrium properties of nanochannel-confined DNA molecules. The results are divided into three parts. In the first part, we evaluate the role of channel shape on thermodynamic properties of channel-confined DNA molecules using a combination of fluorescence microscopy and simulations. Specifically, we show that a high aspect ratio of rectangular channels significantly alters the chain statistics as compared to an equivalent square channel with the same cross-sectional area. In the second part, we present experimental evidence that weak excluded volume effects arise in DNA nanochannel confinement, which form the physical basis for the extended de Gennes regime. We also show how confinement spectroscopy and simulations can be combined to reduce molecular weight dispersity effects arising from shearing, photo-cleavage, and nonuniform staining of DNA. Finally, the third part of the thesis concerns the dynamic properties of nanochannel-confined DNA.
We directly measure the center-of-mass diffusivity of single DNA molecules in confinement and show that it is necessary to modify the classical results of de Gennes to account for local chain stiffness of DNA in order to explain the experimental results. In the end, we believe that our findings from the experimental test of the phase diagram for channel-confined DNA, with careful control over molecular weight dispersity, channel geometry, and electrostatic interactions, will provide a firm foundation for the emerging genome mapping technology.
Hemani, Gibran; Yang, Jian; Vinkhuyzen, Anna; Powell, Joseph E; Willemsen, Gonneke; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Mangino, Massimo; Valdes, Ana M; Medland, Sarah E; Madden, Pamela A; Heath, Andrew C; Henders, Anjali K; Nyholt, Dale R; de Geus, Eco J C; Magnusson, Patrik K E; Ingelsson, Erik; Montgomery, Grant W; Spector, Timothy D; Boomsma, Dorret I; Pedersen, Nancy L; Martin, Nicholas G; Visscher, Peter M
2013-11-07
Evidence that complex traits are highly polygenic has been presented by population-based genome-wide association studies (GWASs) through the identification of many significant variants, as well as by family-based de novo sequencing studies indicating that several traits have a large mutational target size. Here, using a third study design, we show results consistent with extreme polygenicity for body mass index (BMI) and height. On a sample of 20,240 siblings (from 9,570 nuclear families), we used a within-family method to obtain narrow-sense heritability estimates of 0.42 (SE = 0.17, p = 0.01) and 0.69 (SE = 0.14, p = 6 × 10^-7) for BMI and height, respectively, after adjusting for covariates. The genomic inflation factors from locus-specific linkage analysis were 1.69 (SE = 0.21, p = 0.04) for BMI and 2.18 (SE = 0.21, p = 2 × 10^-10) for height. This inflation is free of confounding and congruent with polygenicity, consistent with observations of ever-increasing genomic-inflation factors from GWASs with large sample sizes, implying that those signals are due to true genetic signals across the genome rather than population stratification. We also demonstrate that the distribution of the observed test statistics is consistent with both rare and common variants underlying a polygenic architecture and that previous reports of linkage signals in complex traits are probably a consequence of polygenic architecture rather than the segregation of variants with large effects. The convergent empirical evidence from GWASs, de novo studies, and within-family segregation implies that family-based sequencing studies for complex traits require very large sample sizes because the effects of causal variants are small on average. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
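The genomic inflation factor mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual GWAS-style definition (median observed 1-df chi-square statistic divided by its expected median under the null); the linkage-based statistic used in the study may be computed differently, and the function name and example values are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Return lambda = median(observed chi-square) / expected null median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical 1-df test statistics, made up for illustration only.
example_stats = [0.10, 0.45, 0.80, 1.20, 3.90]
lam = genomic_inflation(example_stats)
print(f"lambda = {lam:.2f}")
```

A value near 1 indicates no inflation; values well above 1, such as those reported above, indicate that test statistics are shifted upward across the genome.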
Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R
2014-07-01
Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and in general, there remains a poor understanding of genetic and physiological mechanisms underlying this variation. One extreme case of this is in the Atlantic salmon (Salmo salar), which can show variation in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several regions of the genome which vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.
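As a rough illustration of the quantity behind the FST outlier analyses mentioned above, the sketch below computes the textbook heterozygosity-based F_ST for a single biallelic SNP in two equally weighted populations. It is an assumption-laden simplification: the abstract does not state which estimator or outlier test the study used, no sample-size correction is applied, and the function name is hypothetical.

```python
def fst_biallelic(p1, p2):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                                       # monomorphic site: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population heterozygosity
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two sea-age groups.
print(fst_biallelic(0.2, 0.8))
```

SNPs whose F_ST lies far in the upper tail of the genome-wide distribution are the candidates an outlier scan would flag.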
NASA Astrophysics Data System (ADS)
Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé
2006-08-01
The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction
Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.
Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel
2018-06-01
The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Genome sequence of the cultivated cotton Gossypium arboreum
USDA-ARS's Scientific Manuscript database
Cotton is one of the most economically important natural fiber crops in the world, and the complex tetraploid nature of its genome (AADD, 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled 98.3% of the 1.7-gigabase G. arboreum (AA, 2n = 26...
Gutiérrez-Preciado, Ana; Vargas-Chávez, Carlos; Reyes-Prieto, Mariana; Ordoñez, Omar F; Santos-García, Diego; Rosas-Pérez, Tania; Valdivia-Anistro, Jorge; Rebollar, Eria A; Saralegui, Andrés; Moya, Andrés; Merino, Enrique; Farías, María Eugenia; Latorre, Amparo; Souza, Valeria
2017-01-01
We report the genome sequence of Exiguobacterium chiriqhucha str. N139, isolated from a high-altitude Andean lake. Comparative genomic analyses of the Exiguobacterium genomes available suggest that our strain belongs to the same species as the previously reported E. pavilionensis str. RW-2 and Exiguobacterium str. GIC 31. We describe this species and propose the chiriqhucha name to group them. 'Chiri qhucha' in Quechua means 'cold lake', which is a common origin of these three cosmopolitan Exiguobacteria. The 2,952,588-bp E. chiriqhucha str. N139 genome contains one chromosome and three megaplasmids. The genome analysis of the Andean strain suggests the presence of enzymes that confer E. chiriqhucha str. N139 the ability to grow under multiple environmental extreme conditions, including high concentrations of different metals, high ultraviolet B radiation, scavenging for phosphorous and coping with high salinity. Moreover, the regulation of its tryptophan biosynthesis suggests that novel pathways remain to be discovered, and that these pathways might be fundamental in the amino acid metabolism of the microbial community from Laguna Negra, Argentina.
Whitehead, Andrew; Roach, Jennifer L; Zhang, Shujun; Galvez, Fernando
2012-04-15
The killifish Fundulus heteroclitus is abundant in osmotically dynamic estuaries and it can quickly adjust to extremes in environmental salinity. We performed a comparative osmotic challenge experiment to track the transcriptomic and physiological responses to two salinities throughout a time course of acclimation, and to explore the genome regulatory mechanisms that enable extreme osmotic acclimation. One southern and one northern coastal population, known to differ in their tolerance to hypo-osmotic exposure, were used as our comparative model. Both populations could maintain osmotic homeostasis when transferred from 32 to 0.4 p.p.t., but diverged in their compensatory abilities when challenged down to 0.1 p.p.t., in parallel with divergent transformation of gill morphology. Genes involved in cell volume regulation, nucleosome maintenance, ion transport, energetics, mitochondrion function, transcriptional regulation and apoptosis showed population- and salinity-dependent patterns of expression during acclimation. Network analysis confirmed the role of cytokine and kinase signaling pathways in coordinating the genome regulatory response to osmotic challenge, and also posited the importance of signaling coordinated through the transcription factor HNF-4α. These genome responses support hypotheses of which regulatory mechanisms are particularly relevant for enabling extreme physiological flexibility.
Musto, H; Romero, H; Zavala, A; Jabbari, K; Bernardi, G
1999-07-01
We have analyzed the patterns of synonymous codon preferences of the nuclear genes of Plasmodium falciparum, a unicellular parasite characterized by an extremely GC-poor genome. When all genes are considered, codon usage is strongly biased toward A and T in third codon positions, as expected, but multivariate statistical analysis detects a major trend among genes. At one end genes display codon choices determined mainly by the extreme genome composition of this parasite, and very probably their expression level is low. At the other end a few genes exhibit an increased relative usage of a particular subset of codons, many of which are C-ending. Since the majority of these few genes is putatively highly expressed, we postulate that the increased C-ending codons are translationally optimal. In conclusion, while codon usage of the majority of P. falciparum genes is determined mainly by compositional constraints, a small number of genes exhibit translational selection.
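The compositional bias at third codon positions described above can be measured with a short sketch. It assumes an in-frame coding sequence given as a plain DNA string; the function names and the example sequence are hypothetical and not taken from the study.

```python
def third_position_counts(cds):
    """Count A, C, G and T at the third position of each codon of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for i in range(2, len(cds), 3):
        base = cds[i]
        if base in counts:          # ambiguous bases such as N are skipped
            counts[base] += 1
    return counts

def at3_fraction(cds):
    """Fraction of third codon positions that are A or T."""
    counts = third_position_counts(cds)
    total = sum(counts.values())
    return (counts["A"] + counts["T"]) / total if total else 0.0

# Toy sequence with codons ATG AAA TTT GGC (third positions G, A, T, C).
print(at3_fraction("ATGAAATTTGGC"))
```

In a strongly GC-poor genome, most genes would be expected to give a value close to 1, while genes enriched in C-ending codons would stand out with lower values.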
Structured populations of Sulfolobus acidocaldarius with susceptibility to mobile genetic elements
Anderson, Rika E.; Kouris, Angela; Seward, Christopher H.; Campbell, Kate M.; Whitaker, Rachel J.
2017-01-01
The impact of a structured environment on genome evolution can be determined through comparative population genomics of species that live in the same habitat. Recent work comparing three genome sequences of Sulfolobus acidocaldarius suggested that highly structured, extreme, hot spring environments do not limit dispersal of this thermoacidophile, in contrast to other co-occurring Sulfolobus species. Instead, a high level of conservation among these three S. acidocaldarius genomes was hypothesized to result from rapid, global-scale dispersal promoted by low susceptibility to viruses that sets S. acidocaldarius apart from its sister Sulfolobus species. To test this hypothesis, we conducted a comparative analysis of 47 genomes of S. acidocaldarius from spatial and temporal sampling of two hot springs in Yellowstone National Park. While we confirm the low diversity in the core genome, we observe differentiation among S. acidocaldarius populations, likely resulting from low migration among hot spring “islands” in Yellowstone National Park. Patterns of genomic variation indicate that differing geological contexts result in the elimination or preservation of diversity among differentiated populations. We observe multiple deletions associated with a large genomic island rich in glycosyltransferases, differential integrations of the Sulfolobus turreted icosahedral virus, as well as two different plasmid elements. These data demonstrate that neither rapid dispersal nor lack of mobile genetic elements result in low diversity in the S. acidocaldarius genomes. We suggest instead that significant differences in the recent evolutionary history, or the intrinsic evolutionary rates, of sister Sulfolobus species result in the relatively low diversity of the S. acidocaldarius genome.
Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D’Hont, Angélique
2013-01-01
Background: Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. Methodology/Principal Findings: The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. Conclusion: The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas. PMID:23840670
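The region lengths quoted above can be checked against the reported genome size with a short worked example. It assumes the stated IR length is per copy and that the quadripartite structure contains two IR copies, which is the reading under which the figures add up; the variable names are illustrative.

```python
# Region lengths in bp, as quoted in the abstract.
LSC = 88_338   # Large Single Copy region
SSC = 10_768   # Small Single Copy region
IR = 35_433    # one Inverted Repeat copy (assumed per-copy length)

# Quadripartite structure: LSC + IR + SSC + IR.
total = LSC + SSC + 2 * IR
ir_share = 2 * IR / total

print(total)                                   # 169972, matching the reported genome size
print(f"IR share of genome: {ir_share:.1%}")
```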
Angstadt, Andrea Y; Motsinger-Reif, Alison; Thomas, Rachael; Kisseberth, William C; Guillermo Couto, C; Duval, Dawn L; Nielsen, Dahlia M; Modiano, Jaime F; Breen, Matthew
2011-11-01
Osteosarcoma (OS) is the most commonly diagnosed malignant bone tumor in humans and dogs, characterized in both species by extremely complex karyotypes exhibiting high frequencies of genomic imbalance. Evaluation of genomic signatures in human OS using array comparative genomic hybridization (aCGH) has assisted in uncovering genetic mechanisms that result in disease phenotype. Previous low-resolution (10-20 Mb) aCGH analysis of canine OS identified a wide range of recurrent DNA copy number aberrations, indicating extensive genomic instability. In this study, we profiled 123 canine OS tumors by 1 Mb-resolution aCGH to generate a dataset for direct comparison with current data for human OS, concluding that several high frequency aberrations in canine and human OS are orthologous. To ensure complete coverage of gene annotation, we identified the human refseq genes that map to these orthologous aberrant dog regions and found several candidate genes warranting evaluation for OS involvement. Specifically, subsequent FISH and qRT-PCR analysis of RUNX2, TUSC3, and PTEN indicated that expression levels correlated with genomic copy number status, showcasing RUNX2 as an OS associated gene and TUSC3 as a possible tumor suppressor candidate. Together these data demonstrate the ability of genomic comparative oncology to identify genetic aberrations which may be important for OS progression. Large scale screening of genomic imbalance in canine OS further validates the use of the dog as a suitable model for human cancers, supporting the idea that dysregulation discovered in canine cancers will provide an avenue for complementary study in human counterparts. Copyright © 2011 Wiley-Liss, Inc.
Gene-expression signatures of Atlantic salmon's plastic life cycle.
Aubin-Horth, Nadia; Letcher, Benjamin H; Hofmann, Hans A
2009-09-15
How genomic expression differs as a function of life history variation is largely unknown. Atlantic salmon exhibits extreme alternative life histories. We defined the gene-expression signatures of wild-caught salmon at two different life stages by comparing the brain expression profiles of mature sneaker males and immature males, and early migrants and late migrants. In addition to life-stage-specific signatures, we discovered a surprisingly large gene set that was differentially regulated (at similar magnitudes, yet in opposite direction) in both life history transitions. We suggest that this co-variation is not a consequence of many independent cellular and molecular switches in the same direction but rather represents the molecular equivalent of a physiological shift orchestrated by one or very few master regulators.
Reconstructing metabolic flux vectors from extreme pathways: defining the alpha-spectrum.
Wiback, Sharon J; Mahadevan, Radhakrishnan; Palsson, Bernhard Ø
2003-10-07
The move towards genome-scale analysis of cellular functions has necessitated the development of analytical (in silico) methods to understand such large and complex biochemical reaction networks. One such method is extreme pathway analysis that uses stoichiometry and thermodynamic irreversibility to define mathematically unique, systemic metabolic pathways. These extreme pathways form the edges of a high-dimensional convex cone in the flux space that contains all the attainable steady state solutions, or flux distributions, for the metabolic network. By definition, any steady state flux distribution can be described as a nonnegative linear combination of the extreme pathways. To date, much effort has been focused on calculating, defining, and understanding these extreme pathways. However, little work has been performed to determine how these extreme pathways contribute to a given steady state flux distribution. This study represents an initial effort aimed at defining how physiological steady state solutions can be reconstructed from a network's extreme pathways. In general, there is not a unique set of nonnegative weightings on the extreme pathways that produce a given steady state flux distribution but rather a range of possible values. This range can be determined using linear optimization to maximize and minimize the weightings of a particular extreme pathway in the reconstruction, resulting in what we have termed the alpha-spectrum. The alpha-spectrum defines which extreme pathways can and cannot be included in the reconstruction of a given steady state flux distribution and to what extent they individually contribute to the reconstruction. It is shown that accounting for transcriptional regulatory constraints can considerably shrink the alpha-spectrum.
The alpha-spectrum is computed and interpreted for two cases; first, optimal states of a skeleton representation of core metabolism that include transcriptional regulation, and second for human red blood cell metabolism under various physiological, non-optimal conditions.
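The idea of an allowable range of pathway weightings can be made concrete with a toy sketch. The study uses linear optimization; the example below avoids an LP solver by assuming a hypothetical network in which the weightings that reproduce the flux vector form a one-parameter family (a one-dimensional null space), so the minimum and maximum of each weighting can be read off directly. The pathways, flux vector and function name are all illustrative assumptions.

```python
def alpha_spectrum_1d(alpha0, null_vec):
    """Min/max of each weighting over {alpha0 + t * null_vec : all entries >= 0}.

    alpha0   -- one particular weighting vector with P @ alpha0 = v
    null_vec -- a vector spanning the (assumed one-dimensional) null space of P
    """
    t_lo, t_hi = float("-inf"), float("inf")
    for a, n in zip(alpha0, null_vec):
        if n > 0:
            t_lo = max(t_lo, -a / n)      # alpha_i >= 0 bounds t from below
        elif n < 0:
            t_hi = min(t_hi, -a / n)      # alpha_i >= 0 bounds t from above
        elif a < 0:
            raise ValueError("no nonnegative reconstruction exists")
    if t_lo > t_hi or t_lo == float("-inf") or t_hi == float("inf"):
        raise ValueError("feasible range is empty or unbounded")
    spectrum = []
    for a, n in zip(alpha0, null_vec):
        ends = (a + n * t_lo, a + n * t_hi)
        spectrum.append((min(ends), max(ends)))
    return spectrum

# Toy network: pathways p1 = (1, 0), p2 = (0, 1), p3 = (1, 1); target flux v = (2, 3).
# alpha0 = (2, 3, 0) reproduces v, and (-1, -1, 1) spans the null space of P.
spectrum = alpha_spectrum_1d([2, 3, 0], [-1, -1, 1])
print(spectrum)
```

In this toy case the second pathway has a strictly positive minimum weighting, so it must be used in every reconstruction, whereas the first and third can each be dropped entirely. For realistic networks the null space has many dimensions and each bound would be obtained by solving a linear program, as the abstract describes.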
Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Smith, G D; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A
2016-01-01
Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case–control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence. PMID:26239293
Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Davey Smith, G; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A
2016-08-01
Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case-control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence.
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Snyder-Mackler, Noah; Majoros, William H.; Yuan, Michael L.; Shaver, Amanda O.; Gordon, Jacob B.; Kopp, Gisela H.; Schlebusch, Stephen A.; Wall, Jeffrey D.; Alberts, Susan C.; Mukherjee, Sayan; Zhou, Xiang; Tung, Jenny
2016-01-01
Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping noninvasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To close this gap, here we report an optimized laboratory protocol for genome-wide capture of endogenous DNA from noninvasively collected samples, coupled with a novel computational approach to reconstruct pedigree links from the resulting low-coverage data. We validated both methods using fecal samples from 62 wild baboons, including 48 from an independently constructed extended pedigree. We enriched fecal-derived DNA samples up to 40-fold for endogenous baboon DNA and reconstructed near-perfect pedigree relationships even with extremely low-coverage sequencing. We anticipate that these methods will be broadly applicable to the many research systems for which only noninvasive samples are available. The lab protocol and software (“WHODAD”) are freely available at www.tung-lab.org/protocols-and-software.html and www.xzlab.org/software.html, respectively. PMID:27098910
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
Equilibrium properties of DNA and other semiflexible polymers confined in nanochannels
NASA Astrophysics Data System (ADS)
Muralidhar, Abhiram
Recent developments in next-generation sequencing (NGS) techniques have opened the door for low-cost, high-throughput sequencing of genomes. However, these developments have also exposed the inability of NGS to track large-scale genomic information, which is extremely important for understanding the relationship between genotype and phenotype. Genome mapping offers a reliable way to obtain information about large-scale structural variations in a given genome. A promising variant of genome mapping involves confining single DNA molecules in nanochannels whose cross-sectional dimensions are approximately 50 nm. Despite the development and commercialization of nanochannel-based genome mapping technology, the polymer physics of DNA in confinement is only beginning to be understood. Apart from its biological relevance, DNA is also used as a model polymer in experiments by polymer physicists. Indeed, the seminal experiments by Reisner et al. (2005) on DNA confined in nanochannels of different widths revealed discrepancies with the classical theories of Odijk and de Gennes for polymer confinement. Picking up from the conclusions of the dissertation of Tree (2014), this dissertation addresses a number of key outstanding problems in the area of nanoconfined DNA. Adopting a Monte Carlo chain growth technique known as the pruned-enriched Rosenbluth method, we examine the equilibrium and near-equilibrium properties of DNA and other semiflexible polymers in nanochannel confinement. We begin by analyzing the dependence of various thermodynamic properties of confined semiflexible polymers on molecular weight. This allows us to point out the finite size effects that can occur when using low molecular weight DNA in experiments. We then analyze the statistics of backfolding and hairpin formation in the context of existing theories and discuss how our results can be used to engineer better conditions for genome mapping. Finally, we elucidate the diffusion behavior of confined semiflexible polymers by comparing and contrasting our results for asymptotically long chains with other similar studies in the literature. We expect our findings to be beneficial not only to the design of better genome mapping devices, but also to the fundamental understanding of semiflexible polymers in confinement.
From the genome sequence to the protein inventory of Bacillus subtilis.
Becher, Dörte; Büttner, Knut; Moche, Martin; Hessling, Bernd; Hecker, Michael
2011-08-01
Owing to the low number of proteins necessary to render a bacterial cell viable, bacteria are extremely attractive model systems to understand how the genome sequence is translated into actual life processes. One of the most intensively investigated model organisms is Bacillus subtilis. It has attracted world-wide research interest, addressing cell differentiation and adaptation on a molecular scale as well as biotechnological production processes. Meanwhile, we are looking back on more than 25 years of B. subtilis proteomics. A wide range of methods have been developed during this period for the large-scale qualitative and quantitative proteome analysis. Currently, it is possible to identify and quantify more than 50% of the predicted proteome in different cellular subfractions. In this review, we summarize the development of B. subtilis proteomics during the past 25 years. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Draft Genome Sequence of a Bacillus Bacterium from the Atacama Desert Wetlands Metagenome
Vilo, Claudia; Galetovic, Alexandra; Araya, Jorge E.; Dong, Qunfeng
2015-01-01
We report here the draft genome sequence of a Bacillus bacterium isolated from the microflora of Nostoc colonies grown at the Andean wetlands in northern Chile. We consider this genome sequence to be a molecular tool for exploring microbial relationships and adaptation strategies to the prevailing extreme conditions at the Atacama Desert. PMID:26294639
Utturkar, Sagar M.; Huber, Harald; Leptihn, Sebastian; ...
2016-02-25
We report here the draft genome sequence of Pyrodictium occultum PL19 T, a marine hyperthermophilic archaeon. In addition, the genome provides insights into molecular and cellular adaptation mechanisms to life in extreme environments and the evolution of early organisms on Earth.
A massively parallel strategy for STR marker development, capture, and genotyping.
Kistler, Logan; Johnson, Stephen M; Irwin, Mitchell T; Louis, Edward E; Ratan, Aakrosh; Perry, George H
2017-09-06
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci, with 97.3-99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Sloan, Daniel B.; Nakabachi, Atsushi; Richards, Stephen; Qu, Jiaxin; Murali, Shwetha Canchi; Gibbs, Richard A.; Moran, Nancy A.
2014-01-01
Bacteria confined to intracellular environments experience extensive genome reduction. In extreme cases, insect endosymbionts have evolved genomes that are so gene-poor that they blur the distinction between bacteria and endosymbiotically derived organelles such as mitochondria and plastids. To understand the host’s role in this extreme gene loss, we analyzed gene content and expression in the nuclear genome of the psyllid Pachypsylla venusta, a sap-feeding insect that harbors an ancient endosymbiont (Carsonella) with one of the most reduced bacterial genomes ever identified. Carsonella retains many genes required for synthesis of essential amino acids that are scarce in plant sap, but most of these biosynthetic pathways have been disrupted by gene loss. Host genes that are upregulated in psyllid cells housing Carsonella appear to compensate for endosymbiont gene losses, resulting in highly integrated metabolic pathways that mirror those observed in other sap-feeding insects. The host contribution to these pathways is mediated by a combination of native eukaryotic genes and bacterial genes that were horizontally transferred from multiple donor lineages early in the evolution of psyllids, including one gene that appears to have been directly acquired from Carsonella. By comparing the psyllid genome to a recent analysis of mealybugs, we found that a remarkably similar set of functional pathways have been shaped by independent transfers of bacterial genes to the two hosts. These results show that horizontal gene transfer is an important and recurring mechanism driving coevolution between insects and their bacterial endosymbionts and highlight interesting similarities and contrasts with the evolutionary history of mitochondria and plastids. PMID:24398322
Seabury, Christopher M.; Dowd, Scot E.; Seabury, Paul M.; Raudsepp, Terje; Brightsmith, Donald J.; Liboriussen, Poul; Halley, Yvette; Fisher, Colleen A.; Owens, Elaine; Viswanathan, Ganesh; Tizard, Ian R.
2013-01-01
Data deposition to NCBI Genomes This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N’s). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS studies. 
We also observed evidence for genes and noncoding loci that displayed extreme conservation across the three avian lineages, thereby reflecting their likely biological and developmental importance among birds. PMID:23667475
2011-01-01
Background: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. Results: The genome has been sequenced to 2× coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Conclusions: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution. PMID:21854559
Kanost, Michael R.; Arrese, Estela L.; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G.; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L.; Vogel, Heiko; Walters, James; Waterhouse, Robert M.; Ahn, Seung-Joon; Almeida, Francisca C.; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B.; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M.; Clarke, David F.; Dittmer, Neal T.; Ferguson, Laura C.F.; Garavelou, Spyridoula; Gordon, Karl H.J.; Gunaratna, Ramesh T.; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R.; Kuwar, Suyog S.; Lee, Sandy L.; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H.; McCulloch, Kyle J.; Mathew, Tittu; Morton, Brian; Muzny, Donna M.; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K.; Worley, Kim C.; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D.; Burmester, Thorsten; Clem, Rollie J.; Feyereisen, René; Grimmelikhuijzen, Cornelis J.P; Hamodrakas, Stavros J.; Hansson, Bill S.; Huguet, Elisabeth; Jermiin, Lars S.; Lan, Que; Lehman, Herman K.; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B.; Muthukrishnan, Subbaratnam; Oakeshott, John G.; Palmer, Will; Park, Yoonseong; Passarelli, A. Lorena; Rozas, Julio; Schwartz, Lawrence M.; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J.; Scherer, Steven E.; Richards, Stephen; Blissard, Gary W.
2016-01-01
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922
The Genomic Diversification of the Whole Acinetobacter Genus: Origins, Mechanisms, and Consequences
Touchon, Marie; Cury, Jean; Yoon, Eun-Jeong; Krizova, Lenka; Cerqueira, Gustavo C.; Murphy, Cheryl; Feldgarden, Michael; Wortman, Jennifer; Clermont, Dominique; Lambert, Thierry; Grillot-Courvalin, Catherine; Nemec, Alexandr; Courvalin, Patrice; Rocha, Eduardo P.C.
2014-01-01
Bacterial genomics has greatly expanded our understanding of microdiversification patterns within a species, but analyses at higher taxonomical levels are necessary to understand and predict the independent rise of pathogens in a genus. We have sampled, sequenced, and assessed the diversity of genomes of validly named and tentative species of the Acinetobacter genus, a clade including major nosocomial pathogens and biotechnologically important species. We inferred a robust global phylogeny and delimited several new putative species. The genus is very ancient and extremely diverse: Genomes of highly divergent species share more orthologs than certain strains within a species. We systematically characterized elements and mechanisms driving genome diversification, such as conjugative elements, insertion sequences, and natural transformation. We found many error-prone polymerases that may play a role in resistance to toxins, antibiotics, and in the generation of genetic variation. Surprisingly, temperate phages, poorly studied in Acinetobacter, were found to account for a significant fraction of most genomes. Accordingly, many genomes encode clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems with some of the largest CRISPR-arrays found so far in bacteria. Integrons are strongly overrepresented in Acinetobacter baumannii, which correlates with its frequent resistance to antibiotics. Our data suggest that A. baumannii arose from an ancient population bottleneck followed by population expansion under strong purifying selection. The outstanding diversification of the species occurred largely by horizontal transfer, including some allelic recombination, at specific hotspots preferentially located close to the replication terminus. Our work sets a quantitative basis to understand the diversification of Acinetobacter into emerging resistant and versatile pathogens. PMID:25313016
Renfree, Marilyn B; Papenfuss, Anthony T; Deakin, Janine E; Lindsay, James; Heider, Thomas; Belov, Katherine; Rens, Willem; Waters, Paul D; Pharo, Elizabeth A; Shaw, Geoff; Wong, Emily S W; Lefèvre, Christophe M; Nicholas, Kevin R; Kuroki, Yoko; Wakefield, Matthew J; Zenger, Kyall R; Wang, Chenwei; Ferguson-Smith, Malcolm; Nicholas, Frank W; Hickford, Danielle; Yu, Hongshi; Short, Kirsty R; Siddle, Hannah V; Frankenberg, Stephen R; Chew, Keng Yih; Menzies, Brandon R; Stringer, Jessica M; Suzuki, Shunsuke; Hore, Timothy A; Delbridge, Margaret L; Patel, Hardip R; Mohammadi, Amir; Schneider, Nanette Y; Hu, Yanqiu; O'Hara, William; Al Nadaf, Shafagh; Wu, Chen; Feng, Zhi-Ping; Cocks, Benjamin G; Wang, Jianghui; Flicek, Paul; Searle, Stephen M J; Fairley, Susan; Beal, Kathryn; Herrero, Javier; Carone, Dawn M; Suzuki, Yutaka; Sugano, Sumio; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kondo, Shinji; Nishida, Yuichiro; Tatsumoto, Shoji; Mandiou, Ion; Hsu, Arthur; McColl, Kaighin A; Lansdell, Benjamin; Weinstock, George; Kuczek, Elizabeth; McGrath, Annette; Wilson, Peter; Men, Artem; Hazar-Rethinam, Mehlika; Hall, Allison; Davis, John; Wood, David; Williams, Sarah; Sundaravadanam, Yogi; Muzny, Donna M; Jhangiani, Shalini N; Lewis, Lora R; Morgan, Margaret B; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Nazareth, Lynne; Cree, Andrew; Fowler, Gerald; Kovar, Christie L; Dinh, Huyen H; Joshi, Vandita; Jing, Chyn; Lara, Fremiet; Thornton, Rebecca; Chen, Lei; Deng, Jixin; Liu, Yue; Shen, Joshua Y; Song, Xing-Zhi; Edson, Janette; Troon, Carmen; Thomas, Daniel; Stephens, Amber; Yapa, Lankesha; Levchenko, Tanya; Gibbs, Richard A; Cooper, Desmond W; Speed, Terence P; Fujiyama, Asao; Graves, Jennifer A M; O'Neill, Rachel J; Pask, Andrew J; Forrest, Susan M; Worley, Kim C
2011-08-29
We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.
Mutational Effects and Population Dynamics During Viral Adaptation Challenge Current Models
Miller, Craig R.; Joyce, Paul; Wichman, Holly A.
2011-01-01
Adaptation in haploid organisms has been extensively modeled but little tested. Using a microvirid bacteriophage (ID11), we conducted serial passage adaptations at two bottleneck sizes (10^4 and 10^6), followed by fitness assays and whole-genome sequencing of 631 individual isolates. Extensive genetic variation was observed including 22 beneficial, several nearly neutral, and several deleterious mutations. In the three large bottleneck lines, up to eight different haplotypes were observed in samples of 23 genomes from the final time point. The small bottleneck lines were less diverse. The small bottleneck lines appeared to operate near the transition between isolated selective sweeps and conditions of complex dynamics (e.g., clonal interference). The large bottleneck lines exhibited extensive interference and less stochasticity, with multiple beneficial mutations establishing on a variety of backgrounds. Several leapfrog events occurred. The distribution of first-step adaptive mutations differed significantly from the distribution of second-steps, and a surprisingly large number of second-step beneficial mutations were observed on a highly fit first-step background. Furthermore, few first-step mutations appeared as second-steps and second-steps had substantially smaller selection coefficients. Collectively, the results indicate that the fitness landscape falls between the extremes of smooth and fully uncorrelated, violating the assumptions of many current mutational landscape models. PMID:21041559
Evolution of a Cellular Immune Response in Drosophila: A Phenotypic and Genomic Comparative Analysis
Salazar-Jaramillo, Laura; Paspati, Angeliki; van de Zande, Louis; Vermeulen, Cornelis Joseph; Schwander, Tanja; Wertheim, Bregje
2014-01-01
Understanding the genomic basis of evolutionary adaptation requires insight into the molecular basis underlying phenotypic variation. However, even changes in molecular pathways associated with extreme variation, gains and losses of specific phenotypes, remain largely uncharacterized. Here, we investigate the large interspecific differences in the ability to survive infection by parasitoids across 11 Drosophila species and identify genomic changes associated with gains and losses of parasitoid resistance. We show that a cellular immune defense, encapsulation, and the production of a specialized blood cell, lamellocytes, are restricted to a sublineage of Drosophila, but that encapsulation is absent in one species of this sublineage, Drosophila sechellia. Our comparative analyses of hemopoiesis pathway genes and of genes differentially expressed during the encapsulation response revealed that hemopoiesis-associated genes are highly conserved and present in all species independently of their resistance. In contrast, 11 genes that are differentially expressed during the response to parasitoids are novel genes, specific to the Drosophila sublineage capable of lamellocyte-mediated encapsulation. These novel genes, which are predominantly expressed in hemocytes, arose via duplications, whereby five of them also showed signatures of positive selection, as expected if they were recruited for new functions. Three of these novel genes further showed large-scale and presumably loss-of-function sequence changes in D. sechellia, consistent with the loss of resistance in this species. In combination, these convergent lines of evidence suggest that co-option of duplicated genes in existing pathways and subsequent neofunctionalization are likely to have contributed to the evolution of the lamellocyte-mediated encapsulation in Drosophila. PMID:24443439
Daly, M J
2001-09-01
During the past decades, representatives of Archaea, Bacteria, and Protista have been found thriving in many newly discovered extremely hostile habitats, which hitherto were regarded as too harsh to harbor life. To illustrate how an extremophile could be targeted for development as a biowarfare agent, an example is presented describing current advances in engineering Deinococcus radiodurans. Using a generally applicable combination of conventional genetic engineering and genomic informatics, this extremely radiation-resistant and environmentally robust bacterium is being developed for biotechnology.
Li, Dalin; Lewinger, Juan Pablo; Gauderman, William J; Murcray, Cassandra Elizabeth; Conti, David
2011-12-01
Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the heritability of complex traits. Rare variants may, in part, explain some of the missing heritability. Here, we explored the advantage of extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency (with the phenotyping cost included) perspective. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to the one-stage cross-sectional design or traditional two-stage design. We then discussed the analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address many practical concerns, for example measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, extreme-based sampling can still be more efficient. Finally, we expanded the current analysis and design framework to accommodate the CMC approach, where multiple rare variants in the same gene region are analyzed jointly. © 2011 Wiley Periodicals, Inc.
Li, Dalin; Lewinger, Juan Pablo; Gauderman, William J.; Murcray, Cassandra Elizabeth; Conti, David
2014-01-01
Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the hereditability of complex traits. Rare variants may, in part, explain some of the missing hereditability. Here, we explored the advantage of the extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency (with the phenotyping cost included) perspective. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to the one-stage cross-sectional design or traditional two-stage design. We then discussed the analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address many practical concerns, for example measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, an extreme-based sampling can still be more efficient. Finally, we expanded the current analysis and design framework to accommodate the CMC approach where multiple rare-variants in the same gene region are analyzed jointly. PMID:21922541
Moskalev, Alexey A; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V
2017-12-28
The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of adaptation to hypoxic conditions.
Whole-Genome Sequencing of the World’s Oldest People
Gierman, Hinco J.; Fortney, Kristen; Roach, Jared C.; Coles, Natalie S.; Li, Hong; Glusman, Gustavo; Markov, Glenn J.; Smith, Justin D.; Hood, Leroy; Coles, L. Stephen; Kim, Stuart K.
2014-01-01
Supercentenarians (110 years or older) are the world’s oldest people. Seventy four are alive worldwide, with twenty two in the United States. We performed whole-genome sequencing on 17 supercentenarians to explore the genetic basis underlying extreme human longevity. We found no significant evidence of enrichment for a single rare protein-altering variant or for a gene harboring different rare protein altering variants in supercentenarian compared to control genomes. We followed up on the gene most enriched for rare protein-altering variants in our cohort of supercentenarians, TSHZ3, by sequencing it in a second cohort of 99 long-lived individuals but did not find a significant enrichment. The genome of one supercentenarian had a pathogenic mutation in DSC2, known to predispose to arrhythmogenic right ventricular cardiomyopathy, which is recommended to be reported to this individual as an incidental finding according to a recent position statement by the American College of Medical Genetics and Genomics. Even with this pathogenic mutation, the proband lived to over 110 years. The entire list of rare protein-altering variants and DNA sequence of all 17 supercentenarian genomes is available as a resource to assist the discovery of the genetic basis of extreme longevity in future studies. PMID:25390934
Reyes-Prieto, Mariana; Ordoñez, Omar F.; Santos-García, Diego; Rosas-Pérez, Tania; Valdivia-Anistro, Jorge; Rebollar, Eria A.; Saralegui, Andrés; Moya, Andrés; Merino, Enrique; Farías, María Eugenia
2017-01-01
We report the genome sequence of Exiguobacterium chiriqhucha str. N139, isolated from a high-altitude Andean lake. Comparative genomic analyses of the available Exiguobacterium genomes suggest that our strain belongs to the same species as the previously reported E. pavilionensis str. RW-2 and Exiguobacterium str. GIC 31. We describe this species and propose the name chiriqhucha to group them. ‘Chiri qhucha’ in Quechua means ‘cold lake’, which is a common origin of these three cosmopolitan Exiguobacteria. The 2,952,588-bp E. chiriqhucha str. N139 genome contains one chromosome and three megaplasmids. The genome analysis of the Andean strain suggests the presence of enzymes that confer on E. chiriqhucha str. N139 the ability to grow under multiple extreme environmental conditions, including high concentrations of different metals, high ultraviolet B radiation, scavenging for phosphorus and coping with high salinity. Moreover, the regulation of its tryptophan biosynthesis suggests that novel pathways remain to be discovered, and that these pathways might be fundamental in the amino acid metabolism of the microbial community from Laguna Negra, Argentina. PMID:28439458
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-01-01
Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
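For readers unfamiliar with the Smith–Waterman algorithm mentioned in the abstract above, the following is a minimal Python sketch of local alignment by dynamic programming. It is not the ProteinWorldDB implementation: the function name `smith_waterman`, the simple match/mismatch scores and the linear gap penalty are illustrative assumptions (protein comparisons normally use a substitution matrix and affine gap penalties). It returns the best local score together with the half-open coordinates of the aligned region in each sequence.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, (start_a, end_a), (start_b, end_b)) for the best
    local alignment of a and b. Scoring values are illustrative only."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


if __name__ == "__main__":
    print(smith_waterman("XXHEAGAWXX", "HEAGAW"))
```

The quadratic cost of filling the matrix for every pair of sequences is what makes an all-against-all comparison of millions of proteins so computationally demanding.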
Lima-Costa, M. Fernanda; Rodrigues, Laura C.; Barreto, Maurício L.; Gouveia, Mateus; Horta, Bernardo L.; Mambrini, Juliana; Kehdy, Fernanda S. G.; Pereira, Alexandre; Rodrigues-Soares, Fernanda; Victora, Cesar G.; Tarazona-Santos, Eduardo; Cesar, Cibele C.; Conceição, Jackson S.; Costa, Gustavo N.O.; Esteban, Nubia; Fiaccone, Rosemeire L.; Figueiredo, Camila A.; Firmo, Josélia O.A.; Horimoto, Andrea R.V.R.; Leal, Thiago P.; Machado, Moara; Magalhães, Wagner C.S.; de Oliveira, Isabel Oliveira; Peixoto, Sérgio V.; Rodrigues, Maíra R.; Santos, Hadassa C.; Silva, Thiago M.
2015-01-01
Brazil never had segregation laws defining membership of an ethnoracial group. Thus, the composition of the Brazilian population is mixed, and its ethnoracial classification is complex. Previous studies showed conflicting results on the correlation between genome ancestry and ethnoracial classification in Brazilians. We used 370,539 Single Nucleotide Polymorphisms to quantify this correlation in 5,851 community-dwelling individuals in the South (Pelotas), Southeast (Bambui) and Northeast (Salvador) Brazil. European ancestry was predominant in Pelotas and Bambui (median = 85.3% and 83.8%, respectively). African ancestry was highest in Salvador (median = 50.5%). The strength of the association between the phenotype and median proportion of African ancestry varied largely across populations, with pseudo R2 values of 0.50 in Pelotas, 0.22 in Bambui and 0.13 in Salvador. The continuous proportion of African genomic ancestry showed a significant S-shape positive association with self-reported Blacks in the three sites, and the reverse trend was found for self reported Whites, with most consistent classifications in the extremes of the high and low proportion of African ancestry. In self-classified Mixed individuals, the predicted probability of having African ancestry was bell-shaped. Our results support the view that ethnoracial self-classification is affected by both genome ancestry and non-biological factors. PMID:25913126
Finstad, Kari M.; Probst, Alexander J.; Thomas, Brian C.; ...
2017-07-28
Although once thought to be devoid of biology, recent studies have identified salt deposits as oases for life in the hyperarid Atacama Desert. To examine spatial patterns of microbial species and key nutrient sources, we genomically characterized 26 salt crusts from three sites along a fog gradient. The communities are dominated by a large variety of Halobacteriales and Bacteroidetes, plus a few algal and Cyanobacterial species. CRISPR locus analysis suggests the distribution of a single Cyanobacterial population among all sites. This is in stark contrast to the extremely high sample specificity of most other community members. Only present at the highest moisture site is a genomically characterized Thermoplasmatales archaeon (Marine Group II) and six Nanohaloarchaea, one of which is represented by a complete genome. Parcubacteria (OD1) and Saccharibacteria (TM7), not previously reported from hypersaline environments, were found at low abundances. We found no indication of a N₂ fixation pathway in the communities, suggesting acquisition of bioavailable nitrogen from atmospherically derived nitrate. Samples cluster by site based on bacterial and archaeal abundance patterns and photosynthetic capacity decreases with increasing distance from the ocean. We conclude that moisture level, controlled by coastal fog intensity, is the strongest driver of community membership.
Finstad, Kari M; Probst, Alexander J; Thomas, Brian C; Andersen, Gary L; Demergasso, Cecilia; Echeverría, Alex; Amundson, Ronald G; Banfield, Jillian F
2017-01-01
Although once thought to be devoid of biology, recent studies have identified salt deposits as oases for life in the hyperarid Atacama Desert. To examine spatial patterns of microbial species and key nutrient sources, we genomically characterized 26 salt crusts from three sites along a fog gradient. The communities are dominated by a large variety of Halobacteriales and Bacteroidetes, plus a few algal and Cyanobacterial species. CRISPR locus analysis suggests the distribution of a single Cyanobacterial population among all sites. This is in stark contrast to the extremely high sample specificity of most other community members. Only present at the highest moisture site is a genomically characterized Thermoplasmatales archaeon (Marine Group II) and six Nanohaloarchaea, one of which is represented by a complete genome. Parcubacteria (OD1) and Saccharibacteria (TM7), not previously reported from hypersaline environments, were found at low abundances. We found no indication of a N₂ fixation pathway in the communities, suggesting acquisition of bioavailable nitrogen from atmospherically derived nitrate. Samples cluster by site based on bacterial and archaeal abundance patterns and photosynthetic capacity decreases with increasing distance from the ocean. We conclude that moisture level, controlled by coastal fog intensity, is the strongest driver of community membership.
Bray, Molly S; Loos, Ruth J F; McCaffery, Jeanne M; Ling, Charlotte; Franks, Paul W; Weinstock, George M; Snyder, Michael P; Vassy, Jason L; Agurs-Collins, Tanya
2016-01-01
Precision medicine utilizes genomic and other data to optimize and personalize treatment. Although more than 2,500 genetic tests are currently available, largely for extreme and/or rare phenotypes, the question remains whether this approach can be used for the treatment of common, complex conditions like obesity, inflammation, and insulin resistance, which underlie a host of metabolic diseases. This review, developed from a Trans-NIH Conference titled "Genes, Behaviors, and Response to Weight Loss Interventions," provides an overview of the state of genetic and genomic research in the area of weight change and identifies key areas for future research. Although many loci have been identified that are associated with cross-sectional measures of obesity/body size, relatively little is known regarding the genes/loci that influence dynamic measures of weight change over time. Although successful short-term weight loss has been achieved using many different strategies, sustainable weight loss has proven elusive for many, and there are important gaps in our understanding of energy balance regulation. Elucidating the molecular basis of variability in weight change has the potential to improve treatment outcomes and inform innovative approaches that can simultaneously take into account information from genomic and other sources in devising individualized treatment plans. © 2015 The Obesity Society.
Lima-Costa, M Fernanda; Rodrigues, Laura C; Barreto, Maurício L; Gouveia, Mateus; Horta, Bernardo L; Mambrini, Juliana; Kehdy, Fernanda S G; Pereira, Alexandre; Rodrigues-Soares, Fernanda; Victora, Cesar G; Tarazona-Santos, Eduardo
2015-04-27
Brazil never had segregation laws defining membership of an ethnoracial group. Thus, the composition of the Brazilian population is mixed, and its ethnoracial classification is complex. Previous studies showed conflicting results on the correlation between genome ancestry and ethnoracial classification in Brazilians. We used 370,539 Single Nucleotide Polymorphisms to quantify this correlation in 5,851 community-dwelling individuals in the South (Pelotas), Southeast (Bambui) and Northeast (Salvador) Brazil. European ancestry was predominant in Pelotas and Bambui (median = 85.3% and 83.8%, respectively). African ancestry was highest in Salvador (median = 50.5%). The strength of the association between the phenotype and median proportion of African ancestry varied largely across populations, with pseudo R(2) values of 0.50 in Pelotas, 0.22 in Bambui and 0.13 in Salvador. The continuous proportion of African genomic ancestry showed a significant S-shape positive association with self-reported Blacks in the three sites, and the reverse trend was found for self reported Whites, with most consistent classifications in the extremes of the high and low proportion of African ancestry. In self-classified Mixed individuals, the predicted probability of having African ancestry was bell-shaped. Our results support the view that ethnoracial self-classification is affected by both genome ancestry and non-biological factors.
Rousseau-Gueutin, Mathieu; Lerceteau-Köhler, Estelle; Barrot, Laure; Sargent, Daniel James; Monfort, Amparo; Simpson, David; Arús, Pere; Guérin, Guy; Denoyes-Rothan, Béatrice
2008-01-01
Macrosynteny and colinearity between Fragaria (strawberry) species showing extreme levels of ploidy have been studied through comparative genetic mapping between the octoploid cultivated strawberry (F. ×ananassa) and its diploid relatives. A comprehensive map of the octoploid strawberry, in which almost all linkage groups are ranged into the seven expected homoeologous groups was obtained, thus providing the first reference map for the octoploid Fragaria. High levels of conserved macrosynteny and colinearity were observed between homo(eo)logous linkage groups and between the octoploid homoeologous groups and their corresponding diploid linkage groups. These results reveal that the polyploidization events that took place along the evolution of the Fragaria genus and the more recent juxtaposition of two octoploid strawberry genomes in the cultivated strawberry did not trigger any major chromosomal rearrangements in genomes involved in F. ×ananassa. They further suggest the existence of a close relationship between the diploid Fragaria genomes. In addition, despite the possible existence of residual levels of polysomic segregation suggested by the observation of large linkage groups in coupling phase only, the prevalence of linkage groups in coupling/repulsion phase clearly demonstrates that the meiotic behavior is mainly disomic in the cultivated strawberry. PMID:18660542
Bray, Molly S; Loos, Ruth JF; McCaffery, Jeanne M; Ling, Charlotte; Franks, Paul W; Weinstock, George M; Snyder, Michael P; Vassy, Jason L; Agurs-Collins, Tanya
2016-01-01
Objective: Precision medicine utilizes genomic and other data to optimize and personalize treatment. Although more than 2,500 genetic tests are currently available, largely for extreme and/or rare phenotypes, the question remains whether this approach can be used for the treatment of common, complex conditions like obesity, inflammation, and insulin resistance, which underlie a host of metabolic diseases. Methods: This review, developed from a Trans-NIH Conference titled “Genes, Behaviors, and Response to Weight Loss Interventions,” provides an overview of the state of genetic and genomic research in the area of weight change and identifies key areas for future research. Results: Although many loci have been identified that are associated with cross-sectional measures of obesity/body size, relatively little is known regarding the genes/loci that influence dynamic measures of weight change over time. Although successful short-term weight loss has been achieved using many different strategies, sustainable weight loss has proven elusive for many, and there are important gaps in our understanding of energy balance regulation. Conclusions: Elucidating the molecular basis of variability in weight change has the potential to improve treatment outcomes and inform innovative approaches that can simultaneously take into account information from genomic and other sources in devising individualized treatment plans. PMID:26692578
Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes.
Puigbò, Pere; Lobkovsky, Alexander E; Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V
2014-08-21
Genomes of bacteria and archaea (collectively, prokaryotes) appear to exist in incessant flux, expanding via horizontal gene transfer and gene duplication, and contracting via gene loss. However, the actual rates of genome dynamics and relative contributions of different types of event across the diversity of prokaryotes are largely unknown, as are the sizes of microbial supergenomes, i.e. pools of genes that are accessible to the given microbial species. We performed a comprehensive analysis of the genome dynamics in 35 groups (34 bacterial and one archaeal) of closely related microbial genomes using a phylogenetic birth-and-death maximum likelihood model to quantify the rates of gene family gain and loss, as well as expansion and reduction. The results show that loss of gene families dominates the evolution of prokaryotes, occurring at approximately three times the rate of gain. The rates of gene family expansion and reduction are typically seven and twenty times less than the gain and loss rates, respectively. Thus, the prevailing mode of evolution in bacteria and archaea is genome contraction, which is partially compensated by the gain of new gene families via horizontal gene transfer. However, the rates of gene family gain, loss, expansion and reduction vary within wide ranges, with the most stable genomes showing rates about 25 times lower than the most dynamic genomes. For many groups, the supergenome estimated from the fraction of repetitive gene family gains includes about tenfold more gene families than the typical genome in the group although some groups appear to have vast, 'open' supergenomes. Reconstruction of evolution for groups of closely related bacteria and archaea reveals an extremely rapid and highly variable flux of genes in evolving microbial genomes, demonstrates that extensive gene loss and horizontal gene transfer leading to innovation are the two dominant evolutionary processes, and yields robust estimates of the supergenome size.
Biswas et al. describe an “exceptional responder” lung adenocarcinoma patient who survived with metastatic lung adenocarcinoma for 7 years while undergoing single or combination ERBB2-directed therapies. Whole-genome, whole-exome, and high-coverage ion-torrent targeted sequencing were used to demonstrate extreme genomic heterogeneity between the lung and lymph node metastatic
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34
Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...
2016-09-10
Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
Genomic basis for the pest status of two Helicoverpa species
USDA-ARS's Scientific Manuscript database
Background: Helicoverpa armigera and Helicoverpa zea are major caterpillar pests of Old and New World agriculture respectively. Both, particularly H. armigera, are extremely polyphagous, and H. armigera has developed resistance to many insecticides. Here we use comparative and population genomics an...
Gene-expression signatures of Atlantic salmon's plastic life cycle
Aubin-Horth, N.; Letcher, B.H.; Hofmann, H.A.
2009-01-01
How genomic expression differs as a function of life history variation is largely unknown. Atlantic salmon exhibits extreme alternative life histories. We defined the gene-expression signatures of wild-caught salmon at two different life stages by comparing the brain expression profiles of mature sneaker males and immature males, and early migrants and late migrants. In addition to life-stage-specific signatures, we discovered a surprisingly large gene set that was differentially regulated - at similar magnitudes, yet in opposite direction - in both life history transitions. We suggest that this co-variation is not a consequence of many independent cellular and molecular switches in the same direction but rather represents the molecular equivalent of a physiological shift orchestrated by one or very few master regulators. © 2009 Elsevier Inc. All rights reserved.
Gene-expression signatures of Atlantic salmon’s plastic life cycle
Aubin-Horth, Nadia; Letcher, Benjamin H.; Hofmann, Hans A.
2009-01-01
How genomic expression differs as a function of life history variation is largely unknown. Atlantic salmon exhibits extreme alternative life histories. We defined the gene-expression signatures of wild-caught salmon at two different life stages by comparing the brain expression profiles of mature sneaker males and immature males, and early migrants and late migrants. In addition to life-stage-specific signatures, we discovered a surprisingly large gene set that was differentially regulated - at similar magnitudes, yet in opposite direction - in both life history transitions. We suggest that this co-variation is not a consequence of many independent cellular and molecular switches in the same direction but rather represents the molecular equivalent of a physiological shift orchestrated by one or very few master regulators. PMID:19401203
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
2016-10-06
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. 
PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
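As a rough illustration of the SNP versus indel distinction used in the PGen abstract above, the sketch below classifies a variant from its reference and alternate alleles and tallies the classes. It is not taken from PGen: the function names `classify_variant` and `count_variants` are hypothetical, and the input is assumed to be simple (REF, ALT) allele pairs with REF different from ALT, such as might be extracted from a VCF file.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify one variant by allele lengths (assumes ref != alt)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"        # single-base substitution
    if len(ref) < len(alt):
        return "insertion"  # alternate allele is longer
    if len(ref) > len(alt):
        return "deletion"   # alternate allele is shorter
    return "MNP"            # same length, more than one base


def count_variants(records):
    """Tally variant classes over an iterable of (ref, alt) pairs."""
    counts = Counter()
    for ref, alt in records:
        counts[classify_variant(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    demo = [("A", "G"), ("A", "AT"), ("ATG", "A"), ("C", "T")]
    print(count_variants(demo))
```

Insertions and deletions together correspond to the "indels" reported in the abstract; a production workflow would rely on a dedicated variant caller rather than allele lengths alone.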
Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K.; Fryszczyn, Bartlomiej G.; Fox, George E.; Tirumalai, Madhan R.; Liu, Yamei; Kim, Sun
2015-01-01
Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. PMID:25953173
Khelaifia, S; Caputo, A; Djossou, F; Raoult, D
2017-01-01
We report the draft genome sequence of Haloferax alexandrinus strain Arc-hr (CSUR P798), isolated from the human gut of a 10-year-old Amazonian individual. Its 3 893 626 bp genome exhibits a 66.00% GC content. The genome of the strain Arc-hr contains 37 genes identified as ORFans, seven genes associated to halocin and 11 genes associated with polyketide synthases or nonribosomal peptide synthetases.
USDA-ARS's Scientific Manuscript database
Mycosphaerella graminicola causes septoria tritici blotch, one of the most important diseases of wheat worldwide. Previous analyses showed that populations of this species are extremely variable and that polymorphisms for chromosome length and number can be generated during meiosis. To better unders...
Sequencing, assembly, and annotation of Maize B104 : A maize transformation resource
USDA-ARS's Scientific Manuscript database
Maize transformation is complicated. Most lines are not readily cultured and transformed, making the germplasm available for genome engineering extremely limited. Developing a better understanding of the genomic regions responsible for differences in culturability and transformability would be a goo...
Chromosomal Evolution in Chiroptera
Sotero-Caio, Cibele G.; Baker, Robert J.; Volleth, Marianne
2017-01-01
Chiroptera is the second largest order among mammals, with over 1300 species in 21 extant families. The group is extremely diverse in several aspects of its natural history, including dietary strategies, ecology, behavior and morphology. Bat genomes show ample chromosome diversity (from 2n = 14 to 62). As with other mammalian orders, Chiroptera is characterized by clades with low, moderate and extreme chromosomal change. In this article, we will discuss trends of karyotypic evolution within distinct bat lineages (especially Phyllostomidae, Hipposideridae and Rhinolophidae), focusing on two perspectives: evolution of genome architecture, modes of chromosomal evolution, and the use of chromosome data to resolve taxonomic problems. PMID:29027987
Leitwein, M; Gagnaire, P-A; Desmarais, E; Guendouz, S; Rohmer, M; Berrebi, P; Guinand, B
2016-12-01
A genome-wide assessment of diversity is provided for wild Mediterranean brown trout Salmo trutta populations from headwater tributaries of the Orb River and from Atlantic and Mediterranean hatchery-reared strains that have been used for stocking. Double-digest restriction-site-associated DNA sequencing (dd-RADseq) was performed and the efficiency of de novo and reference-mapping approaches to obtain individual genotypes was compared. Large numbers of single nucleotide polymorphism (SNP) markers with similar genome-wide distributions were discovered using both approaches (196 639 v. 121 016 SNPs, respectively), with c. 80% of the loci detected de novo being also found with reference mapping, using the Atlantic salmon Salmo salar genome as a reference. Lower mapping density but larger nucleotide diversity (π) was generally observed near extremities of linkage groups, consistent with regions of residual tetrasomic inheritance observed in salmonids. Genome-wide diversity estimates revealed reduced polymorphism in hatchery strains (π = 0·0040 and π = 0·0029 in Atlantic and Mediterranean strains, respectively) compared to wild populations (π = 0·0049), a pattern that was congruent with allelic richness estimated from microsatellite markers. Finally, pronounced heterozygote deficiency was found in hatchery strains (Atlantic F_IS = 0·18; Mediterranean F_IS = 0·42), indicating that stocking practices may affect the genetic diversity in wild populations. These new genomic resources will provide important tools to define better conservation strategies in S. trutta. © 2016 The Fisheries Society of the British Isles.
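The heterozygote deficiency reported above is summarised by the inbreeding coefficient, which compares observed heterozygosity with the value expected under Hardy-Weinberg proportions. The following is a minimal sketch of the textbook single-locus estimator, F_IS = 1 - H_obs / H_exp; the function name and the example genotype counts are hypothetical, and the abstract does not say which estimator or software the authors used.

```python
def inbreeding_coefficient(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Single-locus F_IS = 1 - H_obs / H_exp for a biallelic marker.

    Illustrative estimator only (no small-sample correction); it is an
    assumption, not the method used in the study.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    h_exp = 2 * p * (1 - p)                # expected heterozygosity under HWE
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    h_obs = n_het / n                      # observed heterozygosity
    return 1 - h_obs / h_exp


# Hypothetical counts: fewer heterozygotes than expected gives F_IS > 0.
print(inbreeding_coefficient(30, 40, 30))
```

A positive value indicates a deficit of heterozygotes relative to Hardy-Weinberg expectations, and a value near zero indicates agreement with them.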
Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis
2017-07-30
The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.
Kanost, Michael R; Arrese, Estela L; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L; Vogel, Heiko; Walters, James; Waterhouse, Robert M; Ahn, Seung-Joon; Almeida, Francisca C; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M; Clarke, David F; Dittmer, Neal T; Ferguson, Laura C F; Garavelou, Spyridoula; Gordon, Karl H J; Gunaratna, Ramesh T; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R; Kuwar, Suyog S; Lee, Sandy L; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H; McCulloch, Kyle J; Mathew, Tittu; Morton, Brian; Muzny, Donna M; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K; Worley, Kim C; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D; Burmester, Thorsten; Clem, Rollie J; Feyereisen, René; Grimmelikhuijzen, Cornelis J P; Hamodrakas, Stavros J; Hansson, Bill S; Huguet, Elisabeth; Jermiin, Lars S; Lan, Que; Lehman, Herman K; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B; Muthukrishnan, Subbaratnam; Oakeshott, John G; Palmer, Will; Park, Yoonseong; Passarelli, A Lorena; Rozas, Julio; Schwartz, Lawrence M; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J; Scherer, Steven E; Richards, Stephen; Blissard, Gary W
2016-09-01
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. Copyright © 2016 Elsevier Ltd. All rights reserved.
The genomic diversification of the whole Acinetobacter genus: origins, mechanisms, and consequences.
Touchon, Marie; Cury, Jean; Yoon, Eun-Jeong; Krizova, Lenka; Cerqueira, Gustavo C; Murphy, Cheryl; Feldgarden, Michael; Wortman, Jennifer; Clermont, Dominique; Lambert, Thierry; Grillot-Courvalin, Catherine; Nemec, Alexandr; Courvalin, Patrice; Rocha, Eduardo P C
2014-10-13
Bacterial genomics has greatly expanded our understanding of microdiversification patterns within a species, but analyses at higher taxonomical levels are necessary to understand and predict the independent rise of pathogens in a genus. We have sampled, sequenced, and assessed the diversity of genomes of validly named and tentative species of the Acinetobacter genus, a clade including major nosocomial pathogens and biotechnologically important species. We inferred a robust global phylogeny and delimited several new putative species. The genus is very ancient and extremely diverse: Genomes of highly divergent species share more orthologs than certain strains within a species. We systematically characterized elements and mechanisms driving genome diversification, such as conjugative elements, insertion sequences, and natural transformation. We found many error-prone polymerases that may play a role in resistance to toxins, antibiotics, and in the generation of genetic variation. Surprisingly, temperate phages, poorly studied in Acinetobacter, were found to account for a significant fraction of most genomes. Accordingly, many genomes encode clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems with some of the largest CRISPR-arrays found so far in bacteria. Integrons are strongly overrepresented in Acinetobacter baumannii, which correlates with its frequent resistance to antibiotics. Our data suggest that A. baumannii arose from an ancient population bottleneck followed by population expansion under strong purifying selection. The outstanding diversification of the species occurred largely by horizontal transfer, including some allelic recombination, at specific hotspots preferentially located close to the replication terminus. Our work sets a quantitative basis to understand the diversification of Acinetobacter into emerging resistant and versatile pathogens. © The Author(s) 2014. 
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Sinha, Pallavi; Kale, Sandip M; Parupalli, Swathi; Kumar, Vinay; Chitikineni, Annapurna; Vechalapu, Suryanarayana; Sameer Kumar, Chanda Venkata; Sharma, Mamta; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Muniswamy, Sonnappa; Varshney, Rajeev K
2017-07-01
Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time-consuming. In recent years, a number of single nucleotide polymorphism (SNP)-based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertion-deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel-seq approach, which combines whole-genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel-seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea identified 16 Indels affecting 26 putative candidate genes. Of these 26 genes, 24 were affected in the regions upstream or downstream of the gene and two were affected within the gene itself. Validation of these 16 candidate Indels in other FW- and SMD-resistant and FW- and SMD-susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel-seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel-seq approach can be used for quick and precise identification of candidate genomic regions for any target trait in any crop species. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
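The idea of comparing Indel frequencies between extreme bulks can be illustrated with a small sketch. The code below is an assumption about the general shape of such a calculation, not the authors' pipeline: it computes, for one Indel, the fraction of reads carrying the Indel allele in each bulk and takes the difference. The function names and read counts are hypothetical.

```python
def indel_index(indel_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the Indel allele in one bulk."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return indel_reads / total_reads


def delta_indel_index(resistant: tuple, susceptible: tuple) -> float:
    """Difference in Indel-allele frequency between the two extreme bulks.

    Each argument is a hypothetical (indel_reads, total_reads) pair.
    Values near +1 or -1 suggest the Indel segregates with the trait.
    """
    r_indel, r_total = resistant
    s_indel, s_total = susceptible
    return indel_index(r_indel, r_total) - indel_index(s_indel, s_total)


# Hypothetical site: 18 of 20 reads carry the Indel in the resistant bulk,
# 2 of 20 in the susceptible bulk.
print(delta_indel_index((18, 20), (2, 20)))
```

In practice a threshold or a statistical test on this difference would be needed before calling a candidate region; the abstract does not specify the one used.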
Zhong, Huaming; Shang, Shuai; Wu, Xiaoyang; Chen, Jun; Zhu, Wanchao; Yan, Jiakuo; Li, Haotian
2017-01-01
As nontraditional model organisms with extreme physiological and morphological phenotypes, snakes are believed to possess an inferior taste system. However, the bitter taste sensation is essential to distinguish nutritious from poisonous food resources, and genomic evidence of bitter taste in snakes is scarce. To explore the genetic basis of the bitter taste of snakes and characterize the evolution of bitter taste receptor genes (Tas2rs) in reptiles, we identified Tas2r genes in 19 genomes (species) corresponding to three orders of non-avian reptiles. Our results indicated contractions of Tas2r gene repertoires in snakes, whereas dramatic gene expansions have occurred in lizards. Phylogenetic analysis of the Tas2rs with NJ and BI methods revealed that Tas2r genes of snake species formed two clades, whereas in lizards the Tas2r genes clustered into two monophyletic clades and four large clades. Evolutionary changes (birth and death) of intact Tas2r genes in reptiles were determined by reconciliation analysis. Additionally, the taste signaling pathway calcium homeostasis modulator 1 (Calhm1) gene of snakes was putatively functional, suggesting that snakes still possess bitter taste sensation. Furthermore, Phylogenetically Independent Contrasts (PIC) analyses revealed a significant correlation between the number of Tas2r genes and the amount of potential toxins in reptilian diets, suggesting that insectivores such as some lizards may require more Tas2r genes than omnivorous and carnivorous reptiles. PMID:28828281
Nutrigenomics: the role of nutrients in gene expression.
Dang, Tarana Singh; Walker, Mark; Ford, Dianne; Valentine, Ruth A
2014-02-01
Improved understanding of the mechanism behind periodontal tissue destruction, the potential protective role of nutrients and the advent of modern genomic measurement tools has led to an increased interest in the association between nutrition and periodontal disease. To date, evidence for a direct link between periodontal disease and nutrition has come mainly from large observational cross-sectional studies or very small double-blind randomized supplementation trials, with a large proportion finding no significant association between the nutrient being analyzed and markers of periodontal disease status. The advent of the 'genomic era' has introduced the concept of nutrigenomic studies, which aim to reveal the relationship between nutrition and the genome to provide a scientific basis for improved public health through dietary means. Used alongside relatively inexpensive high-throughput technology, this will allow the effect of diet on the etiology of periodontal disease to be studied in greater detail. As it is extremely likely that interactions between genotype and diet are important in determining the risk of the most common complex diseases, it is highly probable that these interactions will be important in determining periodontal disease risk. Numerous nutritional genetic studies where the outcome measures have been markers of disease risk, most notably cardiovascular disease and cancer, provide proof of principle, highlight the importance of understanding these interactions and illustrate where the effect of dietary modification on periodontal disease progression may have been overlooked previously by observational studies. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Santos-Garcia, Diego; Latorre, Amparo; Moya, Andrés; Gibbs, George; Hartung, Viktor; Dettner, Konrad; Kuechler, Stefan Martin; Silva, Francisco J.
2014-01-01
Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed. PMID:25115011
Bentolila, Stéphane; Stefanov, Stefan
2012-01-01
Plant mitochondrial genomes have features that distinguish them radically from their animal counterparts: a high rate of rearrangement, of uptake and loss of DNA sequences, and an extremely low point mutation rate. Perhaps the most unique structural feature of plant mitochondrial DNAs is the presence of large repeated sequences involved in intramolecular and intermolecular recombination. In addition, rare recombination events can occur across shorter repeats, creating rearrangements that result in aberrant phenotypes, including pollen abortion, which is known as cytoplasmic male sterility (CMS). Using next-generation sequencing, we pyrosequenced two rice (Oryza sativa) mitochondrial genomes that belong to the indica subspecies. One genome is normal, while the other carries the wild abortive-CMS. We find that numerous rearrangements in the rice mitochondrial genome occur even between close cytotypes during rice evolution. Unlike maize (Zea mays), a closely related species also belonging to the grass family, integration of plastid sequences did not play a role in the sequence divergence between rice cytotypes. This study also uncovered an excellent candidate for the wild abortive-CMS-encoding gene; like most of the CMS-associated open reading frames that are known in other species, this candidate was created via a rearrangement, is chimeric in structure, possesses predicted transmembrane domains, and coopted the promoter of a genuine mitochondrial gene. Our data give new insights into rice mitochondrial evolution, correcting previous reports. PMID:22128137
Rochus, Christina Marie; Tortereau, Flavie; Plisson-Petit, Florence; Restoux, Gwendal; Moreno-Romieux, Carole; Tosser-Klopp, Gwenola; Servin, Bertrand
2018-01-23
One of the approaches to detect genetic variants affecting fitness traits is to identify their surrounding genomic signatures of past selection. With established methods for detecting selection signatures and the current and future availability of large datasets, such studies should have the power to not only detect these signatures but also to infer their selective histories. Domesticated animals offer a powerful model for these approaches as they adapted rapidly to environmental and human-mediated constraints in a relatively short time. We investigated this question by studying a large dataset of 542 individuals from 27 domestic sheep populations raised in France, genotyped for more than 500,000 SNPs. Population structure analysis revealed that this set of populations harbours a large part of European sheep diversity in a small geographical area, offering a powerful model for the study of adaptation. Identification of extreme SNP and haplotype frequency differences between populations listed 126 genomic regions likely affected by selection. These signatures revealed selection at loci commonly identified as selection targets in many species ("selection hotspots") including ABCG2, LCORL/NCAPG, MSTN, and coat colour genes such as ASIP, MC1R, MITF, and TYRP1. For one of these regions (ABCG2, LCORL/NCAPG), we could propose a historical scenario leading to the introgression of an adaptive allele into a new genetic background. Among selection signatures, we found clear evidence for parallel selection events in different genetic backgrounds, most likely for different mutations. We confirmed this allelic heterogeneity in one case by resequencing the MC1R gene in three black-faced breeds. Our study illustrates how dense genetic data in multiple populations allows the deciphering of evolutionary history of populations and of their adaptive mutations.
Sequencing small genomic targets with high efficiency and extreme accuracy
Schmitt, Michael W.; Fox, Edward J.; Prindle, Marc J.; Reid-Bayliss, Kate S.; True, Lawrence D.; Radich, Jerald P.; Loeb, Lawrence A.
2015-01-01
The detection of minority variants in mixed samples demands methods for enrichment and accurate sequencing of small genomic intervals. We describe an efficient approach based on sequential rounds of hybridization with biotinylated oligonucleotides, enabling more than one-million fold enrichment of genomic regions of interest. In conjunction with error correcting double-stranded molecular tags, our approach enables the quantification of mutations in individual DNA molecules. PMID:25849638
Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K; Fryszczyn, Bartlomiej G; Fox, George E; Tirumalai, Madhan R; Liu, Yamei; Kim, Sun; Kehoe, David M; Weinstock, George M
2015-05-07
Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. Copyright © 2015 Yerrapragada et al.
Messina, Enzo; Sorokin, Dimitry Y; Kublanov, Ilya V; Toshchakov, Stepan; Lopatina, Anna; Arcadi, Erika; Smedile, Francesco; La Spada, Gina; La Cono, Violetta; Yakimov, Michail M
2016-01-01
Strain M27-SA2 was isolated from the deep-sea salt-saturated anoxic lake Medee, which represents one of the most hostile extreme environments on our planet. On the basis of physiological studies and phylogenetic positioning this extremely halophilic euryarchaeon belongs to a novel genus 'Halanaeroarchaeum' within the family Halobacteriaceae. All members of this genus cultivated so far are strict anaerobes using acetate as the sole carbon and energy source and elemental sulfur as electron acceptor. Here we report the complete genome sequence of the strain M27-SA2 which is composed of a 2,129,244-bp chromosome and a 124,256-bp plasmid. This is the second complete genome sequence within the genus Halanaeroarchaeum. We demonstrate that genome of 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors complete metabolic pathways for acetate and sulfur catabolism and for de novo biosynthesis of 19 amino acids. The genomic analysis also reveals that 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors two prophage loci and one CRISPR locus, highly similar to that of Kulunda Steppe (Altai, Russia) isolate 'H. sulfurireducens' HSR2(T). The discovery of sulfur-respiring acetate-utilizing haloarchaeon in deep-sea hypersaline anoxic lakes has certain significance for understanding the biogeochemical functioning of these harsh ecosystems, which are incompatible with life for common organisms. Moreover, isolations of Halanaeroarchaeum members from geographically distant salt-saturated sites of different origin suggest a high degree of evolutionary success in their adaptation to this type of extreme biotopes around the world.
2013-01-01
Background: Preterm birth confers a high risk of adverse long term health outcomes for survivors, yet the underlying molecular mechanisms are unclear. We hypothesized that effects of preterm birth can be mediated through measurable epigenomic changes throughout development. We therefore used a longitudinal birth cohort to measure the epigenetic mark of DNA methylation at birth and 18 years comparing survivors of extremely preterm birth with infants born at term. Methods: Using 12 extreme preterm birth cases and 12 matched, term controls, we extracted DNA from archived neonatal blood spots and blood collected in a similar way at 18 years of age. DNA methylation was measured at 347,789 autosomal locations throughout the genome using Infinium HM450 arrays. Representative methylation differences were confirmed by Sequenom MassArray EpiTYPER. Results: At birth we found 1,555 sites with significant differences in methylation between term and preterm babies. At 18 years of age, these differences had largely resolved, suggesting that DNA methylation differences at birth are mainly driven by factors relating to gestational age, such as cell composition and/or maturity. Using matched longitudinal samples, we found evidence for an epigenetic legacy associated with preterm birth, identifying persistent methylation differences at ten genomic loci. Longitudinal comparisons of DNA methylation at birth and 18 years uncovered a significant overlap between sites that were differentially-methylated at birth and those that changed with age. However, we note that overlapping sites may either differ in the same (300/1,555) or opposite (431/1,555) direction during gestation and aging respectively. Conclusions: We present evidence for widespread methylation differences between extreme preterm and term infants at birth that are largely resolved by 18 years of age. 
These results are consistent with methylation changes associated with blood cell development, cellular composition, immune induction and age at these time points. Finally, we identified ten probes significantly associated with preterm individuals and with greater than 5% methylation discordance at birth and 18 years that may reflect a long term epigenetic legacy of preterm birth. PMID:24134860
Borneman, Anthony R.; Forgan, Angus H.; Kolouchova, Radka; Fraser, James A.; Schmidt, Simon A.
2016-01-01
Humans have been consuming wines for more than 7000 yr. For most of this time, fermentations were presumably performed by strains of Saccharomyces cerevisiae that naturally found their way into the fermenting must. In contrast, most commercial wines are now produced by inoculation with pure yeast monocultures, ensuring consistent, reliable and reproducible fermentations, and there are now hundreds of these yeast starter cultures commercially available. In order to thoroughly investigate the genetic diversity that has been captured by over 50 yr of commercial wine yeast development and domestication, whole genome sequencing has been performed on 212 strains of S. cerevisiae, including 119 commercial wine and brewing starter strains, and wine isolates from across seven decades. Comparative genomic analysis indicates that, despite their large numbers, commercial strains, and wine strains in general, are extremely similar genetically, possessing all of the hallmarks of a population bottle-neck, and high levels of inbreeding. In addition, many commercial strains from multiple suppliers are nearly genetically identical, suggesting that the limits of effective genetic variation within this genetically narrow group may be approaching saturation. PMID:26869621
Defining the biological bases of individual differences in musicality
Gingras, Bruno; Honing, Henkjan; Peretz, Isabelle; Trainor, Laurel J.; Fisher, Simon E.
2015-01-01
Advances in molecular technologies make it possible to pinpoint genomic factors associated with complex human traits. For cognition and behaviour, identification of underlying genes provides new entry points for deciphering the key neurobiological pathways. In the past decade, the search for genetic correlates of musicality has gained traction. Reports have documented familial clustering for different extremes of ability, including amusia and absolute pitch (AP), with twin studies demonstrating high heritability for some music-related skills, such as pitch perception. Certain chromosomal regions have been linked to AP and musical aptitude, while individual candidate genes have been investigated in relation to aptitude and creativity. Most recently, researchers in this field started performing genome-wide association scans. Thus far, studies have been hampered by relatively small sample sizes and limitations in defining components of musicality, including an emphasis on skills that can only be assessed in trained musicians. With opportunities to administer standardized aptitude tests online, systematic large-scale assessment of musical abilities is now feasible, an important step towards high-powered genome-wide screens. Here, we offer a synthesis of existing literatures and outline concrete suggestions for the development of comprehensive operational tools for the analysis of musical phenotypes. PMID:25646515
Chen, Ze-Hui; Zhang, Min; Lv, Feng-Hua; Ren, Xue; Li, Wen-Rong; Liu, Ming-Jun; Nam, Kiwoong; Bruford, Michael W; Li, Meng-Hua
2018-04-01
Analyses of genomic diversity along the X chromosome and of its correlation with autosomal diversity can facilitate understanding of evolutionary forces in shaping sex-linked genomic architecture. Strong selective sweeps and accelerated genetic drift on the X-chromosome have been inferred in primates and other model species, but no such insight has yet been gained in domestic animals compared with their wild relatives. Here, we analyzed X-chromosome variability in a large ovine data set, including a BeadChip array for 943 ewes from the world's sheep populations and 110 whole genomes of wild and domestic sheep. Analyzing whole-genome sequences, we observed a substantially reduced X-to-autosome diversity ratio (∼0.6) compared with the value expected under a neutral model (0.75). In particular, one large X-linked segment (43.05-79.25 Mb) was found to show extremely low diversity, most likely due to a high density of coding genes, featuring highly conserved regions. In general, we observed higher nucleotide diversity on the autosomes, but a flat diversity gradient in X-linked segments, as a function of increasing distance from the nearest genes, leading to a decreased X: autosome (X/A) diversity ratio and contrasting to the positive correlation detected in primates and other model animals. Our evidence suggests that accelerated genetic drift but reduced directional selection on X chromosome, as well as sex-biased demographic events, explain low X-chromosome diversity in sheep species. The distinct patterns of X-linked and X/A diversity we observed between Middle Eastern and non-Middle Eastern sheep populations can be explained by multiple migrations, selection, and admixture during the domestic sheep's recent postdomestication demographic expansion, coupled with natural selection for adaptation to new environments. 
In addition, we identify important novel genes involved in abnormal behavioral phenotypes, metabolism, and immunity, under selection on the sheep X-chromosome.
Chen, Ze-Hui; Zhang, Min; Lv, Feng-Hua; Ren, Xue; Li, Wen-Rong; Liu, Ming-Jun; Nam, Kiwoong; Bruford, Michael W; Li, Meng-Hua
2018-01-01
Analyses of genomic diversity along the X chromosome and of its correlation with autosomal diversity can facilitate understanding of evolutionary forces in shaping sex-linked genomic architecture. Strong selective sweeps and accelerated genetic drift on the X-chromosome have been inferred in primates and other model species, but no such insight has yet been gained in domestic animals compared with their wild relatives. Here, we analyzed X-chromosome variability in a large ovine data set, including a BeadChip array for 943 ewes from the world’s sheep populations and 110 whole genomes of wild and domestic sheep. Analyzing whole-genome sequences, we observed a substantially reduced X-to-autosome diversity ratio (∼0.6) compared with the value expected under a neutral model (0.75). In particular, one large X-linked segment (43.05–79.25 Mb) was found to show extremely low diversity, most likely due to a high density of coding genes, featuring highly conserved regions. In general, we observed higher nucleotide diversity on the autosomes, but a flat diversity gradient in X-linked segments, as a function of increasing distance from the nearest genes, leading to a decreased X: autosome (X/A) diversity ratio and contrasting to the positive correlation detected in primates and other model animals. Our evidence suggests that accelerated genetic drift but reduced directional selection on X chromosome, as well as sex-biased demographic events, explain low X-chromosome diversity in sheep species. The distinct patterns of X-linked and X/A diversity we observed between Middle Eastern and non-Middle Eastern sheep populations can be explained by multiple migrations, selection, and admixture during the domestic sheep’s recent postdomestication demographic expansion, coupled with natural selection for adaptation to new environments. In addition, we identify important novel genes involved in abnormal behavioral phenotypes, metabolism, and immunity, under selection on the sheep X-chromosome. PMID:29790980
Strain Prioritization and Genome Mining for Enediyne Natural Products.
Yan, Xiaohui; Ge, Huiming; Huang, Tingting; Hindra; Yang, Dong; Teng, Qihui; Crnovčić, Ivana; Li, Xiuling; Rudolf, Jeffrey D; Lohman, Jeremy R; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Van Nieuwerburgh, Filip; Rader, Christoph; Duan, Yanwen; Shen, Ben
2016-12-20
The enediyne family of natural products has had a profound impact on modern chemistry, biology, and medicine, and yet only 11 enediynes have been structurally characterized to date. Here we report a genome survey of 3,400 actinomycetes, identifying 81 strains that harbor genes encoding the enediyne polyketide synthase cassettes that could be grouped into 28 distinct clades based on phylogenetic analysis. Genome sequencing of 31 representative strains confirmed that each clade harbors a distinct enediyne biosynthetic gene cluster. A genome neighborhood network allows prediction of new structural features and biosynthetic insights that could be exploited for enediyne discovery. We confirmed one clade as new C-1027 producers, with a significantly higher C-1027 titer than the original producer, and discovered a new family of enediyne natural products, the tiancimycins (TNMs), that exhibit potent cytotoxicity against a broad spectrum of cancer cell lines. Our results demonstrate the feasibility of rapid discovery of new enediynes from a large strain collection. Recent advances in microbial genomics clearly revealed that the biosynthetic potential of soil actinomycetes to produce enediynes is underappreciated. A great challenge is to develop innovative methods to discover new enediynes and produce them in sufficient quantities for chemical, biological, and clinical investigations. This work demonstrated the feasibility of rapid discovery of new enediynes from a large strain collection. The new C-1027 producers, with a significantly higher C-1027 titer than the original producer, will impact the practical supply of this important drug lead. The TNMs, with their extremely potent cytotoxicity against various cancer cells and their rapid and complete cancer cell killing characteristics, in comparison with the payloads used in FDA-approved antibody-drug conjugates (ADCs), are poised to be exploited as payload candidates for the next generation of anticancer ADCs. 
Follow-up studies on the other identified hits promise the discovery of new enediynes, radically expanding the chemical space for the enediyne family. Copyright © 2016 Yan et al.
Bacterial and archaeal resistance to ionizing radiation
NASA Astrophysics Data System (ADS)
Confalonieri, F.; Sommer, S.
2011-01-01
Organisms living in extreme environments must cope with large fluctuations of temperature, high levels of radiation and/or desiccation, conditions that can induce DNA damage ranging from base modifications to DNA double-strand breaks. The bacterium Deinococcus radiodurans is known for its resistance to extremely high doses of ionizing radiation and for its ability to reconstruct a functional genome from hundreds of radiation-induced chromosomal fragments. Recently, extreme ionizing radiation resistance was also generated by directed evolution of an apparently radiation-sensitive bacterial species, Escherichia coli. Radioresistant organisms are not only found among the Eubacteria but also among the Archaea that represent the third kingdom of life. They present a set of particular features that differentiate them from the Eubacteria and eukaryotes. Moreover, Archaea are often isolated from extreme environments where they live under severe conditions of temperature, pressure, pH, salts or toxic compounds that are lethal for the large majority of living organisms. Thus, Archaea offer the opportunity to understand how cells are able to cope with such harsh conditions. Among them, the halophilic archaeon Halobacterium sp and several Pyrococcus or Thermococcus species, such as Thermococcus gammatolerans, were also shown to display high level of radiation resistance. The dispersion, in the phylogenetic tree, of radioresistant prokaryotes suggests that they have independently acquired radioresistance. 
Different strategies were selected during evolution including several mechanisms of radiation byproduct detoxification and subtle cellular metabolism modifications to help cells recover from radiation-induced injuries, protection of proteins against oxidation, an efficient DNA repair tool box, an original pathway of DNA double-strand break repair, a condensed nucleoid that may prevent the dispersion of the DNA fragments and specific radiation-induced proteins involved in radioresistance. Here, we compare mechanisms and discuss hypotheses suggested to contribute to radioresistance in several Archaea and Eubacteria.
The whole genome sequence assembly of the soybean aphid, Aphis glycines
USDA-ARS's Scientific Manuscript database
Aphids are emerging as model organisms for both basic and applied research. Of the 5,000 estimated species, only two aphids have published whole genome sequences: the pea aphid Acyrthosiphon pisum, and the Russian wheat aphid, Diuraphis noxia. The soybean aphid (Aphis glycines) is an extreme special...
Genome Sequence of Sphingomonas sp. Strain PAMC 26605, Isolated from Arctic Lichen (Ochrolechia sp.)
Shin, Seung Chul; Ahn, Do Hwan; Lee, Jong Kyu; Kim, Su Jin; Hong, Soon Gyu; Kim, Eun Hye
2012-01-01
The endosymbiotic bacterium Sphingomonas sp. strain PAMC 26605 was isolated from Arctic lichens (Ochrolechia sp.) on the Svalbard Islands. Here we report the draft genome sequence of this strain, which could provide further insights into the symbiotic mechanism of lichens in extreme environments. PMID:22374946
Gene expression changes governing extreme dehydration tolerance in an Antarctic insect
Teets, Nicholas M.; Peyton, Justin T.; Colinet, Herve; Renault, David; Kelley, Joanna L.; Kawarasaki, Yuta; Lee, Richard E.; Denlinger, David L.
2012-01-01
Among terrestrial organisms, arthropods are especially susceptible to dehydration, given their small body size and high surface area to volume ratio. This challenge is particularly acute for polar arthropods that face near-constant desiccating conditions, as water is frozen and thus unavailable for much of the year. The molecular mechanisms that govern extreme dehydration tolerance in insects remain largely undefined. In this study, we used RNA sequencing to quantify transcriptional mechanisms of extreme dehydration tolerance in the Antarctic midge, Belgica antarctica, the world’s southernmost insect and only insect endemic to Antarctica. Larvae of B. antarctica are remarkably tolerant of dehydration, surviving losses up to 70% of their body water. Gene expression changes in response to dehydration indicated up-regulation of cellular recycling pathways including the ubiquitin-mediated proteasome and autophagy, with concurrent down-regulation of genes involved in general metabolism and ATP production. Metabolomics results revealed shifts in metabolite pools that correlated closely with changes in gene expression, indicating that coordinated changes in gene expression and metabolism are a critical component of the dehydration response. Finally, using comparative genomics, we compared our gene expression results with a transcriptomic dataset for the Arctic collembolan, Megaphorura arctica. Although B. antarctica and M. arctica are adapted to similar environments, our analysis indicated very little overlap in expression profiles between these two arthropods. Whereas several orthologous genes showed similar expression patterns, transcriptional changes were largely species specific, indicating these polar arthropods have developed distinct transcriptional mechanisms to cope with similar desiccating conditions. PMID:23197828
Gene expression changes governing extreme dehydration tolerance in an Antarctic insect.
Teets, Nicholas M; Peyton, Justin T; Colinet, Herve; Renault, David; Kelley, Joanna L; Kawarasaki, Yuta; Lee, Richard E; Denlinger, David L
2012-12-11
Among terrestrial organisms, arthropods are especially susceptible to dehydration, given their small body size and high surface area to volume ratio. This challenge is particularly acute for polar arthropods that face near-constant desiccating conditions, as water is frozen and thus unavailable for much of the year. The molecular mechanisms that govern extreme dehydration tolerance in insects remain largely undefined. In this study, we used RNA sequencing to quantify transcriptional mechanisms of extreme dehydration tolerance in the Antarctic midge, Belgica antarctica, the world's southernmost insect and only insect endemic to Antarctica. Larvae of B. antarctica are remarkably tolerant of dehydration, surviving losses up to 70% of their body water. Gene expression changes in response to dehydration indicated up-regulation of cellular recycling pathways including the ubiquitin-mediated proteasome and autophagy, with concurrent down-regulation of genes involved in general metabolism and ATP production. Metabolomics results revealed shifts in metabolite pools that correlated closely with changes in gene expression, indicating that coordinated changes in gene expression and metabolism are a critical component of the dehydration response. Finally, using comparative genomics, we compared our gene expression results with a transcriptomic dataset for the Arctic collembolan, Megaphorura arctica. Although B. antarctica and M. arctica are adapted to similar environments, our analysis indicated very little overlap in expression profiles between these two arthropods. Whereas several orthologous genes showed similar expression patterns, transcriptional changes were largely species specific, indicating these polar arthropods have developed distinct transcriptional mechanisms to cope with similar desiccating conditions.
Hou, Shaobin; Makarova, Kira S; Saw, Jimmy HW; Senin, Pavel; Ly, Benjamin V; Zhou, Zhemin; Ren, Yan; Wang, Jianmei; Galperin, Michael Y; Omelchenko, Marina V; Wolf, Yuri I; Yutin, Natalya; Koonin, Eugene V; Stott, Matthew B; Mountain, Bruce W; Crowe, Michelle A; Smirnova, Angela V; Dunfield, Peter F; Feng, Lu; Wang, Lei; Alam, Maqsudul
2008-01-01
Background: The phylum Verrucomicrobia is a widespread but poorly characterized bacterial clade. Although cultivation-independent approaches detect representatives of this phylum in a wide range of environments, including soils, seawater, hot springs and the human gastrointestinal tract, only a few have been isolated in pure culture. We have recently reported cultivation and initial characterization of an extremely acidophilic methanotrophic member of the Verrucomicrobia, strain V4, isolated from the Hell's Gate geothermal area in New Zealand. Similar organisms were independently isolated from geothermal systems in Italy and Russia. Results: We report the complete genome sequence of strain V4, the first one from a representative of the Verrucomicrobia. Isolate V4, initially named "Methylokorus infernorum" (and recently renamed Methylacidiphilum infernorum) is an autotrophic bacterium with a streamlined genome of ~2.3 Mbp that encodes simple signal transduction pathways and has a limited potential for regulation of gene expression. Central metabolism of M. infernorum was reconstructed almost completely and revealed highly interconnected pathways of autotrophic central metabolism and modifications of C1-utilization pathways compared to other known methylotrophs. The M. infernorum genome does not encode tubulin, which was previously discovered in bacteria of the genus Prosthecobacter, or close homologs of any other signature eukaryotic proteins. Phylogenetic analysis of ribosomal proteins and RNA polymerase subunits unequivocally supports grouping Planctomycetes, Verrucomicrobia and Chlamydiae into a single clade, the PVC superphylum, despite dramatically different gene content in members of these three groups. Comparative-genomic analysis suggests that evolution of the M. infernorum lineage involved extensive horizontal gene exchange with a variety of bacteria. The genome of M. infernorum shows apparent adaptations for existence under extremely acidic conditions including a major upward shift in the isoelectric points of proteins. Conclusion: The results of genome analysis of M. infernorum support the monophyly of the PVC superphylum. M. infernorum possesses a streamlined genome but seems to have acquired numerous genes including those for enzymes of methylotrophic pathways via horizontal gene transfer, in particular, from Proteobacteria. Reviewers: This article was reviewed by John A. Fuerst, Ludmila Chistoserdova, and Radhey S. Gupta. PMID:18593465
Hou, Shaobin; Makarova, Kira S; Saw, Jimmy H W; Senin, Pavel; Ly, Benjamin V; Zhou, Zhemin; Ren, Yan; Wang, Jianmei; Galperin, Michael Y; Omelchenko, Marina V; Wolf, Yuri I; Yutin, Natalya; Koonin, Eugene V; Stott, Matthew B; Mountain, Bruce W; Crowe, Michelle A; Smirnova, Angela V; Dunfield, Peter F; Feng, Lu; Wang, Lei; Alam, Maqsudul
2008-07-01
The phylum Verrucomicrobia is a widespread but poorly characterized bacterial clade. Although cultivation-independent approaches detect representatives of this phylum in a wide range of environments, including soils, seawater, hot springs and the human gastrointestinal tract, only a few have been isolated in pure culture. We have recently reported cultivation and initial characterization of an extremely acidophilic methanotrophic member of the Verrucomicrobia, strain V4, isolated from the Hell's Gate geothermal area in New Zealand. Similar organisms were independently isolated from geothermal systems in Italy and Russia. We report the complete genome sequence of strain V4, the first one from a representative of the Verrucomicrobia. Isolate V4, initially named "Methylokorus infernorum" (and recently renamed Methylacidiphilum infernorum) is an autotrophic bacterium with a streamlined genome of ~2.3 Mbp that encodes simple signal transduction pathways and has a limited potential for regulation of gene expression. Central metabolism of M. infernorum was reconstructed almost completely and revealed highly interconnected pathways of autotrophic central metabolism and modifications of C1-utilization pathways compared to other known methylotrophs. The M. infernorum genome does not encode tubulin, which was previously discovered in bacteria of the genus Prosthecobacter, or close homologs of any other signature eukaryotic proteins. Phylogenetic analysis of ribosomal proteins and RNA polymerase subunits unequivocally supports grouping Planctomycetes, Verrucomicrobia and Chlamydiae into a single clade, the PVC superphylum, despite dramatically different gene content in members of these three groups. Comparative-genomic analysis suggests that evolution of the M. infernorum lineage involved extensive horizontal gene exchange with a variety of bacteria. The genome of M. infernorum shows apparent adaptations for existence under extremely acidic conditions including a major upward shift in the isoelectric points of proteins. The results of genome analysis of M. infernorum support the monophyly of the PVC superphylum. M. infernorum possesses a streamlined genome but seems to have acquired numerous genes including those for enzymes of methylotrophic pathways via horizontal gene transfer, in particular, from Proteobacteria. This article was reviewed by John A. Fuerst, Ludmila Chistoserdova, and Radhey S. Gupta.
Low-pass sequencing for microbial comparative genomics
Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy
2004-01-01
Background: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results: As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion: Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genomes of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
Utilization of TALEN and CRISPR/Cas9 technologies for gene targeting and modification
Pu, Jiali; Zhang, Baorong; Feng, Jian
2015-01-01
The capability to modify the genome precisely and efficiently offers an extremely useful tool for biomedical research. Recent developments in genome editing technologies such as transcription activator-like effector nuclease and the clustered regularly interspaced short palindromic repeats system have made genome modification available for a number of organisms with relative ease. Here, we introduce these genome editing techniques, compare and contrast each technical approach and discuss their potential to study the underlying mechanisms of human disease using patient-derived induced pluripotent stem cells. PMID:25956682
The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds
Suh, Alexander; Smeds, Linnéa; Ellegren, Hans
2015-01-01
The diversification of neoavian birds is one of the most rapid adaptive radiations of extant organisms. Recent whole-genome sequence analyses have much improved the resolution of the neoavian radiation and suggest concurrence with the Cretaceous-Paleogene (K-Pg) boundary, yet the causes of the remaining genome-level irresolvabilities appear unclear. Here we show that genome-level analyses of 2,118 retrotransposon presence/absence markers converge at a largely consistent Neoaves phylogeny and detect a highly differential temporal prevalence of incomplete lineage sorting (ILS), i.e., the persistence of ancestral genetic variation as polymorphisms during speciation events. We found that ILS-derived incongruences are spread over the genome and involve 35% and 34% of the analyzed loci on the autosomes and the Z chromosome, respectively. Surprisingly, Neoaves diversification comprises three adaptive radiations, an initial near-K-Pg super-radiation with highly discordant phylogenetic signals from near-simultaneous speciation events, followed by two post-K-Pg radiations of core landbirds and core waterbirds with much less pronounced ILS. We provide evidence that, given the extreme level of up to 100% ILS per branch in super-radiations, particularly rapid speciation events may neither resemble a fully bifurcating tree nor are they resolvable as such. As a consequence, their complex demographic history is more accurately represented as local networks within a species tree. PMID:26284513
Hernandez-Maldonado, Jaime; Stoneburner, Brendon; Boren, Alison; Miller, Laurence; Rosen, Michael R.; Oremland, Ronald S.; Saltikov, Chad W
2016-01-01
The full genome sequence of Ectothiorhodospira sp. strain BSL-9 is reported here. This purple sulfur bacterium encodes an arxA-type arsenite oxidase within the arxB2AB1CD gene island and is capable of carrying out “photoarsenotrophy,” that is, anoxygenic photosynthetic arsenite oxidation. Its genome is 3.5 Mb in size and has approximately 63% G+C content.
The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli.
Tincher, Clayton; Long, Hongan; Behringer, Megan; Walker, Noah; Lynch, Michael
2017-10-05
Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017 Tincher et al.
Comparative genome analysis of 19 Ureaplasma urealyticum and Ureaplasma parvum strains
2012-01-01
Background: Ureaplasma urealyticum (UUR) and Ureaplasma parvum (UPA) are sexually transmitted bacteria among humans implicated in a variety of disease states including but not limited to: nongonococcal urethritis, infertility, adverse pregnancy outcomes, chorioamnionitis, and bronchopulmonary dysplasia in neonates. There are 10 distinct serotypes of UUR and 4 of UPA. Efforts to determine whether difference in pathogenic potential exists at the ureaplasma serovar level have been hampered by limitations of antibody-based typing methods, multiple cross-reactions and poor discriminating capacity in clinical samples containing two or more serovars. Results: We determined the genome sequences of the American Type Culture Collection (ATCC) type strains of all UUR and UPA serovars as well as four clinical isolates of UUR for which we were not able to determine serovar designation. UPA serovars had 0.75−0.78 Mbp genomes and UUR serovars were 0.84−0.95 Mbp. The original classification of ureaplasma isolates into distinct serovars was largely based on differences in the major ureaplasma surface antigen called the multiple banded antigen (MBA) and reactions of human and animal sera to the organisms. Whole genome analysis of the 14 serovars and the 4 clinical isolates showed the mba gene was part of a large superfamily, which is a phase variable gene system, and that some serovars have identical sets of mba genes. Most of the differences among serovars are hypothetical genes, and in general the two species and 14 serovars are extremely similar at the genome level. Conclusions: Comparative genome analysis suggests UUR is more capable of acquiring genes horizontally, which may contribute to its greater virulence for some conditions. The overwhelming evidence of extensive horizontal gene transfer among these organisms from our previous studies combined with our comparative analysis indicates that ureaplasmas exist as quasi-species rather than as stable serovars in their native environment. Therefore, differential pathogenicity and clinical outcome of a ureaplasmal infection is most likely not on the serovar level, but rather may be due to the presence or absence of potential pathogenicity factors in an individual ureaplasma clinical isolate and/or patient to patient differences in terms of autoimmunity and microbiome. PMID:22646228
Comparative genome analysis of 19 Ureaplasma urealyticum and Ureaplasma parvum strains.
Paralanov, Vanya; Lu, Jin; Duffy, Lynn B; Crabb, Donna M; Shrivastava, Susmita; Methé, Barbara A; Inman, Jason; Yooseph, Shibu; Xiao, Li; Cassell, Gail H; Waites, Ken B; Glass, John I
2012-05-30
Ureaplasma urealyticum (UUR) and Ureaplasma parvum (UPA) are sexually transmitted bacteria among humans implicated in a variety of disease states including but not limited to: nongonococcal urethritis, infertility, adverse pregnancy outcomes, chorioamnionitis, and bronchopulmonary dysplasia in neonates. There are 10 distinct serotypes of UUR and 4 of UPA. Efforts to determine whether difference in pathogenic potential exists at the ureaplasma serovar level have been hampered by limitations of antibody-based typing methods, multiple cross-reactions and poor discriminating capacity in clinical samples containing two or more serovars. We determined the genome sequences of the American Type Culture Collection (ATCC) type strains of all UUR and UPA serovars as well as four clinical isolates of UUR for which we were not able to determine serovar designation. UPA serovars had 0.75-0.78 Mbp genomes and UUR serovars were 0.84-0.95 Mbp. The original classification of ureaplasma isolates into distinct serovars was largely based on differences in the major ureaplasma surface antigen called the multiple banded antigen (MBA) and reactions of human and animal sera to the organisms. Whole genome analysis of the 14 serovars and the 4 clinical isolates showed the mba gene was part of a large superfamily, which is a phase variable gene system, and that some serovars have identical sets of mba genes. Most of the differences among serovars are hypothetical genes, and in general the two species and 14 serovars are extremely similar at the genome level. Comparative genome analysis suggests UUR is more capable of acquiring genes horizontally, which may contribute to its greater virulence for some conditions. The overwhelming evidence of extensive horizontal gene transfer among these organisms from our previous studies combined with our comparative analysis indicates that ureaplasmas exist as quasi-species rather than as stable serovars in their native environment. 
Therefore, differential pathogenicity and clinical outcome of a ureaplasmal infection is most likely not on the serovar level, but rather may be due to the presence or absence of potential pathogenicity factors in an individual ureaplasma clinical isolate and/or patient to patient differences in terms of autoimmunity and microbiome.
Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats.
Palmer, Jonathan M; Drees, Kevin P; Foster, Jeffrey T; Lindner, Daniel L
2018-01-02
Bat white-nose syndrome (WNS), caused by the fungal pathogen Pseudogymnoascus destructans, has decimated North American hibernating bats since its emergence in 2006. Here, we utilize comparative genomics to examine the evolutionary history of this pathogen in comparison to six closely related nonpathogenic species. P. destructans displays a large reduction in carbohydrate-utilizing enzymes (CAZymes) and in the predicted secretome (~50%), and an increase in lineage-specific genes. The pathogen has lost a key enzyme, UVE1, in the alternate excision repair (AER) pathway, which is known to contribute to repair of DNA lesions induced by ultraviolet (UV) light. Consistent with a nonfunctional AER pathway, P. destructans is extremely sensitive to UV light, as well as the DNA alkylating agent methyl methanesulfonate (MMS). The differential susceptibility of P. destructans to UV light in comparison to other hibernacula-inhabiting fungi represents a potential "Achilles' heel" of P. destructans that might be exploited for treatment of bats with WNS.
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits.
Larsson, John; Nylander, Johan Aa; Bergman, Birgitta
2011-06-30
Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. 
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets.
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits
2011-01-01
Background Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. Results A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. Conclusions The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential; while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities.
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets. PMID:21718514
Do you really know where this SNP goes?
USDA-ARS's Scientific Manuscript database
The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...
De novo genome sequencing and comparative genomics of the date palm (Phoenix dactylifera)
USDA-ARS's Scientific Manuscript database
Date Palm has been vital to the Middle East and other arid regions of the world for more than 5000 years. The date palm's ability to withstand extremely harsh conditions, while producing highly nutritious fruit with relatively minimal care, makes it a good candidate for improving arid land agricultu...
Genomic Sciences for Developmentalists: A Merge of Science and Practice
ERIC Educational Resources Information Center
Grigorenko, Elena L.
2015-01-01
The etiological forces of development have been a central question for the developmental sciences (however defined) since their crystallization as a distinct branch of scientific inquiry. Although the history of these sciences contains examples of extreme positions capitalizing on either the predominance of the genome (i.e., the accumulation of…
First draft genome sequence of a strain from the genus Citricoccus.
Hayano-Kanashiro, Corina; López-Arredondo, Damar Lizbeth; Cruz-Morales, Pablo; Alcaraz, Luis-David; Olmedo, Gabriela; Barona-Gómez, Francisco; Herrera-Estrella, Luis
2011-11-01
Bacteria of the genus Citricoccus have been isolated from ecological niches characterized by diverse abiotic stress conditions. Here we report the first genome draft of a strain of the genus Citricoccus isolated from the extremely oligotrophic Churince system in the Cuatro Ciénegas Basin (CCB) in Coahuila, Mexico.
Benazzo, Andrea; Trucchi, Emiliano; Cahill, James A.; Maisano Delser, Pierpaolo; Mona, Stefano; Fumagalli, Matteo; Cornetti, Luca; Ghirotto, Silvia; Girardi, Matteo; Ometto, Lino; Panziera, Alex; Rota-Stabelli, Omar; Zanetti, Enrico; Karamanlidis, Alexandros; Groff, Claudio; Paule, Ladislav; Gentile, Leonardo; Vicario, Saverio; Boitani, Luigi; Fuselli, Silvia; Vernesi, Cristiano; Bertorelle, Giorgio
2017-01-01
About 100 km east of Rome, in the central Apennine Mountains, a critically endangered population of ∼50 brown bears live in complete isolation. Mating outside this population is prevented by several 100 km of bear-free territories. We exploited this natural experiment to better understand the gene and genomic consequences of surviving at extremely small population size. We found that brown bear populations in Europe lost connectivity since Neolithic times, when farming communities expanded and forest burning was used for land clearance. In central Italy, this resulted in a 40-fold population decline. The overall genomic impact of this decline included the complete loss of variation in the mitochondrial genome and along long stretches of the nuclear genome. Several private and deleterious amino acid changes were fixed by random drift; predicted effects include energy deficit, muscle weakness, anomalies in cranial and skeletal development, and reduced aggressiveness. Despite this extreme loss of diversity, Apennine bear genomes show nonrandom peaks of high variation, possibly maintained by balancing selection, at genomic regions significantly enriched for genes associated with immune and olfactory systems. Challenging the paradigm of increased extinction risk in small populations, we suggest that random fixation of deleterious alleles (i) can be an important driver of divergence in isolation, (ii) can be tolerated when balancing selection prevents random loss of variation at important genes, and (iii) is followed by or results directly in favorable behavioral changes. PMID:29078308
Benazzo, Andrea; Trucchi, Emiliano; Cahill, James A; Maisano Delser, Pierpaolo; Mona, Stefano; Fumagalli, Matteo; Bunnefeld, Lynsey; Cornetti, Luca; Ghirotto, Silvia; Girardi, Matteo; Ometto, Lino; Panziera, Alex; Rota-Stabelli, Omar; Zanetti, Enrico; Karamanlidis, Alexandros; Groff, Claudio; Paule, Ladislav; Gentile, Leonardo; Vilà, Carles; Vicario, Saverio; Boitani, Luigi; Orlando, Ludovic; Fuselli, Silvia; Vernesi, Cristiano; Shapiro, Beth; Ciucci, Paolo; Bertorelle, Giorgio
2017-11-07
About 100 km east of Rome, in the central Apennine Mountains, a critically endangered population of ∼50 brown bears live in complete isolation. Mating outside this population is prevented by several 100 km of bear-free territories. We exploited this natural experiment to better understand the gene and genomic consequences of surviving at extremely small population size. We found that brown bear populations in Europe lost connectivity since Neolithic times, when farming communities expanded and forest burning was used for land clearance. In central Italy, this resulted in a 40-fold population decline. The overall genomic impact of this decline included the complete loss of variation in the mitochondrial genome and along long stretches of the nuclear genome. Several private and deleterious amino acid changes were fixed by random drift; predicted effects include energy deficit, muscle weakness, anomalies in cranial and skeletal development, and reduced aggressiveness. Despite this extreme loss of diversity, Apennine bear genomes show nonrandom peaks of high variation, possibly maintained by balancing selection, at genomic regions significantly enriched for genes associated with immune and olfactory systems. Challenging the paradigm of increased extinction risk in small populations, we suggest that random fixation of deleterious alleles (i) can be an important driver of divergence in isolation, (ii) can be tolerated when balancing selection prevents random loss of variation at important genes, and (iii) is followed by or results directly in favorable behavioral changes. Published under the PNAS license.
GTRAC: fast retrieval from compressed collections of genomic variants
Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy
2016-01-01
Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665
GTRAC: fast retrieval from compressed collections of genomic variants.
Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy
2016-09-01
The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Maumus, Florian; Blanc, Guillaume
2016-12-14
The nucleocytoplasmic large DNA viruses (NCLDV) are a group of extremely complex double-stranded DNA viruses, which are major parasites of a variety of eukaryotes. Recent studies showed that certain unicellular eukaryotes contain fragments of NCLDV DNA integrated in their genome, although, surprisingly, many of these organisms were not previously shown to be infected by NCLDVs. These findings prompted us to search the genome of Acanthamoeba castellanii strain Neff (Neff), one of the most prolific hosts in the discovery of giant NCLDVs, for possible DNA inserts of viral origin. We report the identification of 267 markers of lateral gene transfer with viruses, approximately half of which are clustered in Neff genome regions of viral origins, transcriptionally inactive or exhibit nucleotide-composition signatures suggestive of a foreign origin. The integrated viral genes had diverse origins among relatives of viruses that infect Neff, including Mollivirus, Pandoravirus, Marseillevirus, Pithovirus, and Mimivirus. However, phylogenetic analysis suggests the existence of a yet-undiscovered family of amoeba-infecting NCLDV in addition to the five already characterized. The active transcription of some apparently anciently integrated virus-like genes suggests that some viral genes might have been domesticated during the amoeba evolution. These insights confirm that genomic insertion of NCLDV DNA is a common theme in eukaryotes. This gene flow contributed to fertilizing the eukaryotic gene repertoire and participated in the occurrence of orphan genes, a long-standing issue in genomics. Search for viral inserts in eukaryotic genomes followed by environmental screening of the original viruses should be used to isolate radically new NCLDVs. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Wallberg, Andreas; Glémin, Sylvain; Webster, Matthew T.
2015-01-01
Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 – 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution. PMID:25902173
Yan, Dankan; Tang, Yunxia; Hu, Min; Liu, Fengquan; Zhang, Dongfang; Fan, Jiaqin
2014-10-01
Thrips is an ideal group for studying the evolution of mitochondrial (mt) genomes at the genus and family levels due to independent rearrangements within this order. The complete sequence of the mitochondrial DNA (mtDNA) of the flower thrips Frankliniella intonsa has been determined and annotated in this study. The circular genome is 15,215 bp in length with an A+T content of 75.9%; it contains the typical 37 genes and has triplicate putative control regions (CRs). Nucleotide composition is A+T biased, and the majority of the protein-coding genes present opposite CG skew, which is reflected in the nucleotide composition and in codon and amino acid usage. Although the known thrips have massive gene rearrangements, it showed no reversal of strand asymmetry. Gene rearrangements have been found in the lower taxonomic levels of thrips. Three tRNA genes were translocated in the genus Frankliniella and eight tRNA genes in the family Thripidae. Although the gene arrangements of the mt genomes of all three thrips species differ massively from that of the ancestral insect, they are all very similar to each other, indicating that there was a large rearrangement somewhere before the most recent common ancestor of these three species and very little genomic evolution or rearrangement since then. The extremely similar sequences among the CRs suggest that they are undergoing concerted evolution. Analyses of the upstream and downstream sequences of the CRs reveal that CR2 is actually the ancestral CR. The three CRs are in the same position in each of the three thrips mt genomes, which have identical inverted genes. These characteristics might have been inherited from the most recent common ancestor of these three thrips. The above observations suggest that the mt genomes of the three thrips retain a single massive rearrangement from the common ancestor and have low evolutionary rates among them. Copyright © 2014 Elsevier Inc. All rights reserved.
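The A+T content and strand skew quoted in the abstract above are simple base-count ratios. The following is a minimal sketch of how such figures are commonly computed, assuming the conventional definitions A+T content = (A+T)/(A+C+G+T) and GC skew = (G-C)/(G+C); the function names and the toy sequence are illustrative and are not taken from the study.

```python
def at_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("A") + seq.count("T")) / total


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C), for one strand."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        raise ValueError("sequence contains no G or C bases")
    return (g - c) / (g + c)


# Toy sequence (hypothetical, not from the F. intonsa genome).
toy = "AAATGGGC"
print(at_content(toy), gc_skew(toy))
```

Applied per gene and per strand, the sign of the skew is what allows statements such as "opposite CG skew" to be checked; ambiguous bases (e.g. N) are ignored here by assumption.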
Energetics and genetics across the prokaryote-eukaryote divide
2011-01-01
Background All complex life on Earth is eukaryotic. All eukaryotic cells share a common ancestor that arose just once in four billion years of evolution. Prokaryotes show no tendency to evolve greater morphological complexity, despite their metabolic virtuosity. Here I argue that the eukaryotic cell originated in a unique prokaryotic endosymbiosis, a singular event that transformed the selection pressures acting on both host and endosymbiont. Results The reductive evolution and specialisation of endosymbionts to mitochondria resulted in an extreme genomic asymmetry, in which the residual mitochondrial genomes enabled the expansion of bioenergetic membranes over several orders of magnitude, overcoming the energetic constraints on prokaryotic genome size, and permitting the host cell genome to expand (in principle) over 200,000-fold. This energetic transformation was permissive, not prescriptive; I suggest that the actual increase in early eukaryotic genome size was driven by a heavy early bombardment of genes and introns from the endosymbiont to the host cell, producing a high mutation rate. Unlike prokaryotes, with lower mutation rates and heavy selection pressure to lose genes, early eukaryotes without genome-size limitations could mask mutations by cell fusion and genome duplication, as in allopolyploidy, giving rise to a proto-sexual cell cycle. The side effect was that a large number of shared eukaryotic basal traits accumulated in the same population, a sexual eukaryotic common ancestor, radically different to any known prokaryote. Conclusions The combination of massive bioenergetic expansion, release from genome-size constraints, and high mutation rate favoured a protosexual cell cycle and the accumulation of eukaryotic traits. These factors explain the unique origin of eukaryotes, the absence of true evolutionary intermediates, and the evolution of sex in eukaryotes but not prokaryotes. 
Reviewers This article was reviewed by: Eugene Koonin, William Martin, Ford Doolittle and Mark van der Giezen. For complete reports see the Reviewers' Comments section. PMID:21714941
Analysis of Genes Involved in Body Weight Regulation by Targeted Re-Sequencing.
Volckmar, Anna-Lena; Han, Chung Ting; Pütter, Carolin; Haas, Stefan; Vogel, Carla I G; Knoll, Nadja; Struve, Christoph; Göbel, Maria; Haas, Katharina; Herrfurth, Nikolas; Jarick, Ivonne; Grallert, Harald; Schürmann, Annette; Al-Hasani, Hadi; Hebebrand, Johannes; Sauer, Sascha; Hinney, Anke
2016-01-01
Genes involved in body weight regulation that were previously investigated in genome-wide association studies (GWAS) and in animal models were target-enriched followed by massive parallel next generation sequencing. We enriched and re-sequenced continuous genomic regions comprising FTO, MC4R, TMEM18, SDCCAG8, TKNS, MSRA and TBC1D1 in a screening sample of 196 extremely obese children and adolescents with age and sex specific body mass index (BMI) ≥ 99th percentile and 176 lean adults (BMI ≤ 15th percentile). 22 variants were confirmed by Sanger sequencing. Genotyping was performed in up to 705 independent obesity trios (extremely obese child and both parents), 243 extremely obese cases and 261 lean adults. We detected 20 different non-synonymous variants, one frame shift and one nonsense mutation in the 7 continuous genomic regions in study groups of different weight extremes. For SNP Arg695Cys (rs58983546) in TBC1D1 we detected nominal association with obesity (pTDT = 0.03 in 705 trios). Eleven of the variants were rare and thus were only detected heterozygously in up to ten individuals of the complete screening sample of 372 individuals. Two of them (in FTO and MSRA) were found in lean individuals, nine in extremely obese individuals. In silico analyses of the 11 variants did not reveal functional implications for the mutations. Concordant with our hypothesis, we detected a rare variant that potentially leads to loss of FTO function in a lean individual. For TBC1D1, contrary to our hypothesis, the loss of function variant (Arg443Stop) was found in an obese individual. Functional in vitro studies are warranted.
Gendreau, Kerry L; Haney, Robert A; Schwager, Evelyn E; Wierschin, Torsten; Stanke, Mario; Richards, Stephen; Garb, Jessica E
2017-02-16
Black widow spiders are infamous for their neurotoxic venom, which can cause extreme and long-lasting pain. This unusual venom is dominated by latrotoxins and latrodectins, two protein families virtually unknown outside of the black widow genus Latrodectus, that are difficult to study given the paucity of spider genomes. Using tissue-, sex- and stage-specific expression data, we analyzed the recently sequenced genome of the house spider (Parasteatoda tepidariorum), a close relative of black widows, to investigate latrotoxin and latrodectin diversity, expression and evolution. We discovered at least 47 latrotoxin genes in the house spider genome, many of which are tandem-arrayed. Latrotoxins vary extensively in predicted structural domains and expression, implying their significant functional diversification. Phylogenetic analyses show latrotoxins have substantially duplicated after the Latrodectus/Parasteatoda split and that they are also related to proteins found in endosymbiotic bacteria. Latrodectin genes are less numerous than latrotoxins, but analyses show their recruitment for venom function from neuropeptide hormone genes following duplication, inversion and domain truncation. While latrodectins and other peptides are highly expressed in house spider and black widow venom glands, latrotoxins account for a far smaller percentage of house spider venom gland expression. The house spider genome sequence provides novel insights into the evolution of venom toxins once considered unique to black widows. Our results greatly expand the size of the latrotoxin gene family, reinforce its narrow phylogenetic distribution, and provide additional evidence for the lateral transfer of latrotoxins between spiders and bacterial endosymbionts. Moreover, we strengthen the evidence for the evolution of latrodectin venom genes from the ecdysozoan Ion Transport Peptide (ITP)/Crustacean Hyperglycemic Hormone (CHH) neuropeptide superfamily. 
The lower expression of latrotoxins in house spiders relative to black widows, along with the absence of a vertebrate-targeting α-latrotoxin gene in the house spider genome, may account for the extreme potency of black widow venom.
Weighted mining of massive collections of p-values by convex optimization.
Dobriban, Edgar
2018-06-01
Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of p-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the p-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
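To make the idea of upweighting and downweighting hypotheses concrete, here is a minimal sketch of the classical weighted Bonferroni rule, which controls the family-wise error rate when the weights are non-negative and average to one. This is not the Princessp method itself: Princessp chooses the weights by constrained convex optimization, whereas this sketch assumes the weights are simply supplied by the user. The function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def weighted_bonferroni(pvalues: Sequence[float],
                        weights: Sequence[float],
                        alpha: float = 0.05) -> List[bool]:
    """Reject hypothesis i when p_i <= alpha * w_i / m.

    Weights are rescaled to have mean 1, so the total error budget
    spent across all m hypotheses is still alpha.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_w = sum(weights) / m
    if mean_w == 0:
        raise ValueError("at least one weight must be positive")
    normalised = [w / mean_w for w in weights]
    return [p <= alpha * w / m for p, w in zip(pvalues, normalised)]


# Hypothetical p-values; the second hypothesis is prioritised.
pvals = [0.001, 0.02, 0.04, 0.5]
print(weighted_bonferroni(pvals, [1, 1, 1, 1]))  # unweighted Bonferroni
print(weighted_bonferroni(pvals, [1, 3, 0, 0]))  # upweight #2, discount #3 and #4
```

Shifting weight toward a hypothesis raises its rejection threshold at the expense of the others, which is the trade-off that an optimal weighting scheme has to balance.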
KinSNP software for homozygosity mapping of disease genes using SNP microarrays.
Amir, El-Ad David; Bartal, Ofer; Morad, Efrat; Nagar, Tal; Sheynin, Jony; Parvari, Ruti; Chalifa-Caspi, Vered
2010-08-01
Consanguineous families affected with a recessive genetic disease caused by homozygotisation of a mutation offer a unique advantage for positional cloning of rare diseases. Homozygosity mapping of patient genotypes is a powerful technique for the identification of the genomic locus harbouring the causing mutation. This strategy relies on the observation that in these patients a large region spanning the disease locus is also homozygous with high probability. The high marker density in single nucleotide polymorphism (SNP) arrays is extremely advantageous for homozygosity mapping. We present KinSNP, a user-friendly software tool for homozygosity mapping using SNP arrays. The software searches for stretches of SNPs which are homozygous to the same allele in all ascertained sick individuals. User-specified parameters control the number of allowed genotyping 'errors' within homozygous blocks. Candidate disease regions are then reported in a detailed, coloured Excel file, along with genotypes of family members and healthy controls. An interactive genome browser has been included which shows homozygous blocks, individual genotypes, genes and further annotations along the chromosomes, with zooming and scrolling capabilities. The software has been used to identify the location of a mutated gene causing insensitivity to pain in a large Bedouin family. KinSNP is freely available from.
KinSNP software for homozygosity mapping of disease genes using SNP microarrays
2010-01-01
Consanguineous families affected with a recessive genetic disease caused by homozygotisation of a mutation offer a unique advantage for positional cloning of rare diseases. Homozygosity mapping of patient genotypes is a powerful technique for the identification of the genomic locus harbouring the causing mutation. This strategy relies on the observation that in these patients a large region spanning the disease locus is also homozygous with high probability. The high marker density in single nucleotide polymorphism (SNP) arrays is extremely advantageous for homozygosity mapping. We present KinSNP, a user-friendly software tool for homozygosity mapping using SNP arrays. The software searches for stretches of SNPs which are homozygous to the same allele in all ascertained sick individuals. User-specified parameters control the number of allowed genotyping 'errors' within homozygous blocks. Candidate disease regions are then reported in a detailed, coloured Excel file, along with genotypes of family members and healthy controls. An interactive genome browser has been included which shows homozygous blocks, individual genotypes, genes and further annotations along the chromosomes, with zooming and scrolling capabilities. The software has been used to identify the location of a mutated gene causing insensitivity to pain in a large Bedouin family. KinSNP is freely available from http://bioinfo.bgu.ac.il/bsu/software/kinSNP. PMID:20846928
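The search described in the abstract above (stretches of SNPs homozygous for the same allele in all affected individuals, with a tolerance for genotyping errors) can be sketched as follows. This is an illustrative greedy scan, not KinSNP's actual implementation; the genotype encoding (two-letter calls such as "AA" or "AB"), the function names and the parameters `min_len` and `max_errors` are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def shared_homozygous_flags(affected: Sequence[Sequence[str]]) -> List[bool]:
    """For each SNP, True if every affected individual has the same homozygous call."""
    n_snps = len(affected[0])
    flags = []
    for j in range(n_snps):
        calls = {individual[j] for individual in affected}
        # Exactly one distinct call, and that call has two identical alleles.
        flags.append(len(calls) == 1 and len(set(next(iter(calls)))) == 1)
    return flags


def homozygous_blocks(flags: Sequence[bool],
                      min_len: int = 3,
                      max_errors: int = 0) -> List[Tuple[int, int]]:
    """Greedy left-to-right scan for blocks (start, end), inclusive SNP indices.

    A block starts and ends on a consistent SNP and may contain at most
    `max_errors` inconsistent SNPs, treated as tolerated genotyping errors.
    """
    blocks = []
    i, n = 0, len(flags)
    while i < n:
        if not flags[i]:
            i += 1
            continue
        start, end, errors = i, i, 0
        j = i + 1
        while j < n:
            if flags[j]:
                end = j          # extend the block to this consistent SNP
            else:
                errors += 1
                if errors > max_errors:
                    break        # error budget exhausted; close the block
            j += 1
        if end - start + 1 >= min_len:
            blocks.append((start, end))
        i = end + 1
    return blocks


# Two hypothetical affected individuals typed at six SNPs.
affected = [
    ["AA", "BB", "AB", "AA", "BB", "AA"],
    ["AA", "BB", "AA", "AA", "BB", "AB"],
]
flags = shared_homozygous_flags(affected)
print(homozygous_blocks(flags, min_len=2, max_errors=0))
print(homozygous_blocks(flags, min_len=2, max_errors=1))
```

A real tool would work in genomic coordinates rather than SNP indices and would also check that healthy relatives are not homozygous for the same haplotype; those steps are omitted here for brevity.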
Unlocking the potential of orphan legumes.
Cullis, Christopher; Kunert, Karl J
2017-04-01
Orphan, or underutilized, legumes are domesticated legumes with useful properties, but with less importance than major world crops due to use and supply constraints. However, they play a significant role in many developing countries, providing food security and nutrition to consumers, as well as income to resource-poor farmers. They have been largely neglected by both researchers and industry due to their limited economic importance in the global market. Orphan legumes are better adapted than the major legume crops to extreme soil and climatic conditions, with high tolerance to abiotic environmental stresses such as drought. As a stress response they can also produce compounds with pharmaceutical value. Orphan legumes are therefore a likely source of important traits for introduction into major crops to aid in combating the stresses associated with global climate change. Modern large-scale genomics techniques are now being applied to many of these previously understudied crops, with the first successes reported in the genomics area. However, greater investment of resources and manpower are necessary if the potential of orphan legumes is to be unlocked and applied in the future. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Ma, Peng-Fei; Zhang, Yu-Xiao; Zeng, Chun-Xia; Guo, Zhen-Hua; Li, De-Zhu
2014-11-01
The temperate woody bamboos constitute a distinct tribe Arundinarieae (Poaceae: Bambusoideae) with high species diversity. Estimating phylogenetic relationships among the 11 major lineages of Arundinarieae has been particularly difficult, owing to a possible rapid radiation and the extremely low rate of sequence divergence. Here, we explore the use of chloroplast genome sequencing for phylogenetic inference. We sampled 25 species (22 temperate bamboos and 3 outgroups) for the complete genome representing eight major lineages of Arundinarieae in an attempt to resolve backbone relationships. Phylogenetic analyses of coding versus noncoding sequences, and of different regions of the genome (large single copy and small single copy, and inverted repeat regions) yielded no well-supported contradicting topologies but potential incongruence was found between the coding and noncoding sequences. The use of various data partitioning schemes in analysis of the complete sequences resulted in nearly identical topologies and node support values, although the partitioning schemes were decisively different from each other as to the fit to the data. Our full genomic data set substantially increased resolution along the backbone and provided strong support for most relationships despite the very short internodes and long branches in the tree. The inferred relationships were also robust to potential confounding factors (e.g., long-branch attraction) and received support from independent indels in the genome. We then added taxa from the three Arundinarieae lineages that were not included in the full-genome data set; each of these was sampled for more than 50% of the genome sequence. The resulting trees not only corroborated the reconstructed deep-level relationships but also largely resolved the phylogenetic placements of these three additional lineages.
Furthermore, adding 129 additional taxa sampled for only eight chloroplast loci to the combined data set yielded almost identical relationships, albeit with low support values. We believe that the inferred phylogeny is robust to taxon sampling. Having resolved the deep-level relationships of Arundinarieae, we illuminate how chloroplast phylogenomics can be used for elucidating difficult phylogeny at low taxonomic levels in intractable plant groups. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn
2009-01-01
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS).
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
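The abstract above does not say how conserved 7-mers were detected. As a rough illustration of the idea only, the sketch below counts how many sequences in a set contain each 7-mer and keeps those seen in at least `min_count` of them. The function names, the `min_count` threshold, and the toy sequences are assumptions, and a real analysis would also need a background model to separate conservation from chance.

```python
from collections import Counter


def kmers(seq, k=7):
    """Return the set of distinct k-mers in one sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(utrs, k=7, min_count=2):
    """Map each k-mer to the number of sequences containing it, keeping only
    k-mers present in at least min_count sequences (hypothetical threshold)."""
    counts = Counter()
    for seq in utrs:
        counts.update(kmers(seq, k))  # each sequence contributes a k-mer at most once
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy 3'UTR fragments, invented for the example.
utrs = ["AAACCCGGGT", "TTACCCGGGA", "GGGGGGGGGG"]
print(shared_kmers(utrs))
```

Counting each 7-mer once per sequence, rather than once per occurrence, keeps a single repetitive UTR from dominating the tally.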
Complete genome sequence of Thioalkalivibrio sp. K90mix
Muyzer, Gerard; Sorokin, Dimitry Y.; Mavromatis, Konstantinos; Lapidus, Alla; Foster, Brian; Sun, Hui; Ivanova, Natalia; Pati, Amrita; D'haeseleer, Patrik; Woyke, Tanja; Kyrpides, Nikos C.
2011-01-01
Thioalkalivibrio sp. K90mix is an obligately chemolithoautotrophic, natronophilic sulfur-oxidizing bacterium (SOxB) belonging to the family Ectothiorhodospiraceae within the Gammaproteobacteria. The strain was isolated from a mixture of sediment samples obtained from different soda lakes located in the Kulunda Steppe (Altai, Russia) based on its extreme potassium carbonate tolerance as an enrichment method. Here we report the complete genome sequence of strain K90mix and its annotation. The genome was sequenced within the Joint Genome Institute Community Sequencing Program, because of its relevance to the sustainable removal of sulfide from wastewater and gas streams. PMID:22675584
Guo, Zixiao; Li, Xinnian; He, Ziwen; Yang, Yuchen; Wang, Wenqing; Zhong, Cairong; Greenberg, Anthony J; Wu, Chung-I; Duke, Norman C; Shi, Suhua
2018-04-01
The projected increases in sea levels are expected to affect coastal ecosystems. Tropical communities, anchored by mangrove trees and having experienced frequent past sea level changes, appear to be vibrant at present. However, any optimism about the resilience of these ecosystems is premature because the impact of past climate events may not be reflected in the current abundance. To assess the impact of historical sea level changes, we conducted an extensive genetic diversity survey on the Indo-Malayan coast, a hotspot with a large global mangrove distribution. A survey of 26 populations in six species reveals extremely low genome-wide nucleotide diversity and hence very small effective population sizes (Ne) in all populations. Whole-genome sequencing of three mangrove species further shows the decline in Ne to be strongly associated with the speed of past changes in sea level. We also used a recent series of flooding events in Yalong Bay, southern China, to test the robustness of mangroves to sea level changes in relation to their genetic diversity. The events resulted in the death of half of the mangrove trees in this area. Significantly, less genetically diverse mangrove species suffered much greater destruction. The dieback was accompanied by a drastic reduction in local invertebrate biodiversity. We thus predict that tropical coastal communities will be seriously endangered as the global sea level rises. Well-planned coastal development near mangrove forests will be essential to avert this crisis. © 2017 John Wiley & Sons Ltd.
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates
McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg
2009-01-01
Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
Jiang, Zhi J; Castoe, Todd A; Austin, Christopher C; Burbrink, Frank T; Herron, Matthew D; McGuire, Jimmy A; Parkinson, Christopher L; Pollock, David D
2007-01-01
Background The mitochondrial genomes of snakes are characterized by an overall evolutionary rate that appears to be one of the most accelerated among vertebrates. They also possess other unusual features, including short tRNAs and other genes, and a duplicated control region that has been stably maintained since it originated more than 70 million years ago. Here, we provide a detailed analysis of evolutionary dynamics in snake mitochondrial genomes to better understand the basis of these extreme characteristics, and to explore the relationship between mitochondrial genome molecular evolution, genome architecture, and molecular function. We sequenced complete mitochondrial genomes from Slowinski's corn snake (Pantherophis slowinskii) and two cottonmouths (Agkistrodon piscivorus) to complement previously existing mitochondrial genomes, and to provide an improved comparative view of how genome architecture affects molecular evolution at contrasting levels of divergence. Results We present a Bayesian genetic approach that suggests that the duplicated control region can function as an additional origin of heavy strand replication. The two control regions also appear to have different intra-specific versus inter-specific evolutionary dynamics that may be associated with complex modes of concerted evolution. We find that different genomic regions have experienced substantial accelerated evolution along early branches in snakes, with different genes having experienced dramatic accelerations along specific branches. Some of these accelerations appear to coincide with, or subsequent to, the shortening of various mitochondrial genes and the duplication of the control region and flanking tRNAs. Conclusion Fluctuations in the strength and pattern of selection during snake evolution have had widely varying gene-specific effects on substitution rates, and these rate accelerations may have been functionally related to unusual changes in genomic architecture. 
The among-lineage and among-gene variation in rate dynamics observed in snakes is the most extreme thus far observed in animal genomes, and provides an important study system for further evaluating the biochemical and physiological basis of evolutionary pressures in vertebrate mitochondria. PMID:17655768
Pyramiding genes and alleles for improving energy cane biomass yield
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ming, Ray; Nagai, Chifumi; Yu, Qingyi
The overall goal of this project is to identify genes and gene interaction networks contributing to the extreme segregants with a 30-fold difference in biomass yield in sugarcane F2 populations. Towards achieving this goal, yield trials of 108 F2 extreme segregants from S. officinarum LA Purple and S. robustum MOL5829 (LM population) were carried out in two locations over three years. A yield trial of the second F2 population from S. officinarum LA Purple and S. spontaneum US56-14-4 (LU population) was installed in the summer of 2014 and the first set of yield component data was collected. For genotyping, transcriptomes from leaves and stalks of 70 extreme segregants of the LM F2 population and 119 individuals of the LU F2 populations were sequenced. The genomes of 91 F1 individuals from the LM populations are being sequenced to construct ultra-high density genetic maps for each of the two parents, both to assist assembly of the LA Purple genome and to test a hypothesis of female restitution. The genomes of 110 F2 individuals from a single F1 in the LU population, a different set from the 119 F2 individuals used for transcriptome sequencing, are being sequenced for mapping genes and QTLs affecting biomass yield and for testing a hypothesis of female restitution. Gene expression analysis between extreme segregants of high and low biomass yield showed up-regulation of cellulose synthase, cellulose, and xylan synthase in high biomass yield segregants among 3,274 genes differentially expressed between the two extremes. Our transcriptome results revealed that not only is an increase in the cell wall biosynthesis pathway essential, but the rapid turnover of certain cell wall polymers and carbohydrate partitioning are also important for recycling and energy conservation during rapid cell growth in high biomass sugarcane.
Seventeen differentially expressed genes in auxin, one in ethylene and one in gibberellin related signaling and biosynthesis pathways were identified, which could potentially regulate biomass yield. Differentially expressed genes, PIF3 and EIL5, involved in the gibberellin and ethylene pathways, could play an important role in biomass accumulation. Differential gene expression analysis was also carried out on the LU population. High-biomass yield was mainly determined by assimilation of carbon in source tissues. The high-level expression of fermentative genes in the low-biomass group was likely induced by their low-energy status. The haploid (tetraploid) genome of S. spontaneum AP85-441 was sequenced with chromosome level assembly and allele defined annotation. This reference genome along with the upcoming S. officinarum genome will allow us to identify genes and alleles contributing to biomass yield.
Lee, Jungeun; Shin, Seung Chul; Kim, Su Jin; Kim, Bum-Keun; Hong, Soon Gyu; Kim, Eun Hye; Park, Hyun
2012-01-01
Sphingomonas sp. strain PAMC 26617 has been isolated from an Arctic lichen Umbilicaria sp. on the Svalbard Islands. Here we present the draft genome sequence of this strain, which represents a valuable resource for understanding the symbiotic mechanisms between endosymbiotic bacteria and lichens surviving in extreme environments. PMID:22582371
Transposable elements as a molecular evolutionary force
NASA Technical Reports Server (NTRS)
Fedoroff, N. V.
1999-01-01
This essay addresses the paradoxes of the complex and highly redundant genomes. The central theses developed are that: (1) the distinctive feature of complex genomes is the existence of epigenetic mechanisms that permit extremely high levels of both tandem and dispersed redundancy; (2) the special contribution of transposable elements is to modularize the genome; and (3) the labilizing forces of recombination and transposition are just barely contained, giving a dynamic genetic system of ever increasing complexity that verges on the chaotic.
De Novo Protein Structure Prediction
NASA Astrophysics Data System (ADS)
Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram
An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.
Koiffmann, Celia Priszkulnik
2012-01-01
In recent decades, obesity has reached epidemic proportions worldwide and has become a major concern in public health. Despite heritability estimates of 40 to 70% and the long-recognized genetic basis of obesity in a number of rare cases, the list of common obesity susceptibility variants identified by the currently published genome-wide association studies (GWASs) explains only a small proportion of the individual variation in risk of obesity. It was not until very recently that GWASs of copy number variants (CNVs) in individuals with extreme phenotypes reported a number of large and rare CNVs conferring high risk of obesity, and specifically deletions on chromosome 16p11.2. In this paper, we comment on the recent advances in the field of genetics of obesity with an emphasis on the genes and genomic regions implicated in highly penetrant forms of obesity associated with developmental disorders. Array genomic hybridization in this patient population has afforded discovery opportunities for CNVs that have not previously been detectable. This information can be used to generate new diagnostic arrays and sequencing platforms, which will likely enhance detection of known genetic conditions with the potential to elucidate new disease genes and ultimately help in developing a next-generation sequencing protocol relevant to clinical practice. PMID:23316347
Defining the biological bases of individual differences in musicality.
Gingras, Bruno; Honing, Henkjan; Peretz, Isabelle; Trainor, Laurel J; Fisher, Simon E
2015-03-19
Advances in molecular technologies make it possible to pinpoint genomic factors associated with complex human traits. For cognition and behaviour, identification of underlying genes provides new entry points for deciphering the key neurobiological pathways. In the past decade, the search for genetic correlates of musicality has gained traction. Reports have documented familial clustering for different extremes of ability, including amusia and absolute pitch (AP), with twin studies demonstrating high heritability for some music-related skills, such as pitch perception. Certain chromosomal regions have been linked to AP and musical aptitude, while individual candidate genes have been investigated in relation to aptitude and creativity. Most recently, researchers in this field started performing genome-wide association scans. Thus far, studies have been hampered by relatively small sample sizes and limitations in defining components of musicality, including an emphasis on skills that can only be assessed in trained musicians. With opportunities to administer standardized aptitude tests online, systematic large-scale assessment of musical abilities is now feasible, an important step towards high-powered genome-wide screens. Here, we offer a synthesis of existing literatures and outline concrete suggestions for the development of comprehensive operational tools for the analysis of musical phenotypes. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes
Cannon, Steven B.; Sterck, Lieven; Rombauts, Stephane; Sato, Shusei; Cheung, Foo; Gouzy, Jérôme; Wang, Xiaohong; Mudge, Joann; Vasdewani, Jayprakash; Schiex, Thomas; Spannagl, Manuel; Monaghan, Erin; Nicholson, Christine; Humphray, Sean J.; Schoof, Heiko; Mayer, Klaus F. X.; Rogers, Jane; Quétier, Francis; Oldroyd, Giles E.; Debellé, Frédéric; Cook, Douglas R.; Retzel, Ernest F.; Roe, Bruce A.; Town, Christopher D.; Tabata, Satoshi; Van de Peer, Yves; Young, Nevin D.
2006-01-01
Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago–Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20–30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar). PMID:17003129
Metcalfe, Cushla J.; Casane, Didier
2013-01-01
Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs; most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome was identifiable as retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment therefore suggest that the diversity and abundance of TEs are variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the ‘missing’ portion of the lungfish and salamander genomes consists of old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes; however, a similar scenario could describe the process of genome expansion in the lungfish.
A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the salamanders. The lungfish and salamanders, therefore, may accommodate their large load of TEs because these TEs have accumulated gradually over a long period of time and have been subject to inactivation and decay. PMID:24616835
Interactive Exploration on Large Genomic Datasets.
Tu, Eric
2016-01-01
The prevalence of large genomics datasets has made the need to explore this data more important. Large sequencing projects like the 1000 Genomes Project [1], which reconstructed the genomes of 2,504 individuals sampled from 26 populations, have produced over 200TB of publicly available data. Meanwhile, existing genomic visualization tools have been unable to scale with the growing amount of larger, more complex data. This difficulty is acute when viewing large regions (over 1 megabase, or 1,000,000 bases of DNA), or when concurrently viewing multiple samples of data. While genomic processing pipelines have shifted towards using distributed computing techniques, such as with ADAM [4], genomic visualization tools have not. In this work we present Mango, a scalable genome browser built on top of ADAM that can run both locally and on a cluster. Mango presents a combination of different optimizations that can be combined in a single application to drive novel genomic visualization techniques over terabytes of genomic data. By building visualization on top of a distributed processing pipeline, we can perform visualization queries over large regions that are not possible with current tools, and decrease the time for viewing large data sets. Mango is part of the Big Data Genomics project at University of California-Berkeley [25] and is published under the Apache 2 license. Mango is available at https://github.com/bigdatagenomics/mango.
Evolutionary Genomics of Fast Evolving Tunicates
Berná, Luisa; Alvarez-Valin, Fernando
2014-01-01
Tunicates have been extensively studied because of their crucial phylogenetic location (the closest living relatives of vertebrates) and particular developmental plan. Recent genome efforts have disclosed that tunicates are also remarkable in their genome organization and molecular evolutionary patterns. Here, we review these latter aspects, comparing the similarities and specificities of two model species of the group: Oikopleura dioica and Ciona intestinalis. These species exhibit great genome plasticity and Oikopleura in particular has undergone a process of extreme genome reduction and compaction that can be explained in part by gene loss, but is mostly due to other mechanisms such as shortening of intergenic distances and introns, and scarcity of mobile elements. In Ciona, genome reorganization was less severe, and the genome remains more similar to those of the other chordates in several respects. Rates and patterns of molecular evolution are also peculiar in tunicates, with Ciona evolving about 50% faster than vertebrates and Oikopleura three times faster. In fact, the latter species is considered the fastest evolving metazoan recorded so far. Two processes of increase in evolutionary rates have taken place in tunicates. One of them is more extreme, and basically restricted to genes encoding regulatory proteins (transcription regulators, chromatin remodeling proteins, and metabolic regulators), and the other one is less pronounced but affects the whole genome. Very likely adaptive evolution has played a very significant role in the first, whereas the functional and/or evolutionary causes of the second are less clear and the evidence is not conclusive. The evidence supporting the incidence of increased mutation and less efficient negative selection is presented and discussed. PMID:25008364
A comprehensive evaluation of assembly scaffolding tools
2014-01-01
Background Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555
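To make the point about inflated assembly statistics concrete, the sketch below computes N50, a commonly reported contiguity statistic. The abstract does not name N50, so its use here is an assumption, and the contig lengths are invented. The example shows only that a single join raises the statistic whether or not the join is biologically correct.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


contigs = [100, 80, 60, 40, 20]        # invented contig lengths
scaffolds = [100 + 80, 60, 40, 20]     # the two longest contigs joined into one scaffold

print(n50(contigs), n50(scaffolds))
```

Because the statistic depends only on lengths, an erroneous join is rewarded exactly as much as a correct one, which is why the evaluation above counts correct and missed joins rather than relying on scaffold size alone.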
Finishing bacterial genome assemblies with Mix.
Soueidan, Hayssam; Maurier, Florence; Groppi, Alexis; Sirand-Pugnet, Pascal; Tardy, Florence; Citti, Christine; Dupuy, Virginie; Nikolski, Macha
2013-01-01
Among challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for genome de novo assembly are available, each having its advantages and drawbacks, without clear guidelines as to how to choose among them. Second, these solutions produce draft assemblies that often require a resource intensive finishing phase. In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies, without relying on a reference genome and having the goal to reduce contig fragmentation and thus speed up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length. We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. Resulting final assemblies demonstrate a significant improvement in the overall assembly quality. In particular, Mix is consistent by providing better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects. Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
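The path-selection idea in the Mix abstract can be sketched in a much reduced form. The code below is not the Mix implementation. It assumes, for illustration, that each vertex is a whole contig rather than a contig extremity, that the alignment graph is acyclic, that overlap lengths can be ignored, and that a single best path is wanted instead of a set of paths. All names and the toy data are hypothetical.

```python
from functools import lru_cache


def best_path(lengths, edges):
    """Return (total_length, path) for the heaviest path in an acyclic graph.

    lengths -- dict mapping contig name to its length
    edges   -- dict mapping a contig to the contigs its end aligns with
               (assumed acyclic; a cycle would recurse without end)
    """
    @lru_cache(maxsize=None)
    def best_from(node):
        best_len, best_tail = 0, ()
        for nxt in edges.get(node, ()):
            cand_len, cand_path = best_from(nxt)
            if cand_len > best_len:
                best_len, best_tail = cand_len, cand_path
        return lengths[node] + best_len, (node,) + best_tail

    return max((best_from(c) for c in lengths), key=lambda r: r[0])


# Contigs from two hypothetical draft assemblies, "a" and "b".
lengths = {"a1": 500, "b1": 300, "a2": 400, "b2": 900}
edges = {"a1": ["b1", "b2"], "b1": ["a2"]}
print(best_path(lengths, edges))
```

Here extending a1 with b2 (500 + 900) beats the longer chain a1, b1, a2 (500 + 300 + 400), which shows that the objective is cumulative contig length rather than the number of contigs joined.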
Functional Genomics of Physiological Plasticity and Local Adaptation in Killifish
Whitehead, Andrew; Galvez, Fernando; Zhang, Shujun; Williams, Larissa M.; Oleksiak, Marjorie F.
2011-01-01
Evolutionary solutions to the physiological challenges of life in highly variable habitats can span the continuum from evolution of a cosmopolitan plastic phenotype to the evolution of locally adapted phenotypes. Killifish (Fundulus sp.) have evolved both highly plastic and locally adapted phenotypes within different selective contexts, providing a comparative system in which to explore the genomic underpinnings of physiological plasticity and adaptive variation. Importantly, extensive variation exists among populations and species for tolerance to a variety of stressors, and we exploit this variation in comparative studies to yield insights into the genomic basis of evolved phenotypic variation. Notably, species of Fundulus occupy the continuum of osmotic habitats from freshwater to marine and populations within Fundulus heteroclitus span far greater variation in pollution tolerance than across all species of fish. Here, we explore how transcriptome regulation underpins extreme physiological plasticity on osmotic shock and how genomic and transcriptomic variation is associated with locally evolved pollution tolerance. We show that F. heteroclitus quickly acclimate to extreme osmotic shock by mounting a dramatic rapid transcriptomic response including an early crisis control phase followed by a tissue remodeling phase involving many regulatory pathways. We also show that convergent evolution of locally adapted pollution tolerance involves complex patterns of gene expression and genome sequence variation, which is confounded with body-weight dependence for some genes. Similarly, exploiting the natural phenotypic variation associated with other established and emerging model organisms is likely to greatly accelerate the pace of discovery of the genomic basis of phenotypic variation. PMID:20581107
Extremely thermophilic microorganisms for biomass conversion: status and prospects.
Blumer-Schuette, Sara E; Kataeva, Irina; Westpheling, Janet; Adams, Michael Ww; Kelly, Robert M
2008-06-01
Many microorganisms that grow at elevated temperatures are able to utilize a variety of carbohydrates pertinent to the conversion of lignocellulosic biomass to bioenergy. The range of substrates utilized depends on growth temperature optimum and biotope. Hyperthermophilic marine archaea (T(opt)>or=80 degrees C) utilize alpha- and beta-linked glucans, such as starch, barley glucan, laminarin, and chitin, while hyperthermophilic marine bacteria (T(opt)>or=80 degrees C) utilize the same glucans as well as hemicellulose, such as xylans and mannans. However, none of these organisms are able to efficiently utilize crystalline cellulose. Among the thermophiles, this ability is limited to a few terrestrial bacteria with upper temperature limits for growth near 75 degrees C. Deconstruction of crystalline cellulose by these extreme thermophiles is achieved by 'free' primary cellulases, which are distinct from those typically associated with large multi-enzyme complexes known as cellulosomes. These primary cellulases also differ from the endoglucanases (referred to here as 'secondary cellulases') reported from marine hyperthermophiles that show only weak activity toward cellulose. Many extremely thermophilic enzymes implicated in the deconstruction of lignocellulose can be identified in genome sequences, and many more promising biocatalysts probably remain annotated as 'hypothetical proteins'. Characterization of these enzymes will require intensive effort but is likely to generate new opportunities for the use of renewable resources as biofuels.
Complete genome sequence of the bioleaching bacterium Leptospirillum sp. group II strain CF-1.
Ferrer, Alonso; Bunk, Boyke; Spröer, Cathrin; Biedendieck, Rebekka; Valdés, Natalia; Jahn, Martina; Jahn, Dieter; Orellana, Omar; Levicán, Gloria
2016-03-20
We describe the complete genome sequence of Leptospirillum sp. group II strain CF-1, an acidophilic bioleaching bacterium isolated from an acid mine drainage (AMD). This work provides data to gain insights about adaptive response of Leptospirillum spp. to the extreme conditions of bioleaching environments. Copyright © 2016 Elsevier B.V. All rights reserved.
Trubitsyn, Denis; Geurink, Corey; Pikuta, Elena; Lefèvre, Christopher T.; McShan, W. Michael; Gillaspy, Allison F.
2014-01-01
Desulfonatronum thiodismutans strain MLF1, an alkaliphilic bacterium capable of sulfate reduction, was isolated from Mono Lake, California. Here we report the 3.92-Mb draft genome sequence comprising 34 contigs and some results of its automated annotation. These data will improve our knowledge of mechanisms by which bacteria withstand extreme environments. PMID:25081260
Draft Genome Sequence of the Psychrophilic and Alkaliphilic Rhodonellum psychrophilum Strain GCM71T.
Hauptmann, Aviaja L; Glaring, Mikkel A; Hallin, Peter F; Priemé, Anders; Stougaard, Peter
2013-12-05
Rhodonellum psychrophilum GCM71(T), isolated from the cold and alkaline submarine ikaite columns in the Ikka Fjord in Greenland, displays optimal growth at 5 to 10°C and pH 10. Here, we report the draft genome sequence of this strain, which may provide insight into the mechanisms of adaptation to these extreme conditions.
The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis
Kang, Seunghyun; Ahn, Do-Hwan; Lee, Jun Hyuck; Lee, Sung Gu; Shin, Seung Chul; Lee, Jungeun; Min, Gi-Sik; Lee, Hyoungseok; Kim, Hyun-Woo; Kim, Sanghee; Park, Hyun
2017-01-01
Abstract Background: The Antarctic intertidal zone is continuously subjected to extremely fluctuating biotic and abiotic stressors. The West Antarctic Peninsula is the most rapidly warming region on Earth. Organisms living in Antarctic intertidal pools are therefore interesting for research into evolutionary adaptation to extreme environments and the effects of climate change. Findings: We report the whole genome sequence of the Antarctic-endemic harpacticoid copepod Tigriopus kingsejongensis. The 37 Gb raw DNA sequence was generated using the Illumina MiSeq platform. Libraries were prepared with 65-fold coverage and a total length of 295 Mb. The final assembly consists of 48 368 contigs with an N50 contig length of 17.5 kb, and 27 823 scaffolds with an N50 scaffold length of 159.2 kb. A total of 12 772 coding genes were inferred using the MAKER annotation pipeline. Comparative genome analysis revealed that T. kingsejongensis-specific genes are enriched in transport and metabolism processes. Furthermore, rapidly evolving genes related to energy metabolism showed positive selection signatures. Conclusions: The T. kingsejongensis genome provides an interesting example of an evolutionary strategy for Antarctic cold adaptation, and offers new genetic insights into Antarctic intertidal biota. PMID:28369352
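For readers unfamiliar with the N50 statistic quoted in the abstract above, the following minimal sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downwards until half the
    # total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, in kb.
    print(n50([2, 3, 4, 5, 6]))
```

The same function applies to contigs and scaffolds alike; only the input list changes.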
Association between SCO2 mutation and extreme myopia in Japanese patients.
Wakazono, Tomotaka; Miyake, Masahiro; Yamashiro, Kenji; Yoshikawa, Munemitsu; Yoshimura, Nagahisa
2016-07-01
To investigate the role of SCO2 in extreme myopia of Japanese patients. In total, 101 Japanese patients with extreme myopia (axial length of ≥30 mm) OU at the Kyoto University Hospital were included in this study. Exon 2 of SCO2 was sequenced by conventional Sanger sequencing. The detected variants were assessed using in silico prediction programs: SIFT, PolyPhen-2 and MutationTaster. To determine the frequency of the mutations in normal subjects, we referred to the 1000 Genomes Project data and the Human Genetic Variation Database (HGVD) in the Human Genetic Variation Browser. The average age of the participants was 62.9 ± 12.7 years. There were 31 males (30.7 %) and 70 females. Axial lengths were 31.76 ± 1.17 mm OD and 31.40 ± 1.07 mm OS, and 176 eyes (87.6 %) out of 201 eyes had myopic maculopathy of grade 2 or more. Among the 101 extremely myopic patients, one mutation (c.290 C > T;p.Ala97Val) in SCO2 was detected. This mutation was not found in the 1000 Genomes Project data or HGVD data. Variant type of the mutation was nonsynonymous. Although the SIFT prediction score was 0.350, the PolyPhen-2 probability was 0.846, thus predicting its pathogenicity to be possibly damaging. MutationTaster PhyloP was 1.268, suggesting that the mutation is conserved. We identified one novel possibility of an extreme myopia-causing mutation in SCO2. No other disease-causing mutation was found in 101 extremely myopic Japanese patients, suggesting that SCO2 plays a limited role in Japanese extreme myopia. Further investigation is required for better understanding of extreme myopia.
A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL
Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante
2013-01-01
We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568
Genomic signatures of selection at linked sites: unifying the disparity among species
Cutter, Asher D.; Payseur, Bret A.
2014-01-01
Population genetics theory supplies powerful predictions about how natural selection interacts with genetic linkage to sculpt the genomic landscape of nucleotide polymorphism. Both the spread of beneficial mutations and removal of deleterious mutations act to depress polymorphism levels, especially in low-recombination regions. However, empiricists have documented extreme disparities among species. Here we characterize the dominant features that could drive variation in linked selection among species, including roles for selective sweeps being ‘hard’ or ‘soft’, and concealing by demography and genomic confounds. We advocate targeted studies of close relatives to unify our understanding of how selection and linkage interact to shape genome evolution. PMID:23478346
Extremophiles in Household Water Heaters
NASA Astrophysics Data System (ADS)
Wilpiszeski, R.; House, C. H.
2016-12-01
A significant fraction of Earth's microbial diversity comes from species living in extreme environments, but natural extreme environments can be difficult to access. Manmade systems like household water heaters serve as an effective proxy for thermophilic environments that are otherwise difficult to sample directly. As such, we are investigating the biogeography, taxonomic distribution, and evolution of thermophiles growing in domestic water heaters. Citizen scientists collected hot tap water culture and filter samples from 101 homes across the United States. We recovered a single species of thermophilic heterotroph, Thermus scotoductus, from culture samples inoculated from water heaters across the United States. Whole-genome sequencing was conducted to better understand the distribution and evolution of this single species. We have also sequenced hyper-variable regions of the 16S rRNA gene from whole-community filter samples to identify the broad diversity and distribution of microbial cells captured from each water heater. These results shed light on the processes that shape thermophilic populations and genomes at a spatial resolution that is difficult to access in naturally occurring extreme ecosystems.
2013-01-01
Background Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function is largely understudied. Here, we describe a detailed study of six autosomal and two X chromosomal MSRs among 270 HapMap individuals from Central Europe, Asia and Africa. Copy number variation, stability and genetic heterogeneity of the autosomal macrosatellite repeats RS447 (chromosome 4p), MSR5p (5p), FLJ40296 (13q), RNU2 (17q) and D4Z4 (4q and 10q) and X chromosomal DXZ4 and CT47 were investigated. Results Repeat array size distribution analysis shows that all of these MSRs are highly polymorphic with the most genetic variation among Africans and the least among Asians. A mitotic mutation rate of 0.4-2.2% was observed, exceeding meiotic mutation rates and possibly explaining the large size variability found for these MSRs. By means of a novel Bayesian approach, statistical support for a distinct multimodal rather than a uniform allele size distribution was detected in seven out of eight MSRs, with evidence for equidistant intervals between the modes. Conclusions The multimodal distributions with evidence for equidistant intervals, in combination with the observation of MSR-specific constraints on minimum array size, suggest that MSRs are limited in their configurations and that deviations thereof may cause disease, as is the case for facioscapulohumeral muscular dystrophy. However, at present we cannot exclude that there are mechanistic constraints for MSRs that are not directly disease-related. This study represents the first comprehensive study of MSRs in different human populations by applying novel statistical methods and identifies commonalities and differences in their organization and function in the human genome. PMID:23496858
Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong
2011-01-01
Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. 
On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs appear to be present and contribute to biomass production. PMID:20927138
Trimarchi, Michael P.; Yan, Pearlly; Groden, Joanna; Bundschuh, Ralf; Goodfellow, Paul J.
2017-01-01
Background DNA methylation is a stable epigenetic mark that is frequently altered in tumors. DNA methylation features are attractive biomarkers for disease states given the stability of DNA methylation in living cells and in biologic specimens typically available for analysis. Widespread accumulation of methylation in regulatory elements in some cancers (specifically the CpG island methylator phenotype, CIMP) can play an important role in tumorigenesis. High resolution assessment of CIMP for the entire genome, however, remains cost prohibitive and requires quantities of DNA not available for many tissue samples of interest. Genome-wide scans of methylation have been undertaken for large numbers of tumors, and higher resolution analyses for a limited number of cancer specimens. Methods for analyzing such large datasets and integrating findings from different studies continue to evolve. An approach for comparison of findings from a genome-wide assessment of the methylated component of tumor DNA and more widely applied methylation scans was developed. Methods Methylomes for 76 primary endometrial cancer and 12 normal endometrial samples were generated using methylated fragment capture and second generation sequencing, MethylCap-seq. Publicly available Infinium HumanMethylation 450 data from The Cancer Genome Atlas (TCGA) were compared to MethylCap-seq data. Results Analysis of methylation in promoter CpG islands (CGIs) identified a subset of tumors with a methylator phenotype. We used a two-stage approach to develop a 13-region methylation signature associated with a “hypermethylator state.” High level methylation for the 13-region methylation signature was associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration in the TCGA test set. In addition, the signature devised showed good agreement with previously described methylation clusters devised by TCGA. 
Conclusion We identified a methylation signature for a “hypermethylator phenotype” in endometrial cancer and developed methods that may prove useful for identifying extreme methylation phenotypes in other cancers. PMID:28278225
CoCoNUT: an efficient system for the comparison and analysis of genomes
2008-01-01
Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477
González-Rodríguez, A; Munilla, S; Mouresan, E F; Cañas-Álvarez, J J; Baro, J A; Molina, A; Díaz, C; Altarriba, J; Piedrafita, J; Varona, L
2017-10-01
The Spanish local beef cattle breeds most likely share a common origin followed by a process of differentiation. This particular historical evolution has most probably left detectable signatures in the genome. The objective of this study was to identify genomic regions associated with differentiation processes in seven Spanish autochthonous populations (Asturiana de los Valles (AV), Avileña-Negra Ibérica (ANI), Bruna dels Pirineus (BP), Morucha (Mo), Pirenaica (Pi), Retinta (Re) and Rubia Gallega (RG)). The BovineHD 777K BeadChip was used on 342 individuals (AV, n=50; ANI, n=48; BP, n=50; Mo, n=50; Pi, n=48; Re, n=48; RG, n=48) chosen to be as unrelated as possible. We calculated the fixation index (F_ST) and performed a Bayesian analysis named SelEstim. The output of both procedures was very similar, although the Bayesian analysis provided a richer inference and allowed us to calculate significance thresholds by generating a pseudo-observed data set from the estimated posterior distributions. We identified a very large number of genomic regions, but when a very restrictive significance threshold was applied these regions were reduced to only 10. Among them, four regions can be highlighted because they comprised a large number of single nucleotide polymorphisms and showed extremely high signals (Kullback-Leibler divergence (KLD)>6). They are located in BTA 2 (5 575 950 to 10 152 228 base pairs (bp)), BTA 5 (17 596 734 to 18 850 702 bp), BTA 6 (37 853 912 to 39 441 548 bp) and BTA 18 (13 345 515 to 15 243 838 bp) and harbor, among others, the MSTN (Myostatin), KITLG (KIT Ligand), LAP3 (leucine aminopeptidase 3), NCAPG (non-SMC condensin I complex, subunit G), LCORL (ligand dependent nuclear receptor corepressor-like) and MC1R (Melanocortin 1 receptor) genes. 
Knowledge of these genomic regions allows the identification of potential targets of recent selection and helps to define potential candidate genes associated with traits of interest, such as coat color, muscle development, fertility, growth, carcass and immunological response.
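As a rough illustration of the fixation index mentioned in the abstract above, the sketch below computes a simple heterozygosity-based F_ST for one biallelic SNP from per-population allele frequencies, F_ST = (H_T - H_S) / H_T. This is a textbook estimator that assumes equal population weights and ignores sampling error; it is not necessarily the estimator used in the study, and the function name `fst` and the example frequencies are hypothetical.

```python
def fst(freqs):
    """Heterozygosity-based F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population,
           with every population weighted equally (an assumption).
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # The SNP is monomorphic everywhere: no differentiation.
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies in seven populations.
    print(round(fst([0.1, 0.2, 0.15, 0.9, 0.85, 0.5, 0.4]), 3))
```

Values near 0 indicate similar allele frequencies across populations, and values near 1 indicate nearly fixed differences.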
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures
Pride, David T; Schoenfeld, Thomas
2008-01-01
Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion The finding that BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates that GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
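The general idea of comparing a DNA fragment to a database of oligonucleotide signatures can be sketched as follows. This is only an illustration of signature-based classification, not the GSPC method itself: the choice of raw k-mer frequencies, the Manhattan distance, the nearest-reference rule, and all names (`signature`, `distance`, `classify`) are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def signature(seq, k=4):
    """Return the k-mer frequency vector of seq over the ACGT alphabet."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other characters (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Manhattan distance between two signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag_sig, signature(references[name], k)),
    )


if __name__ == "__main__":
    # Toy reference sequences; real use would load whole genomes.
    refs = {
        "at_rich": "ATATATATTTAAATATATAT" * 5,
        "gc_rich": "GCGCGGCCGCGCGGGCCCGC" * 5,
    }
    print(classify("ATATTTAATATA", refs, k=2))
```

In practice the reference signatures would be computed once and cached rather than recomputed for every fragment.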
Martínez-Romero, Esperanza
2012-01-01
We report the complete organelle genome sequences of Trebouxiophyceae sp. strain MX-AZ01, an acidophilic green microalga isolated from a geothermal field in Mexico. This eukaryote has the remarkable ability to thrive in a particular shallow lake with emerging hot springs at the bottom, extremely low pH, and toxic heavy metal concentrations. Trebouxiophyceae sp. MX-AZ01 represents one of few described photosynthetic eukaryotes living in such a hostile environment. The organelle genomes of Trebouxiophyceae sp. MX-AZ01 are remarkable. The plastid genome sequence currently presents the highest G+C content for a trebouxiophyte. The mitochondrial genome sequence is the largest reported to date for the Trebouxiophyceae class of green algae. The analysis of the genome sequences presented here provides insight into the evolution of organelle genomes of trebouxiophytes and green algae. PMID:23104370
Granada, Camille E; Vargas, Luciano K; Sant'Anna, Fernando Hayashi; Balsanelli, Eduardo; Baura, Valter Antonio de; Oliveira Pedrosa, Fábio de; Souza, Emanuel Maltempi de; Falcon, Tiago; Passaglia, Luciane M P
2018-05-17
Lupinus albescens is a resistant cover plant that establishes symbiotic relationships with bacteria belonging to the Bradyrhizobium genus. This symbiosis helps the development of these plants in adverse environmental conditions, such as the ones found in arenized areas of Southern Brazil. This work studied three Bradyrhizobium sp. strains (AS23, NAS80 and NAS96) isolated from L. albescens plants that grow in extremely poor soils (arenized areas and adjacent grasslands). The genomes of these three strains were sequenced on the Ion Torrent platform using the IonXpress library preparation kit, yielding a total number of bases of 1,230,460,823 for AS23, 1,320,104,022 for NAS80, and 1,236,105,093 for NAS96. The genome comparison with the closest strains, Bradyrhizobium japonicum USDA6 and Bradyrhizobium diazoefficiens USDA110, showed important variable regions (with less than 80% similarity). Genes encoding factors for resistance/tolerance to heavy metals, flagellar motility, response to osmotic and oxidative stresses, and heat shock proteins (present only in the three sequenced genomes) could be responsible for the ability of these microorganisms to survive in inhospitable environments. Knowledge about these genomes will provide a foundation for future development of an inoculant bioproduct that should optimize the recovery of degraded soils using cover crops.
Zhou, Yanrong; Lin, Yanli; Wu, Xiaojie; Xiong, Fuyin; Lv, Yuemeng; Zheng, Tao; Huang, Peitang; Chen, Hongxing
2012-02-01
Transgene expression for the mammary gland bioreactor aimed at producing recombinant proteins requires optimized expression vector construction. Previously we presented a hybrid gene locus strategy, which was originally tested with human lactoferrin (hLF) as the target transgene and achieved an extremely high expression level of rhLF, up to 29.8 g/l in mouse milk. Here, to demonstrate the broad applicability of this strategy, another 38.4-kb mWAP-htPA hybrid gene locus was constructed, in which the 3-kb genomic coding sequence in the 24-kb mouse whey acidic protein (mWAP) gene locus was replaced by the 17.4-kb genomic coding sequence of human tissue plasminogen activator (htPA), exactly from the start codon to the stop codon. Five corresponding transgenic mouse lines were generated, and the highest expression level of rhtPA in milk reached 3.3 g/l. Our strategy will provide a universal way for the large-scale production of pharmaceutical proteins in the mammary gland of transgenic animals.
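The construct sizes quoted in the abstract above are mutually consistent, which a short check makes explicit. The variable names are hypothetical; only the four lengths come from the text.

```python
# Lengths in kb, as stated in the abstract.
mwap_locus_kb = 24.0      # mouse whey acidic protein gene locus
mwap_coding_kb = 3.0      # mWAP genomic coding sequence that is removed
htpa_coding_kb = 17.4     # htPA genomic coding sequence that is inserted

# Replacing the mWAP coding region with the htPA coding region.
hybrid_locus_kb = mwap_locus_kb - mwap_coding_kb + htpa_coding_kb
```

The result matches the 38.4 kb reported for the mWAP-htPA hybrid locus.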
Grohar: Automated Visualization of Genome-Scale Metabolic Models and Their Pathways.
Moškon, Miha; Zimic, Nikolaj; Mraz, Miha
2018-05-01
Genome-scale metabolic models (GEMs) have become a powerful tool for the investigation of the entire metabolism of the organism in silico. These models are, however, often extremely hard to reconstruct and also difficult to apply to the selected problem. Visualization of the GEM allows us to comprehend the model more easily, to perform its graphical analysis, to find and correct the faulty relations, to identify the parts of the system with a designated function, etc. Even though several approaches for the automatic visualization of GEMs have been proposed, metabolic maps are still manually drawn or at least require a large amount of manual curation. We present Grohar, a computational tool for automatic identification and visualization of GEM (sub)networks and their metabolic fluxes. These (sub)networks can be specified directly by listing the metabolites of interest or indirectly by providing reference metabolic pathways from different sources, such as KEGG, SBML, or Matlab files. These pathways are identified within the GEM using three different pathway alignment algorithms. Grohar also supports the visualization of the model adjustments (e.g., activation or inhibition of metabolic reactions) after perturbations are induced.
The identification of somatic genetic alterations that confer sensitivity to pharmacologic inhibitors has led to new cancer therapies. To identify mutations that confer an exceptional dependency, shRNA-based loss-of-function data were analyzed from a dataset of numerous cell lines to reveal genes that are essential in a small subset of cancer cell lines. Once these cell lines were determined, detailed genomic characterization from these cell lines was utilized to ascertain the genomic aberrations that led to this extreme dependency.
Poczai, Péter; Hyvönen, Jaakko
2017-01-01
Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC–rps14 region and 6-kb in the trnG-UCC–psbD, followed by a third <1kb inversion in the trnT sequence. PMID:29095905
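The quadripartite structure reported above implies a total plastome length, which can be worked out from the region sizes. This is a sketch under one assumption that the abstract does not state explicitly: that 26,803 bp is the length of each inverted repeat copy (IRa and IRb), not of the two combined. The total below is derived here, not quoted from the source.

```python
# Region lengths in bp, as given in the abstract.
lsc_bp = 87_439   # large single copy
ir_bp = 26_803    # assumed to be the length of EACH inverted repeat copy
ssc_bp = 18_612   # short single copy

# Quadripartite plastome: LSC + IRa + SSC + IRb.
plastome_bp = lsc_bp + 2 * ir_bp + ssc_bp

# Fraction of the plastome occupied by the two IR copies.
ir_fraction = 2 * ir_bp / plastome_bp
```

Under that assumption the implied total is 159,657 bp, with the two IR copies making up about a third of the molecule.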
Sun, Miao; Li, Ning; Dong, Wu; Chen, Zugen; Liu, Qing; Xu, Yiming; He, Guang; Shi, Yongyong; Li, Xin; Hao, Jiajie; Luo, Yang; Shang, Dandan; Lv, Dan; Ma, Fen; Zhang, Dai; Hua, Rui; Lu, Chaoxia; Wen, Yaran; Cao, Lihua; Irvine, Alan D.; McLean, W.H. Irwin; Dong, Qi; Wang, Ming-Rong; Yu, Jun; He, Lin; Lo, Wilson H.Y.; Zhang, Xue
2009-01-01
Congenital generalized hypertrichosis terminalis (CGHT) is a rare condition characterized by universal excessive growth of pigmented terminal hairs and often accompanied with gingival hyperplasia. In the present study, we describe three Han Chinese families with autosomal-dominant CGHT and a sporadic case with extreme CGHT and gingival hyperplasia. We first did a genome-wide linkage scan in a large four-generation family. Our parametric multipoint linkage analysis revealed a genetic locus for CGHT on chromosome 17q24.2-q24.3. Further two-point linkage and haplotyping with microsatellite markers from the same chromosome region confirmed the genetic mapping and showed in all the families a microdeletion within the critical region that was present in all affected individuals but not in unaffected family members. We then carried out copy-number analysis with the Affymetrix Genome-Wide Human SNP Array 6.0 and detected genomic microdeletions of different sizes and with different breakpoints in the three families. We validated these microdeletions by real-time quantitative PCR and confirmed their perfect cosegregation with the disease phenotype in the three families. In the sporadic case, however, we found a de novo microduplication. Two-color interphase FISH analysis demonstrated that the duplication was inverted. These copy-number variations (CNVs) shared a common genomic region in which CNV is not reported in the public database and was not detected in our 434 unrelated Han Chinese normal controls. Thus, pathogenic copy-number mutations on 17q24.2-q24.3 are responsible for CGHT with or without gingival hyperplasia. Our work identifies CGHT as a genomic disorder. PMID:19463983
BactoGeNIE: A large-scale comparative genome visualization for big displays
Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...
2015-08-13
The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.
BactoGeNIE: a large-scale comparative genome visualization for big displays
2015-01-01
Background The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021
BactoGeNIE: A large-scale comparative genome visualization for big displays
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aurisano, Jillian; Reda, Khairi; Johnson, Andrew
The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.
Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A
2016-01-01
The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
The complex hybrid origins of the root knot nematodes revealed through comparative genomics
Kumar, Sujai; Koutsovoulos, Georgios; Blaxter, Mark L.
2014-01-01
Root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet however we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by obligatory mitotic parthenogenesis and it has been suggested that these species originated from interspecific hybridizations between unknown parental taxa. We have sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and use a comparative genomic approach to test the hypothesis that this species was involved in the hybrid origin of the tropical mitotic parthenogen Meloidogyne incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species Meloidogyne hapla was carried out to trace the evolutionary history of these species’ genomes, and we demonstrate that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome itself revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified, parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization and their extreme polyphagy and success in agricultural environments may be related to this hybridization, producing transgressive variation on which natural selection can act. It is now clear that studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. 
This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally. PMID:24860695
Chironomid midges (Diptera, chironomidae) show extremely small genome sizes.
Cornette, Richard; Gusev, Oleg; Nakahara, Yuichi; Shimura, Sachiko; Kikawada, Takahiro; Okuda, Takashi
2015-06-01
Chironomid midges (Diptera; Chironomidae) are found in various environments from the high Arctic to the Antarctic, including temperate and tropical regions. In many freshwater habitats, members of this family are among the most abundant invertebrates. In the present study, the genome sizes of 25 chironomid species were determined by flow cytometry and the resulting C-values ranged from 0.07 to 0.20 pg DNA (i.e. from about 68 to 195 Mbp). These genome sizes were uniformly very small and included, to our knowledge, the smallest genome sizes recorded to date among insects. A small proportion of transposable elements and short introns were suggested to contribute to the reduction of genome sizes in chironomids. We discuss the possible developmental and physiological advantages of having a small genome size and the putative implications for the ecological success of the family Chironomidae.
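The picogram-to-base-pair figures in the abstract above follow from the standard conversion of roughly 0.978 x 10^9 bp per picogram of DNA. A minimal sketch, with a hypothetical function name:

```python
MBP_PER_PG = 978.0  # 1 pg of DNA corresponds to about 0.978e9 bp = 978 Mbp

def c_value_to_mbp(c_value_pg):
    """Convert a C-value in picograms of DNA to genome size in Mbp."""
    return c_value_pg * MBP_PER_PG

smallest_mbp = c_value_to_mbp(0.07)  # lower bound reported in the abstract
largest_mbp = c_value_to_mbp(0.20)   # upper bound reported in the abstract
```

The two values come out near 68 Mbp and just under 196 Mbp, in line with the "about 68 to 195 Mbp" range quoted in the abstract.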
Draft Genome Sequence of Bacillus stratosphericus LAMA 585, Isolated from the Atlantic Deep Sea
Cabral, Alencar; Andreote, Fernando Dini; Cavalett, Angélica; Pessatti, Marcos Luiz; Dini-Andreote, Francisco; da Silva, Marcus Adonai Castro
2013-01-01
Bacillus stratosphericus LAMA 585 was isolated from the Mid-Atlantic-Ridge seafloor (5,500-m depth). This bacterium presents the capacity for cellulase, xylanase, and lipase production when growing aerobically in marine-broth media. Genes involved in the tolerance of oligotrophic and extreme conditions and prospection of biotechnological products were annotated in the draft genome (3.7 Mb). PMID:23640380
Koh, Hye Yeon; Lee, Sung Gu; Lee, Jun Hyuck; Doyle, Shawn; Christner, Brent C; Kim, Hak Jun
2012-12-01
The psychrophilic bacterium Paenisporosarcina sp. TG-14 was isolated from sediment-laden stratified basal ice from Taylor Glacier, McMurdo Dry Valleys, Antarctica. Here we report the draft genome sequence of this strain, which may provide useful information on the cold adaptation mechanism in extremely variable environments.
Trubitsyn, Denis; Geurink, Corey; Pikuta, Elena; Lefèvre, Christopher T; McShan, W Michael; Gillaspy, Allison F; Bazylinski, Dennis A
2014-07-31
Desulfonatronum thiodismutans strain MLF1, an alkaliphilic bacterium capable of sulfate reduction, was isolated from Mono Lake, California. Here we report the 3.92-Mb draft genome sequence comprising 34 contigs and some results of its automated annotation. These data will improve our knowledge of mechanisms by which bacteria withstand extreme environments. Copyright © 2014 Trubitsyn et al.
Draft Genome Sequence of the Psychrophilic and Alkaliphilic Rhodonellum psychrophilum Strain GCM71T
Hauptmann, Aviaja L.; Glaring, Mikkel A.; Hallin, Peter F.; Priemé, Anders
2013-01-01
Rhodonellum psychrophilum GCM71T, isolated from the cold and alkaline submarine ikaite columns in the Ikka Fjord in Greenland, displays optimal growth at 5 to 10°C and pH 10. Here, we report the draft genome sequence of this strain, which may provide insight into the mechanisms of adaptation to these extreme conditions. PMID:24309741
Deinococcus geothermalis: The Pool of Extreme Radiation Resistance Genes Shrinks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Makarova, Kira S.; Omelchenko, Marina V.; Gaidamakova, Elena K.
Bacteria of the genus Deinococcus are extremely resistant to ionizing radiation (IR), ultraviolet light (UV) and desiccation. The mesophile Deinococcus radiodurans was the first member of this group whose genome was completely sequenced. Analysis of the genome sequence of D. radiodurans, however, failed to identify unique DNA repair systems. To further delineate the genes underlying the resistance phenotypes, we report the whole-genome sequence of a second Deinococcus species, the thermophile Deinococcus geothermalis, which at its optimal growth temperature is as resistant to IR, UV and desiccation as D. radiodurans, and a comparative analysis of the two Deinococcus genomes. Many D. radiodurans genes previously implicated in resistance, but for which no sensitive phenotype was observed upon disruption, are absent in D. geothermalis. In contrast, most D. radiodurans genes whose mutants displayed a radiation-sensitive phenotype in D. radiodurans are conserved in D. geothermalis. Supporting the existence of a Deinococcus radiation response regulon, a common palindromic DNA motif was identified in a conserved set of genes associated with resistance, and a dedicated transcriptional regulator was predicted. We present the case that these two species evolved essentially the same diverse set of gene families, and that the extreme stress-resistance phenotypes of the Deinococcus lineage emerged progressively by amassing cell-cleaning systems from different sources, but not by acquisition of novel DNA repair systems. Our reconstruction of the genomic evolution of the Deinococcus-Thermus phylum indicates that the corresponding set of enzymes proliferated mainly in the common ancestor of Deinococcus.
Results of the comparative analysis weaken the arguments for a role of higher-order chromosome alignment structures in resistance; more clearly define and substantially revise downward the number of uncharacterized genes that might participate in DNA repair and contribute to resistance; and strengthen the case for a role in survival of systems involved in manganese and iron homeostasis.
Auernik, Kathryne S; Maezato, Yukari; Blum, Paul H; Kelly, Robert M
2008-02-01
Despite their taxonomic description, not all members of the order Sulfolobales are capable of oxidizing reduced sulfur species, which, in addition to iron oxidation, is a desirable trait of biomining microorganisms. However, the complete genome sequence of the extremely thermoacidophilic archaeon Metallosphaera sedula DSM 5348 (2.2 Mb, ∼2,300 open reading frames [ORFs]) provides insights into biologically catalyzed metal sulfide oxidation. Comparative genomics was used to identify pathways and proteins involved (directly or indirectly) with bioleaching. As expected, the M. sedula genome contains genes related to autotrophic carbon fixation, metal tolerance, and adhesion. Also, terminal oxidase cluster organization indicates the presence of hybrid quinol-cytochrome oxidase complexes. Comparisons with the mesophilic biomining bacterium Acidithiobacillus ferrooxidans ATCC 23270 indicate that the M. sedula genome encodes at least one putative rusticyanin, involved in iron oxidation, and a putative tetrathionate hydrolase, implicated in sulfur oxidation. The fox gene cluster, involved in iron oxidation in the thermoacidophilic archaeon Sulfolobus metallicus, was also identified. These iron- and sulfur-oxidizing components are missing from genomes of nonleaching members of the Sulfolobales, such as Sulfolobus solfataricus P2 and Sulfolobus acidocaldarius DSM 639. Whole-genome transcriptional response analysis showed that 88 ORFs were up-regulated twofold or more in M. sedula upon addition of ferrous sulfate to yeast extract-based medium; these included genes for components of terminal oxidase clusters predicted to be involved with iron oxidation, as well as genes predicted to be involved with sulfur metabolism. Many hypothetical proteins were also differentially transcribed, indicating that aspects of the iron and sulfur metabolism of M. sedula remain to be identified and characterized. PMID:18083856
Kelley, Joanna L; Yee, Muh-Ching; Brown, Anthony P; Richardson, Rhea R; Tatarenkov, Andrey; Lee, Clarence C; Harkins, Timothy T; Bustamante, Carlos D; Earley, Ryan L
2016-08-16
The mangrove rivulus (Kryptolebias marmoratus) is one of two preferentially self-fertilizing hermaphroditic vertebrates. This mode of reproduction makes mangrove rivulus an important model for evolutionary and biomedical studies because long periods of self-fertilization result in naturally homozygous genotypes that can produce isogenic lineages without significant limitations associated with inbreeding depression. Over 400 isogenic lineages currently held in laboratories across the globe show considerable among-lineage variation in physiology, behavior, and life history traits that is maintained under common garden conditions. Temperature mediates the development of primary males and also sex change between hermaphrodites and secondary males, which makes the system ideal for the study of sex determination and sexual plasticity. Mangrove rivulus also exhibit remarkable adaptations to living in extreme environments, and the system has great promise to shed light on the evolution of terrestrial locomotion, aerial respiration, and broad tolerances to hypoxia, salinity, temperature, and environmental pollutants. Genome assembly of the mangrove rivulus allows the study of genes and gene families associated with the traits described above. Here we present a de novo assembled reference genome for the mangrove rivulus, with an approximately 900 Mb genome, including 27,328 annotated, predicted, protein-coding genes. Moreover, we are able to place more than 50% of the assembled genome onto a recently published linkage map. The genome provides an important addition to the linkage map and transcriptomic tools recently developed for this species that together provide critical resources for epigenetic, transcriptomic, and proteomic analyses. Moreover, the genome will serve as the foundation for addressing key questions in behavior, physiology, toxicology, and evolutionary biology. © The Author 2016. 
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Hatemi, Peter K.; Medland, Sarah E.; Klemmensen, Robert; Oskarsson, Sven; Littvay, Levente; Dawes, Chris; Verhulst, Brad; McDermott, Rose; Nørgaard, Asbjørn Sonne; Klofstad, Casey; Christensen, Kaare; Johannesson, Magnus; Magnusson, Patrik K.E.; Eaves, Lindon J.; Martin, Nicholas G.
2014-01-01
Almost forty years ago, evidence from large studies of adult twins and their relatives suggested that between 30 and 60% of the variance in social and political attitudes could be explained by genetic influences. However, these findings have not been widely accepted or incorporated into the dominant paradigms that explain the etiology of political ideology. This has been attributed in part to measurement and sample limitations, as well as the relative absence of molecular genetic studies. Here we present results from original analyses of a combined sample of over 12,000 twin pairs, ascertained from nine different studies conducted in five democracies, sampled over the course of four decades. We provide evidence that genetic factors play a role in the formation of political ideology, regardless of how ideology is measured, the era, or the population sampled. The only exception is a question that explicitly uses the phrase “Left-Right”. We then present results from one of the first genome-wide association studies on political ideology using data from three samples: a 1990 Australian sample involving 6,894 individuals from 3,516 families; a 2008 Australian sample of 1,160 related individuals from 635 families; and a 2010 Swedish sample involving 3,334 individuals from 2,607 families. No polymorphisms reached genome-wide significance in the meta-analysis. The combined evidence suggests that political ideology constitutes a fundamental aspect of one’s genetically informed psychological disposition, but as Fisher proposed long ago, genetic influences on complex traits will be composed of thousands of markers of very small effects and it will require extremely large samples to have enough power to identify specific polymorphisms related to complex social traits. PMID:24569950
Pritchard, Victoria L; Mäkinen, Hannu; Vähä, Juha-Pekka; Erkinaro, Jaakko; Orell, Panu; Primmer, Craig R
2018-06-01
Elucidating the genetic basis of adaptation to the local environment can improve our understanding of how the diversity of life has evolved. In this study, we used a dense SNP array to identify candidate loci potentially underlying fine-scale local adaptation within a large Atlantic salmon (Salmo salar) population. By combining outlier, gene-environment association and haplotype homozygosity analyses, we identified multiple regions of the genome with strong evidence for diversifying selection. Several of these candidate regions had previously been identified in other studies, demonstrating that the same loci could be adaptively important in Atlantic salmon at subdrainage, regional and continental scales. Notably, we identified signals consistent with local selection around genes associated with variation in sexual maturation, energy homeostasis and immune defence. These included the large-effect age-at-maturity gene vgll3, the known obesity gene mc4r, and major histocompatibility complex II. Most strikingly, we confirmed a genomic region on Ssa09 that was extremely differentiated among subpopulations and that is also a candidate for local selection over the global range of Atlantic salmon. This region colocalized with a haplotype strongly associated with spawning ecotype in sockeye salmon (Oncorhynchus nerka), with circumstantial evidence that the same gene (six6) may be the selective target in both cases. The phenotypic effect of this region in Atlantic salmon remains cryptic, although allelic variation is related to upstream catchment area and covaries with timing of the return spawning migration. Our results further inform management of Atlantic salmon and open multiple avenues for future research. © 2018 John Wiley & Sons Ltd.
Hatemi, Peter K; Medland, Sarah E; Klemmensen, Robert; Oskarsson, Sven; Littvay, Levente; Dawes, Christopher T; Verhulst, Brad; McDermott, Rose; Nørgaard, Asbjørn Sonne; Klofstad, Casey A; Christensen, Kaare; Johannesson, Magnus; Magnusson, Patrik K E; Eaves, Lindon J; Martin, Nicholas G
2014-05-01
Almost 40 years ago, evidence from large studies of adult twins and their relatives suggested that between 30 and 60% of the variance in social and political attitudes could be explained by genetic influences. However, these findings have not been widely accepted or incorporated into the dominant paradigms that explain the etiology of political ideology. This has been attributed in part to measurement and sample limitations, as well as the relative absence of molecular genetic studies. Here we present results from original analyses of a combined sample of over 12,000 twin pairs, ascertained from nine different studies conducted in five democracies, sampled over the course of four decades. We provide evidence that genetic factors play a role in the formation of political ideology, regardless of how ideology is measured, the era, or the population sampled. The only exception is a question that explicitly uses the phrase "Left-Right". We then present results from one of the first genome-wide association studies on political ideology using data from three samples: a 1990 Australian sample involving 6,894 individuals from 3,516 families; a 2008 Australian sample of 1,160 related individuals from 635 families and a 2010 Swedish sample involving 3,334 individuals from 2,607 families. No polymorphisms reached genome-wide significance in the meta-analysis. The combined evidence suggests that political ideology constitutes a fundamental aspect of one's genetically informed psychological disposition, but as Fisher proposed long ago, genetic influences on complex traits will be composed of thousands of markers of very small effects and it will require extremely large samples to have enough power in order to identify specific polymorphisms related to complex social traits.
2013-01-01
Background: Obesity, excess fat tissue in the body, can underlie a variety of medical complaints including heart disease, stroke and cancer. The pig is an excellent model organism for the study of various human disorders, including obesity, as well as being the foremost agricultural species. In order to identify genetic variants associated with fatness, we used a selective genomic approach sampling DNA from animals at the extreme ends of the fat and lean spectrum using estimated breeding values derived from a total population size of over 70,000 animals. DNA from 3 breeds (Sire Line Large White, Duroc and a white Pietrain composite line (Titan)) was used to interrogate the Illumina Porcine SNP60 Genotyping Beadchip in order to identify significant associations in terms of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs). Results: By sampling animals at each end of the fat/lean EBV (estimated breeding value) spectrum the whole population could be assessed using fewer than 300 animals, without losing statistical power. Indeed, several significant SNPs (at the 5% genome-wide significance level) were discovered, 4 of these linked to genes with ontologies that had previously been correlated with fatness (NTS, FABP6, SST and NR3C2). Quantitative analysis of the data identified putative CNV regions containing genes whose ontology suggested fatness-related functions (MCHR1, PPARα, SLC5A1 and SLC5A4). Conclusions: Selective genotyping of EBVs at either end of the phenotypic spectrum proved to be a cost-effective means of identifying SNPs and CNVs associated with fatness and with estimated major effects in a large population of animals. PMID:24225222
Is It Time for Synthetic Biodiversity Conservation?
Piaggio, Antoinette J; Segelbacher, Gernot; Seddon, Philip J; Alphey, Luke; Bennett, Elizabeth L; Carlson, Robert H; Friedman, Robert M; Kanavy, Dona; Phelan, Ryan; Redford, Kent H; Rosales, Marina; Slobodian, Lydia; Wheeler, Keith
2017-02-01
Evidence indicates that, despite some critical successes, current conservation approaches are not slowing the overall rate of biodiversity loss. The field of synthetic biology, which is capable of altering natural genomes with extremely precise editing, might offer the potential to resolve some intractable conservation problems (e.g., invasive species or pathogens). However, it is our opinion that there has been insufficient engagement by the conservation community with practitioners of synthetic biology. We contend that rapid, large-scale engagement of these two communities is urgently needed to avoid unintended and deleterious ecological consequences. To this point we describe case studies where synthetic biology is currently being applied to conservation, and we highlight the benefits to conservation biologists from engaging with this emerging technology. Published by Elsevier Ltd.
Michaut, Magali; Chin, Suet-Feung; Majewski, Ian; Severson, Tesa M.; Bismeijer, Tycho; de Koning, Leanne; Peeters, Justine K.; Schouten, Philip C.; Rueda, Oscar M.; Bosma, Astrid J.; Tarrant, Finbarr; Fan, Yue; He, Beilei; Xue, Zheng; Mittempergher, Lorenza; Kluin, Roelof J.C.; Heijmans, Jeroen; Snel, Mireille; Pereira, Bernard; Schlicker, Andreas; Provenzano, Elena; Ali, Hamid Raza; Gaber, Alexander; O’Hurley, Gillian; Lehn, Sophie; Muris, Jettie J.F.; Wesseling, Jelle; Kay, Elaine; Sammut, Stephen John; Bardwell, Helen A.; Barbet, Aurélie S.; Bard, Floriane; Lecerf, Caroline; O’Connor, Darran P.; Vis, Daniël J.; Benes, Cyril H.; McDermott, Ultan; Garnett, Mathew J.; Simon, Iris M.; Jirström, Karin; Dubois, Thierry; Linn, Sabine C.; Gallagher, William M.; Wessels, Lodewyk F.A.; Caldas, Carlos; Bernards, Rene
2016-01-01
Invasive lobular carcinoma (ILC) is the second most frequently occurring histological breast cancer subtype after invasive ductal carcinoma (IDC), accounting for around 10% of all breast cancers. The molecular processes that drive the development of ILC are still largely unknown. We have performed a comprehensive genomic, transcriptomic and proteomic analysis of a large ILC patient cohort and present here an integrated molecular portrait of ILC. Mutations in CDH1 and in the PI3K pathway are the most frequent molecular alterations in ILC. We identified two main subtypes of ILCs: (i) an immune related subtype with mRNA up-regulation of PD-L1, PD-1 and CTLA-4 and greater sensitivity to DNA-damaging agents in representative cell line models; (ii) a hormone related subtype, associated with Epithelial to Mesenchymal Transition (EMT), and gain of chromosomes 1q and 8q and loss of chromosome 11q. Using the somatic mutation rate and eIF4B protein level, we identified three groups with different clinical outcomes, including a group with extremely good prognosis. We provide a comprehensive overview of the molecular alterations driving ILC and have explored links with therapy response. This molecular characterization may help to tailor treatment of ILC through the application of specific targeted, chemo- and/or immune-therapies. PMID:26729235
Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture.
Darrow, Emily M; Huntley, Miriam H; Dudchenko, Olga; Stamenova, Elena K; Durand, Neva C; Sun, Zhuo; Huang, Su-Chen; Sanborn, Adrian L; Machol, Ido; Shamim, Muhammad; Seberg, Andrew P; Lander, Eric S; Chadwick, Brian P; Aiden, Erez Lieberman
2016-08-02
During interphase, the inactive X chromosome (Xi) is largely transcriptionally silent and adopts an unusual 3D configuration known as the "Barr body." Despite the importance of X chromosome inactivation, little is known about this 3D conformation. We recently showed that in humans the Xi chromosome exhibits three structural features, two of which are not shared by other chromosomes. First, like the chromosomes of many species, Xi forms compartments. Second, Xi is partitioned into two huge intervals, called "superdomains," such that pairs of loci in the same superdomain tend to colocalize. The boundary between the superdomains lies near DXZ4, a macrosatellite repeat whose Xi allele extensively binds the protein CCCTC-binding factor. Third, Xi exhibits extremely large loops, up to 77 megabases long, called "superloops." DXZ4 lies at the anchor of several superloops. Here, we combine 3D mapping, microscopy, and genome editing to study the structure of Xi, focusing on the role of DXZ4. We show that superloops and superdomains are conserved across eutherian mammals. By analyzing ligation events involving three or more loci, we demonstrate that DXZ4 and other superloop anchors tend to colocate simultaneously. Finally, we show that deleting DXZ4 on Xi leads to the disappearance of superdomains and superloops, changes in compartmentalization patterns, and changes in the distribution of chromatin marks. Thus, DXZ4 is essential for proper Xi packaging.
2009-01-01
Olfaction is essential for the survival of animals. Versatile odour molecules in the environment are received by olfactory receptors (ORs), which form the largest multigene family in vertebrates. Identification of the entire repertoires of OR genes using bioinformatics methods from the whole-genome sequences of diverse organisms revealed that the numbers of OR genes vary enormously, ranging from ~1,200 in rats and ~400 in humans to ~150 in zebrafish and ~15 in pufferfish. Most species have a considerable fraction of pseudogenes. Extensive phylogenetic analyses have suggested that the numbers of gene gains and losses are extremely large in the OR gene family, which is a striking example of birth-and-death evolution. It appears that OR gene repertoires change dynamically, depending on each organism's living environment. For example, higher primates equipped with a well-developed vision system have lost a large number of OR genes. Moreover, two groups of OR genes for detecting airborne odorants greatly expanded after the time of terrestrial adaptation in the tetrapod lineage, whereas fishes retain diverse repertoires of genes that were present in aquatic ancestral species. The origin of vertebrate OR genes can be traced back to the common ancestor of all chordate species, but insects, nematodes and echinoderms utilise distinctive families of chemoreceptors, suggesting that chemoreceptor genes have evolved many times independently in animal evolution. PMID:20038498
Demina, Tatiana A; Pietilä, Maija K; Svirskaitė, Julija; Ravantti, Janne J; Atanasova, Nina S; Bamford, Dennis H; Oksanen, Hanna M
2016-07-19
Despite their high genomic diversity, all known viruses are structurally constrained to a limited number of virion morphotypes. One morphotype of viruses infecting bacteria, archaea, and eukaryotes is the tailless icosahedral morphotype with an internal membrane. Although it is considered an abundant morphotype in extreme environments, only seven such archaeal viruses are known. Here, we introduce Haloarcula californiae icosahedral virus 1 (HCIV-1), a halophilic euryarchaeal virus originating from salt crystals. HCIV-1 also retains its infectivity under low-salinity conditions, showing that it is able to adapt to environmental changes. The release of progeny virions resulting from cell lysis was evidenced by reduced cellular oxygen consumption, leakage of intracellular ATP, and binding of an indicator ion to ruptured cell membranes. The virion contains at least 12 different protein species, lipids selectively acquired from the host cell membrane, and a 31,314-bp-long linear double-stranded DNA (dsDNA). The overall genome organization and sequence show high similarity to the genomes of archaeal viruses in the Sphaerolipoviridae family. Phylogenetic analysis based on the major conserved components needed for virion assembly (the major capsid proteins and the packaging ATPase) placed HCIV-1 along with the alphasphaerolipoviruses in a distinct, well-supported clade. On the basis of its virion morphology and sequence similarities, most notably, those of its core virion components, we propose that HCIV-1 is a member of the PRD1-adenovirus structure-based lineage together with other sphaerolipoviruses. This addition to the lineage reinforces the notion of the ancient evolutionary links observed between the viruses and further highlights the limits of the choices found in nature for formation of a virion.
Under conditions of extreme salinity, the majority of the organisms present are archaea, which encounter substantial selective pressure, being constantly attacked by viruses. Regardless of the enormous viral sequence diversity, all known viruses can be clustered into a few structure-based viral lineages based on their core virion components. Our description of a new halophilic virus-host system adds significant insights into the largely unstudied field of archaeal viruses and, in general, of life under extreme conditions. Comprehensive molecular characterization of HCIV-1 shows that this icosahedral internal membrane-containing virus exhibits conserved elements responsible for virion organization. This places the virus neatly in the PRD1-adenovirus structure-based lineage. HCIV-1 further highlights the limited diversity of virus morphotypes despite the astronomical number of viruses in the biosphere. The observed high conservation in the core virion elements should be considered in addressing such fundamental issues as the origin and evolution of viruses and their interplay with their hosts. Copyright © 2016 Demina et al.
Microbial Lifestyle and Genome Signatures
Dutta, Chitra; Paul, Sandip
2012-01-01
Microbes are known for their unique ability to adapt to varying lifestyle and environment, even to the extreme or adverse ones. The genomic architecture of a microbe may bear the signatures not only of its phylogenetic position, but also of the kind of lifestyle to which it is adapted. The present review aims to provide an account of the specific genome signatures observed in microbes acclimatized to distinct lifestyles or ecological niches. Niche-specific signatures identified at different levels of microbial genome organization like base composition, GC-skew, purine-pyrimidine ratio, dinucleotide abundance, codon bias, oligonucleotide composition etc. have been discussed. Among the specific cases highlighted in the review are the phenomena of genome shrinkage in obligatory host-restricted microbes, genome expansion in strictly intra-amoebal pathogens, strand-specific codon usage in intracellular species, acquisition of genome islands in pathogenic or symbiotic organisms, discriminatory genomic traits of marine microbes with distinct trophic strategies, and conspicuous sequence features of certain extremophiles like those adapted to high temperature or high salinity. PMID:23024607
Sinzelle, Ludivine; Chesneau, Albert; Bigot, Yves; Mazabraud, André; Pollet, Nicolas
2006-01-01
Mariner-like elements (MLEs) belong to the Tc1-mariner superfamily of DNA transposons, which is very widespread in animal genomes. We report here the first complete description of an MLE, Xtmar1, within the genome of a poikilotherm vertebrate, the amphibian Xenopus tropicalis. A close relative, XlMLE, is also characterized within the genome of a sibling species, Xenopus laevis. The phylogenetic analysis of the relationships between MLE transposases reveals that Xtmar1 is closely related to Hsmar2 and Bytmar1 and that together they form a second distinct lineage of the irritans subfamily. All members of this lineage are also characterized by the 36- to 43-bp size of their imperfectly conserved inverted terminal repeats and by the -8-bp motif located at their outer extremity. Since XlMLE, Xlmar1, and Hsmar2 are present in species located at both extremities of the vertebrate evolutionary tree, we looked for MLE relatives belonging to the same subfamily in the available sequencing projects using the amino acid consensus sequence of the Hsmar2 transposase as an in silico probe. We found that irritans MLEs are present in chordate genomes including most craniates. This therefore suggests that these elements have been present within chordate genomes for 750 Myr and that the main way they have been maintained in these species has been via vertical transmission. The very small number of stochastic losses observed in the data available suggests that their inactivation during evolution has been very slow.
Genomic insights into the evolutionary origin of Myxozoa within Cnidaria
Chang, E. Sally; Neuhof, Moran; Rubinstein, Nimrod D.; Diamant, Arik; Philippe, Hervé; Huchon, Dorothée; Cartwright, Paulyn
2015-01-01
The Myxozoa comprise over 2,000 species of microscopic obligate parasites that use both invertebrate and vertebrate hosts as part of their life cycle. Although the evolutionary origin of myxozoans has been elusive, a close relationship with cnidarians, a group that includes corals, sea anemones, jellyfish, and hydroids, is supported by some phylogenetic studies and the observation that the distinctive myxozoan structure, the polar capsule, is remarkably similar to the stinging structures (nematocysts) in cnidarians. To gain insight into the extreme evolutionary transition from a free-living cnidarian to a microscopic endoparasite, we analyzed genomic and transcriptomic assemblies from two distantly related myxozoan species, Kudoa iwatai and Myxobolus cerebralis, and compared these to the transcriptome and genome of the less reduced cnidarian parasite, Polypodium hydriforme. A phylogenomic analysis, using for the first time to our knowledge, a taxonomic sampling that represents the breadth of myxozoan diversity, including four newly generated myxozoan assemblies, confirms that myxozoans are cnidarians and are a sister taxon to P. hydriforme. Estimations of genome size reveal that myxozoans have one of the smallest reported animal genomes. Gene enrichment analyses show depletion of expressed genes in categories related to development, cell differentiation, and cell–cell communication. In addition, a search for candidate genes indicates that myxozoans lack key elements of signaling pathways and transcriptional factors important for multicellular development. Our results suggest that the degeneration of the myxozoan body plan from a free-living cnidarian to a microscopic parasitic cnidarian was accompanied by extreme reduction in genome size and gene content. PMID:26627241
Genomic insights into the evolutionary origin of Myxozoa within Cnidaria.
Chang, E Sally; Neuhof, Moran; Rubinstein, Nimrod D; Diamant, Arik; Philippe, Hervé; Huchon, Dorothée; Cartwright, Paulyn
2015-12-01
The Myxozoa comprise over 2,000 species of microscopic obligate parasites that use both invertebrate and vertebrate hosts as part of their life cycle. Although the evolutionary origin of myxozoans has been elusive, a close relationship with cnidarians, a group that includes corals, sea anemones, jellyfish, and hydroids, is supported by some phylogenetic studies and the observation that the distinctive myxozoan structure, the polar capsule, is remarkably similar to the stinging structures (nematocysts) in cnidarians. To gain insight into the extreme evolutionary transition from a free-living cnidarian to a microscopic endoparasite, we analyzed genomic and transcriptomic assemblies from two distantly related myxozoan species, Kudoa iwatai and Myxobolus cerebralis, and compared these to the transcriptome and genome of the less reduced cnidarian parasite, Polypodium hydriforme. A phylogenomic analysis, using for the first time to our knowledge, a taxonomic sampling that represents the breadth of myxozoan diversity, including four newly generated myxozoan assemblies, confirms that myxozoans are cnidarians and are a sister taxon to P. hydriforme. Estimations of genome size reveal that myxozoans have one of the smallest reported animal genomes. Gene enrichment analyses show depletion of expressed genes in categories related to development, cell differentiation, and cell-cell communication. In addition, a search for candidate genes indicates that myxozoans lack key elements of signaling pathways and transcriptional factors important for multicellular development. Our results suggest that the degeneration of the myxozoan body plan from a free-living cnidarian to a microscopic parasitic cnidarian was accompanied by extreme reduction in genome size and gene content.
Extreme variability among mammalian V1R gene families.
Young, Janet M; Massa, Hillary F; Hsu, Li; Trask, Barbara J
2010-01-01
We report an evolutionary analysis of the V1R gene family across 37 mammalian genomes. V1Rs comprise one of three chemosensory receptor families expressed in the vomeronasal organ, and contribute to pheromone detection. We first demonstrate that Trace Archive data can be used effectively to determine V1R family sizes and to obtain sequences of most V1R family members. Analyses of V1R sequences from trace data and genome assemblies show that species-specific expansions previously observed in only eight species were prevalent throughout mammalian evolution, resulting in "semi-private" V1R repertoires for most mammals. The largest families are found in mouse and platypus, whose V1R repertoires have been published previously, followed by mouse lemur and rabbit (approximately 215 and approximately 160 intact V1Rs, respectively). In contrast, two bat species and dolphin possess no functional V1Rs, only pseudogenes, and suffered inactivating mutations in the vomeronasal signal transduction gene Trpc2. We show that primate V1R decline happened prior to acquisition of trichromatic vision, earlier during evolution than was previously thought. We also show that it is extremely unlikely that decline of the dog V1R repertoire occurred in response to selective pressures imposed by humans during domestication. Functional repertoire sizes in each species correlate roughly with anatomical observations of vomeronasal organ size and quality; however, no single ecological correlate explains the very diverse fates of this gene family in different mammalian genomes. V1Rs provide one of the most extreme examples observed to date of massive gene duplication in some genomes, with loss of all functional genes in other species.
Busarakam, Kanungnid; Bull, Alan T; Trujillo, Martha E; Riesco, Raul; Sangal, Vartul; van Wezel, Gilles P; Goodfellow, Michael
2016-06-01
A polyphasic study was designed to determine the taxonomic provenance of three Modestobacter strains isolated from an extreme hyper-arid Atacama Desert soil. The strains, isolates KNN 45-1a, KNN 45-2b(T) and KNN 45-3b, were shown to have chemotaxonomic and morphological properties in line with their classification in the genus Modestobacter. The isolates had identical 16S rRNA gene sequences and formed a branch in the Modestobacter gene tree that was most closely related to the type strain of Modestobacter marinus (99.6% similarity). All three isolates were distinguished readily from Modestobacter type strains by a broad range of phenotypic properties, by qualitative and quantitative differences in fatty acid profiles and by BOX fingerprint patterns. The whole genome sequence of isolate KNN 45-2b(T) showed 89.3% average nucleotide identity, 90.1% (SD: 10.97%) average amino acid identity and a digital DNA-DNA hybridization value of 42.4±3.1 against the genome sequence of M. marinus DSM 45201(T), values consistent with its assignment to a separate species. On the basis of all of these data, it is proposed that the isolates be assigned to the genus Modestobacter as Modestobacter caceresii sp. nov. with isolate KNN 45-2b(T) (CECT 9023(T)=DSM 101691(T)) as the type strain. Analysis of the whole-genome sequence of M. caceresii KNN 45-2b(T), with 4683 open reading frames and a genome size of ~4.96 Mb, revealed the presence of genes and gene clusters that encode properties relevant to its adaptability to the harsh environmental conditions prevalent in extreme hyper-arid Atacama Desert soils. Copyright © 2016. Published by Elsevier GmbH.
Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism.
Spanu, Pietro D; Abbott, James C; Amselem, Joelle; Burgis, Timothy A; Soanes, Darren M; Stüber, Kurt; Ver Loren van Themaat, Emiel; Brown, James K M; Butcher, Sarah A; Gurr, Sarah J; Lebrun, Marc-Henri; Ridout, Christopher J; Schulze-Lefert, Paul; Talbot, Nicholas J; Ahmadinejad, Nahal; Ametz, Christian; Barton, Geraint R; Benjdia, Mariam; Bidzinski, Przemyslaw; Bindschedler, Laurence V; Both, Maike; Brewer, Marin T; Cadle-Davidson, Lance; Cadle-Davidson, Molly M; Collemare, Jerome; Cramer, Rainer; Frenkel, Omer; Godfrey, Dale; Harriman, James; Hoede, Claire; King, Brian C; Klages, Sven; Kleemann, Jochen; Knoll, Daniela; Koti, Prasanna S; Kreplak, Jonathan; López-Ruiz, Francisco J; Lu, Xunli; Maekawa, Takaki; Mahanil, Siraprapa; Micali, Cristina; Milgroom, Michael G; Montana, Giovanni; Noir, Sandra; O'Connell, Richard J; Oberhaensli, Simone; Parlange, Francis; Pedersen, Carsten; Quesneville, Hadi; Reinhardt, Richard; Rott, Matthias; Sacristán, Soledad; Schmidt, Sarah M; Schön, Moritz; Skamnioti, Pari; Sommer, Hans; Stephens, Amber; Takahara, Hiroyuki; Thordal-Christensen, Hans; Vigouroux, Marielle; Wessling, Ralf; Wicker, Thomas; Panstruga, Ralph
2010-12-10
Powdery mildews are phytopathogens whose growth and reproduction are entirely dependent on living plant cells. The molecular basis of this life-style, obligate biotrophy, remains unknown. We present the genome analysis of barley powdery mildew, Blumeria graminis f.sp. hordei (Blumeria), as well as a comparison with the analysis of two powdery mildews pathogenic on dicotyledonous plants. These genomes display massive retrotransposon proliferation, genome-size expansion, and gene losses. The missing genes encode enzymes of primary and secondary metabolism, carbohydrate-active enzymes, and transporters, probably reflecting their redundancy in an exclusively biotrophic life-style. Among the 248 candidate effectors of pathogenesis identified in the Blumeria genome, very few (less than 10) define a core set conserved in all three mildews, suggesting that most effectors represent species-specific adaptations.
Mannini, Linda; Menga, Stefania; Musio, Antonio
2010-06-01
Cohesin is responsible for sister chromatid cohesion, ensuring the correct chromosome segregation. Beyond this role, cohesin and regulatory cohesin genes seem to play a role in preserving genome stability and gene transcription regulation. DNA damage is thought to be a major culprit for many human diseases, including cancer. Our present knowledge of the molecular basis underlying genome instability is extremely limited. Mutations in cohesin genes cause human diseases such as Cornelia de Lange syndrome and Roberts syndrome/SC phocomelia, and all the cell lines derived from affected patients show genome instability. Cohesin mutations have also been identified in colorectal cancer. Here, we will discuss the human disorders caused by alterations of cohesin function, with emphasis on the emerging role of cohesin as a genome stability caretaker.
USDA-ARS's Scientific Manuscript database
The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...
Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T
2004-01-01
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
[Recent advances in the field of oral bacteriology].
Shoji, Mikio; Takeshita, Toru; Maruyama, Fumito; Inaba, Hiroaki; Imai, Kenichi; Kawada-Matsuo, Miki
2015-01-01
The oral cavity is inhabited by more than 600 bacterial species; these species compete for nutrients or coexist in order to survive along with the indigenous population. Extreme conditions are prevalent in the oral cavity, and these conditions are influenced by our immunity and variations in nutrition, temperature, and pH. Pathogens that cause dental caries or periodontal disease can survive in these extreme environments; these pathogens are virulent and can cause several diseases. Therefore, research on oral bacteriology is warranted to analyze the virulence factors of these bacteria as well as to ascertain environmental stress responses, interactions between bacteria and human immunity, comparisons of bacterial genomes, and oral microflora. In this review, we provide new data in the fields of bacteriology, immunology, and genomics and describe recent advances in the field of oral bacteriology.
Imputation of unordered markers and the impact on genomic selection accuracy
USDA-ARS's Scientific Manuscript database
Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large propo...
The cost of large numbers of hypothesis tests on power, effect size and sample size.
Lazzeroni, L C; Ray, A
2012-01-01
Advances in high-throughput biology and computer science are driving an exponential increase in the number of hypothesis tests in genomics and other scientific disciplines. Studies using current genotyping platforms frequently include a million or more tests. In addition to the monetary cost, this increase imposes a statistical cost owing to the multiple testing corrections needed to avoid large numbers of false-positive results. To safeguard against the resulting loss of power, some have suggested sample sizes on the order of tens of thousands that can be impractical for many diseases or may lower the quality of phenotypic measurements. This study examines the relationship between the number of tests on the one hand and power, detectable effect size or required sample size on the other. We show that once the number of tests is large, power can be maintained at a constant level, with comparatively small increases in the effect size or sample size. For example, at the 0.05 significance level, a 13% increase in sample size is needed to maintain 80% power for ten million tests compared with one million tests, whereas a 70% increase in sample size is needed for 10 tests compared with a single test. Relative costs are less when measured by increases in the detectable effect size. We provide an interactive Excel calculator to compute power, effect size or sample size when comparing study designs or genome platforms involving different numbers of hypothesis tests. The results are reassuring in an era of extreme multiple testing.
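The sample-size figures quoted in the preceding abstract can be approximately reproduced with a short calculation. The sketch below is an illustration only: it assumes a two-sided z-test with a Bonferroni-corrected threshold, under which the required sample size is proportional to (z_{alpha/(2m)} + z_{power})^2 for m tests. This is a standard normal approximation and may differ from the exact method used in the paper or its Excel calculator; the function names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution from the Python standard library.
_STD_NORMAL = NormalDist()


def relative_sample_size(num_tests, alpha=0.05, power=0.80):
    """Return a quantity proportional to the sample size needed for a
    two-sided z-test at family-wise level `alpha` with Bonferroni
    correction over `num_tests` tests (assumed model, see lead-in)."""
    per_test_alpha = alpha / num_tests
    # Critical value for the corrected two-sided threshold; the lower
    # tail is used to keep precision for very small probabilities.
    z_alpha = -_STD_NORMAL.inv_cdf(per_test_alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) ** 2


def sample_size_increase(tests_new, tests_old, alpha=0.05, power=0.80):
    """Fractional increase in sample size needed to keep the same power
    when moving from `tests_old` to `tests_new` hypothesis tests."""
    ratio = (relative_sample_size(tests_new, alpha, power)
             / relative_sample_size(tests_old, alpha, power))
    return ratio - 1.0


if __name__ == "__main__":
    print(f"10 tests vs 1 test: {sample_size_increase(10, 1):.1%}")
    print(f"1e7 tests vs 1e6 tests: "
          f"{sample_size_increase(10_000_000, 1_000_000):.1%}")
```

Under these assumptions the two increases come out close to the 70% and 13% reported in the abstract, which illustrates why the marginal cost of additional tests shrinks once the number of tests is already large: the critical value grows only slowly as the per-test threshold becomes more stringent.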
Ordoñez, Omar F; Lanzarotti, Esteban; Kurth, Daniel; Gorriti, Marta F; Revale, Santiago; Cortez, Néstor; Vazquez, Martin P; Farías, María E; Turjanski, Adrian G
2013-07-25
Exiguobacterium sp. strain S17 is a moderately halotolerant, arsenic-resistant bacterium that was isolated from Laguna Socompa stromatolites in the Argentinian Puna. The draft genome sequence suggests potent enzyme candidates that are essential for survival under multiple environmental extreme conditions, such as high levels of UV radiation, elevated salinity, and the presence of critical arsenic concentrations.
Rascovan, Nicolás; Castro, Camila; Revale, Santiago; Giaveno, M. Alejandra; Vazquez, Martín; Donati, Edgardo R.
2014-01-01
Acidianus copahuensis is a recently characterized thermoacidophilic archaeon isolated from the Copahue volcanic area in Argentina. Here, we present its draft genome sequence, in which we found genes involved in key metabolic pathways for developing under Copahue’s extreme environmental conditions, such as sulfur and iron oxidation, carbon fixation, and metal tolerance. PMID:24812211
Wolf, Yuri I; Makarova, Kira S; Yutin, Natalya; Koonin, Eugene V
2012-12-14
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer. 
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time. This article was reviewed by (for complete reviews see the Reviewers' Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
Li, Xiu-Qing
2014-01-01
The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3′COR) and the mRNA three-prime untranslated region (3′UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3′CORs and 3′UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3′COR and 3′UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3′COR and 3′UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3′COR and 3′UTR; 2) the 3′UTR and 3′COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition. PMID:24941005
Avvaru, Akshay Kumar; Sowpati, Divya Tej; Mishra, Rakesh Kumar
2018-03-15
Microsatellites or Simple Sequence Repeats (SSRs) are short tandem repeats of DNA motifs present in all genomes. They have long been used for a variety of purposes in the areas of population genetics, genotyping, marker-assisted selection and forensics. Numerous studies have highlighted their functional roles in genome organization and gene regulation. Though several tools are currently available to identify SSRs from genomic sequences, they have significant limitations. We present a novel algorithm called PERF for extremely fast and comprehensive identification of microsatellites from DNA sequences of any size. PERF is several-fold faster than existing algorithms and uses up to 5-fold less memory. It provides a clean and flexible command-line interface to change the default settings, and produces output in an easily parseable tab-separated format. In addition, PERF generates an interactive and stand-alone HTML report with charts and tables for easy downstream analysis. PERF is implemented in the Python programming language. It is freely available on PyPI under the package name perf_ssr, and can be installed directly using pip or easy_install. The documentation of PERF is available at https://github.com/rkmlab/perf. The source code of PERF is deposited in GitHub at https://github.com/rkmlab/perf under an MIT license. Contact: tej@ccmb.res.in. Supplementary data are available at Bioinformatics online.
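To make the task concrete, here is a naive regular-expression sketch of microsatellite detection. It is not the PERF algorithm and does not use the perf_ssr package; the function names, the motif-length range of 1 to 6 bases, and the 12-base minimum repeat length are all assumptions chosen for illustration.

```python
import re


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_ssrs(seq, min_motif=1, max_motif=6, min_length=12):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Coordinates are 0-based and half-open. Only exact repeats over A/C/G/T
    that span at least `min_length` bases are reported.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # Smallest number of motif copies that reaches min_length (ceiling).
        min_repeats = max(2, -(-min_length // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already reported under its shorter unit
            repeats = (match.end() - match.start()) // k
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGCC" + "AT" * 8 + "GCGT" + "A" * 13 + "C"
    for start, end, motif, repeats in find_ssrs(example):
        # Tab-separated output, similar in spirit to the format described above.
        print(start, end, motif, repeats, sep="\t")
```

A scan like this makes one regex pass per motif length, so it is only suitable for small inputs; the speed and memory claims in the abstract concern PERF's own method.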
Acclimation of microorganisms to harsh soil crust conditions: Experimental and genomic approaches
NASA Astrophysics Data System (ADS)
Raanan, Hagai; Kaplan, Aaron
2015-04-01
Biological soil crusts (BSC) are formed by the adhesion of sand particles to cyanobacterial exopolysaccharides and play an important role in stabilizing sandy deserts. Their destruction promotes desertification. The organisms in these crusts cope with extreme temperatures, excess light and frequent hydration/dehydration cycles; the mechanisms involved are largely unknown. Using the genome of a newly sequenced Leptolyngbya, isolated from Nizzana BSC, we conducted comparative genomics of three desiccation-tolerant cyanobacteria. This yielded 46 unique genes, some of them similar to genes involved in sporulation of the Gram-positive bacterium Bacillus. In order to understand the molecular mechanisms taking place during desiccation, we built an environmental chamber capable of simulating dynamic changes of environmental conditions in the crust. This chamber allows us to perform repetitive and accurate desiccation/rehydration experiments and to follow cyanobacterial physiological and molecular responses to such environmental changes. When we compared fast desiccation (less than 5 min) of isolated cyanobacteria to simulation of natural desiccation, we observed a 60% lower fluorescence recovery rate after fast desiccation. The extent of damage from desiccation depended on the stress conditions during the dry period. These results suggest that cyanobacteria activate protection mechanisms in response to desiccation stress, but that these mechanisms are not activated in the 5 min desiccation tests. Gene expression patterns during desiccation are being analyzed in order to provide a better understanding of desiccation stress protection mechanisms.
NASA Astrophysics Data System (ADS)
Krupovic, Mart; Koonin, Eugene V.
2014-06-01
Single-stranded (ss)DNA viruses are extremely widespread, infect diverse hosts from all three domains of life and include important pathogens. Most ssDNA viruses possess small genomes that replicate by the rolling-circle-like mechanism initiated by a distinct virus-encoded endonuclease. However, viruses of the family Bidnaviridae, instead of the endonuclease, encode a protein-primed type B DNA polymerase (PolB) and hence break this pattern. We investigated the provenance of all bidnavirus genes and uncovered an unexpectedly turbulent evolutionary history of these unique viruses. Our analysis strongly suggests that bidnaviruses evolved from a parvovirus ancestor from which they inherited a jelly-roll capsid protein and a superfamily 3 helicase. The radiation of bidnaviruses from parvoviruses was probably triggered by integration of the ancestral parvovirus genome into a large virus-derived DNA transposon of the Polinton (polintovirus) family, resulting in the acquisition of the polintovirus PolB gene along with terminal inverted repeats. Bidnavirus genes for a receptor-binding protein and a potential novel antiviral defense modulator are derived from dsRNA viruses (Reoviridae) and dsDNA viruses (Baculoviridae), respectively. The unusual evolutionary history of bidnaviruses emphasizes the key role of horizontal gene transfer, sometimes between viruses with completely different genomes but occupying the same niche, in the emergence of new viral types.
Rossetto, Maurizio; Kooyman, Robert; Yap, Jia-Yee S.; Laffan, Shawn W.
2015-01-01
Seed dispersal is a key process in plant spatial dynamics. However, consistently applicable generalizations about dispersal across scales are mostly absent because of the constraints on measuring propagule dispersal distances for many species. Here, we focus on fleshy-fruited taxa, specifically taxa with large fleshy fruits and their dispersers across an entire continental rainforest biome. We compare species-level results of whole-chloroplast DNA analyses in sister taxa with large and small fruits, to regional plot-based samples (310 plots), and whole-continent patterns for the distribution of woody species with either large (more than 30 mm) or smaller fleshy fruits (1093 taxa). The pairwise genomic comparison found higher genetic distances between populations and between regions in the large-fruited species (Endiandra globosa), but higher overall diversity within the small-fruited species (Endiandra discolor). Floristic comparisons among plots confirmed lower numbers of large-fruited species in areas where more extreme rainforest contraction occurred, and re-colonization by small-fruited species readily dispersed by the available fauna. Species' distribution patterns showed that larger-fruited species had smaller geographical ranges than smaller-fruited species and locations with stable refugia (and high endemism) aligned with concentrations of large fleshy-fruited taxa, making them a potentially valuable conservation-planning indicator. PMID:26645199
Olfactory Receptor Multigene Family in Vertebrates: From the Viewpoint of Evolutionary Genomics
Niimura, Yoshihito
2012-01-01
Olfaction is essential for the survival of animals. Diverse odor molecules in the environment are detected by the olfactory receptors (ORs) in the olfactory epithelium of the nasal cavity. There are ~400 and ~1,000 OR genes in the human and mouse genomes, respectively, forming the largest multigene family in mammals. The relationships between ORs and odorants are multiple-to-multiple, which allows an almost unlimited number of different odorants to be discriminated by combinations of ORs. However, the OR-ligand relationships are still largely unknown, and attempts to predict the quality of an odor from its molecular structure have so far been unsuccessful. Extensive bioinformatic analyses using the whole genomes of various organisms revealed great variation in the number of OR genes among species, reflecting the diversity of their living environments. For example, higher primates equipped with a well-developed vision system and dolphins that are secondarily adapted to aquatic life have considerably smaller numbers of OR genes than most other mammals do. OR genes are characterized by extremely frequent gene duplications and losses. The OR gene repertoires are also diverse among human individuals, explaining the diversity of odor perception such as specific anosmia. OR genes are present in all vertebrates. The number of OR genes is smaller in teleost fishes than in mammals, while the diversity is higher in the former than in the latter. Because the genome of amphioxus, the most basal chordate species, harbors vertebrate-like OR genes, the origin of OR genes can be traced back to the common ancestor of the phylum Chordata. PMID:23024602
Sun, Yu; Tamarit, Daniel
2017-01-01
The major codon preference model suggests that codons read by tRNAs in high concentrations are preferentially utilized in highly expressed genes. However, the identity of the optimal codons differs between species although the forces driving such changes are poorly understood. We suggest that these questions can be tackled by placing codon usage studies in a phylogenetic framework and that bacterial genomes with extreme nucleotide composition biases provide informative model systems. Switches in the background substitution biases from GC to AT have occurred in Gardnerella vaginalis (GC = 32%), and from AT to GC in Lactobacillus delbrueckii (GC = 62%) and Lactobacillus fermentum (GC = 63%). We show that despite the large effects on codon usage patterns by these switches, all three species evolve under selection on synonymous sites. In G. vaginalis, the dramatic codon frequency changes coincide with shifts of optimal codons. In contrast, the optimal codons have not shifted in the two Lactobacillus genomes despite an increased fraction of GC-ending codons. We suggest that all three species are in different phases of an on-going shift of optimal codons, and attribute the difference to a stronger background substitution bias and/or longer time since the switch in G. vaginalis. We show that comparative and correlative methods for optimal codon identification yield conflicting results for genomes in flux and discuss possible reasons for the mispredictions. We conclude that switches in the direction of the background substitution biases can drive major shifts in codon preference patterns even under sustained selection on synonymous codon sites. PMID:27540085
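The two composition measures mentioned in this abstract, overall GC content and the fraction of GC-ending codons, can be computed directly from a coding sequence. The following is a minimal sketch with hypothetical function names and a made-up toy sequence; it does not reproduce the authors' pipeline or their optimal-codon identification methods.

```python
from collections import Counter


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_ending_fraction(cds):
    """Fraction of codons whose third position is G or C (often called GC3)."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def codon_frequencies(cds):
    """Relative frequency of each codon observed in the sequence."""
    counts = Counter(split_codons(cds))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    toy_cds = "ATGGCCAAAGGCTTATAA"  # illustrative sequence, not real data
    print(f"GC content:         {gc_content(toy_cds):.3f}")
    print(f"GC-ending fraction: {gc_ending_fraction(toy_cds):.3f}")
    print(codon_frequencies(toy_cds))
```

Comparing the GC-ending fraction in highly expressed genes against the genome-wide background is one simple way to see whether codon usage tracks the substitution bias or departs from it.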
Coate, Jeremy E; Doyle, Jeff J
2010-01-01
Evolutionary biologists are increasingly comparing gene expression patterns across species. Due to the way in which expression assays are normalized, such studies provide no direct information about expression per gene copy (dosage responses) or per cell and can give a misleading picture of genes that are differentially expressed. We describe an assay for estimating relative expression per cell. When used in conjunction with transcript profiling data, it is possible to compare the sizes of whole transcriptomes, which in turn makes it possible to compare expression per cell for each gene in the transcript profiling data set. We applied this approach, using quantitative reverse transcriptase-polymerase chain reaction and high throughput RNA sequencing, to a recently formed allopolyploid and showed that its leaf transcriptome was approximately 1.4-fold larger than either progenitor transcriptome (70% of the sum of the progenitor transcriptomes). In contrast, the allopolyploid genome is 94.3% as large as the sum of its progenitor genomes and retains ≥93.5% of the sum of its progenitor gene complements. Thus, "transcriptome downsizing" is greater than genome downsizing. Using this transcriptome size estimate, we inferred dosage responses for several thousand genes and showed that the majority exhibit partial dosage compensation. Homoeologue silencing is nonrandomly distributed across dosage responses, with genes showing extreme responses in either direction significantly more likely to have a silent homoeologue. This experimental approach will add value to transcript profiling experiments involving interspecies and interploidy comparisons by converting expression per transcriptome to expression per genome, eliminating the need for assumptions about transcriptome size.
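The arithmetic behind these figures can be sketched in a few lines. This is an illustration only: it assumes the two progenitor transcriptomes are of equal size (which is what makes 1.4-fold correspond to 70% of their sum), and the function names and the per-gene example values are hypothetical rather than taken from the study.

```python
def fraction_of_progenitor_sum(fold_vs_one_progenitor, n_progenitors=2):
    """Convert 'fold larger than one progenitor' into a fraction of the sum.

    Assumption: all progenitor transcriptomes are the same size.
    """
    return fold_vs_one_progenitor / n_progenitors


def expression_per_cell_ratio(share_a, share_b, transcriptome_size_ratio):
    """Ratio of per-cell expression of a gene in sample A relative to sample B.

    share_a, share_b: the gene's share of each transcriptome (what a
        normalized expression assay reports).
    transcriptome_size_ratio: total transcriptome size of A divided by B.
    """
    return (share_a / share_b) * transcriptome_size_ratio


if __name__ == "__main__":
    transcriptome_fraction = fraction_of_progenitor_sum(1.4)
    genome_fraction = 0.943  # from the abstract
    print(f"transcriptome downsizing: {1 - transcriptome_fraction:.1%}")
    print(f"genome downsizing:        {1 - genome_fraction:.1%}")
    # A gene with an identical transcriptome share in the allopolyploid and a
    # progenitor is still more abundant per cell in the larger transcriptome.
    print(expression_per_cell_ratio(0.001, 0.001, 1.4))
```

The last call shows why normalized profiles alone can mislead: equal shares of unequal transcriptomes imply unequal expression per cell.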
Pan-vertebrate comparative genomics unmasks retrovirus macroevolution
Hayward, Alexander; Cornwallis, Charlie K.; Jern, Patric
2015-01-01
Although extensive research has demonstrated host-retrovirus microevolutionary dynamics, it has been difficult to gain a deeper understanding of the macroevolutionary patterns of host–retrovirus interactions. Here we use recent technological advances to infer broad patterns in retroviral diversity, evolution, and host–virus relationships by using a large-scale phylogenomic approach using endogenous retroviruses (ERVs). Retroviruses insert a proviral DNA copy into the host cell genome to produce new viruses. ERVs are provirus insertions in germline cells that are inherited down the host lineage and consequently present a record of past host–viral associations. By mining ERVs from 65 host genomes sampled across vertebrate diversity, we uncover a great diversity of ERVs, indicating that retroviral sequences are much more prevalent and widespread across vertebrates than previously appreciated. The majority of ERV clades that we recover do not contain known retroviruses, implying either that retroviral lineages are highly transient over evolutionary time or that a considerable number of retroviruses remain to be identified. By characterizing the distribution of ERVs, we show that no major vertebrate lineage has escaped retroviral activity and that retroviruses are extreme host generalists, having an unprecedented ability for rampant host switching among distantly related vertebrates. In addition, we examine whether the distribution of ERVs can be explained by host factors predicted to influence viral transmission and find that internal fertilization has a pronounced effect on retroviral colonization of host genomes. By capturing the mode and pattern of retroviral evolution and contrasting ERV diversity with known retroviral diversity, our study provides a cohesive framework to understand host–virus coevolution better. PMID:25535393
Schwartz, John C; Gibson, Mark S; Heimeier, Dorothea; Koren, Sergey; Phillippy, Adam M; Bickhart, Derek M; Smith, Timothy P L; Medrano, Juan F; Hammond, John A
2017-04-01
Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.
Genetic, genomic, and molecular tools for studying the protoploid yeast, L. waltii
Di Rienzi, Sara C.; Lindstrom, Kimberly C.; Lancaster, Ragina; Rolczynski, Lisa; Raghuraman, M. K.; Brewer, Bonita J.
2011-01-01
Sequencing of the yeast Kluyveromyces waltii (recently renamed Lachancea waltii) provided evidence of a whole genome duplication event in the lineage leading to the well-studied Saccharomyces cerevisiae. While comparative genomic analyses of these yeasts have proven to be extremely instructive in modeling the loss or maintenance of gene duplicates, experimental tests of the ramifications following such genome alterations remain difficult. To transform L. waltii from an organism of the computational comparative genomic literature into an organism of the functional comparative genomic literature, we have developed genetic, molecular and genomic tools for working with L. waltii. In particular, we have characterized basic properties of L. waltii (growth, ploidy, molecular karyotype, mating type and the sexual cycle), developed transformation, cell cycle arrest and synchronization protocols, and have created centromeric and non-centromeric vectors as well as a genome browser for L. waltii. We hope that these tools will be used by the community to follow up on the ideas generated by sequence data and lead to a greater understanding of eukaryotic biology and genome evolution. PMID:21246627
Inverse Symmetry in Complete Genomes and Whole-Genome Inverse Duplication
Kong, Sing-Guan; Fan, Wen-Lang; Chen, Hong-Da; Hsu, Zi-Ting; Zhou, Nengji; Zheng, Bo; Lee, Hoong-Chien
2009-01-01
The cause of symmetry is usually subtle, and its study often leads to a deeper understanding of the bearer of the symmetry. To gain insight into the dynamics driving the growth and evolution of genomes, we conducted a comprehensive study of textual symmetries in 786 complete chromosomes. We focused on symmetry based on our belief that, in spite of their extreme diversity, genomes must share common dynamical principles and mechanisms that drive their growth and evolution, and that the most robust footprints of such dynamics are symmetry related. We found that while complement and reverse symmetries are essentially absent in genomic sequences, inverse–complement plus reverse–symmetry is prevalent in complex patterns in most chromosomes, a vast majority of which have near maximum global inverse symmetry. We also discovered relations that can quantitatively account for the long observed but unexplained phenomenon of k-mer skews in genomes. Our results suggest segmental and whole-genome inverse duplications are important mechanisms in genome growth and evolution, probably because they are efficient means by which the genome can exploit its double-stranded structure to enrich its code-inventory. PMID:19898631
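Inverse symmetry, as described here, means that a k-mer and its reverse complement occur about equally often on a single strand. The sketch below measures this with a simple score between 0 and 1. The score definition and the function names are assumptions made for illustration; they are not the symmetry index used in the study.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count all overlapping k-mers on the given strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def inverse_symmetry_score(seq, k):
    """1.0 when every k-mer is as frequent as its reverse complement, 0.0 when
    no k-mer's reverse complement occurs at all (hypothetical index)."""
    counts = kmer_counts(seq.upper(), k)
    total = sum(counts.values())
    # Include reverse complements that never occur, so their partners count.
    keys = set(counts) | {reverse_complement(kmer) for kmer in counts}
    # Each unequal pair is visited twice, hence the factor of 2 below.
    mismatch = sum(
        abs(counts[kmer] - counts[reverse_complement(kmer)]) for kmer in keys
    )
    return 1 - mismatch / (2 * total)


if __name__ == "__main__":
    segment = "AAGAC"
    # A segment followed by its inverse duplicate is perfectly symmetric.
    duplicated = segment + reverse_complement(segment)
    print(inverse_symmetry_score(duplicated, 2))
    print(inverse_symmetry_score("AAAA", 1))
```

Appending a segment's reverse complement, as in the usage example, is a toy version of the inverse duplication the abstract proposes as a source of this symmetry.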
Divergence and Mosaicism among Virulent Soil Phages of the Burkholderia cepacia Complex
Summer, Elizabeth J.; Gonzalez, Carlos F.; Bomer, Morgan; Carlile, Thomas; Embry, Addie; Kucherka, Amalie M.; Lee, Jonte; Mebane, Leslie; Morrison, William C.; Mark, Louise; King, Maria D.; LiPuma, John J.; Vidaver, Anne K.; Young, Ry
2006-01-01
We have determined the genomic sequences of four virulent myophages, Bcep1, Bcep43, BcepB1A, and Bcep781, whose hosts are soil isolates of the Burkholderia cepacia complex. Despite temporal and spatial separations between initial isolations, three of the phages (Bcep1, Bcep43, and Bcep781, designated the Bcep781 group) exhibit 87% to 99% sequence identity to one another and most coding region differences are due to synonymous nucleotide substitutions, a hallmark of neutral genetic drift. Phage BcepB1A has a very different genome organization but is clearly a mosaic with respect to many of the genes of the Bcep781 group, as is a defective prophage element in Photorhabdus luminescens. Functions were assigned to 27 out of 71 predicted genes of Bcep1 despite extreme sequence divergence. Using a lambda repressor fusion technique, 10 Bcep781-encoded proteins were identified for their ability to support homotypic interactions. While head and tail morphogenesis genes have retained canonical gene order despite extreme sequence divergence, genes involved in DNA metabolism and host lysis are not organized as in other phages. This unusual genome arrangement may contribute to the ability of the Bcep781-like phages to maintain a unified genomic type. However, the Bcep781 group phages can also engage in lateral gene transfer events with otherwise unrelated phages, a process that contributes to the broader-scale genomic mosaicism prevalent among the tailed phages. PMID:16352842
Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia.
Campbell, Matthew A; Van Leuven, James T; Meister, Russell C; Carey, Kaitlin M; Simon, Chris; McCutcheon, John P
2015-08-18
Comparative genomics from mitochondria, plastids, and mutualistic endosymbiotic bacteria has shown that the stable establishment of a bacterium in a host cell results in genome reduction. Although many highly reduced genomes from endosymbiotic bacteria are stable in gene content and genome structure, organelle genomes are sometimes characterized by dramatic structural diversity. Previous results from Candidatus Hodgkinia cicadicola, an endosymbiont of cicadas, revealed that some lineages of this bacterium had split into two new cytologically distinct yet genetically interdependent species. It was hypothesized that the long life cycle of cicadas in part enabled this unusual lineage-splitting event. Here we test this hypothesis by investigating the structure of the Ca. Hodgkinia genome in one of the longest-lived cicadas, Magicicada tredecim. We show that the Ca. Hodgkinia genome from M. tredecim has fragmented into multiple new chromosomes or genomes, with at least some remaining partitioned into discrete cells. We also show that this lineage-splitting process has resulted in a complex of Ca. Hodgkinia genomes that are 1.1-Mb pairs in length when considered together, an almost 10-fold increase in size from the hypothetical single-genome ancestor. These results parallel some examples of genome fragmentation and expansion in organelles, although the mechanisms that give rise to these extreme genome instabilities are likely different.
Harnessing Whole Genome Sequencing in Medical Mycology.
Cuomo, Christina A
2017-01-01
Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
Pellicer, Jaume; Kelly, Laura J; Leitch, Ilia J; Zomlefer, Wendy B; Fay, Michael F
2014-03-01
• Since the occurrence of giant genomes in angiosperms is restricted to just a few lineages, identifying where shifts towards genome obesity have occurred is essential for understanding the evolutionary mechanisms triggering this process. • Genome sizes were assessed using flow cytometry in 79 species and new chromosome numbers were obtained. Phylogenetically based statistical methods were applied to infer ancestral character reconstructions of chromosome numbers and nuclear DNA contents. • Melanthiaceae are the most diverse family in terms of genome size, with C-values ranging more than 230-fold. Our data confirmed that giant genomes are restricted to tribe Parideae, with most extant species in the family characterized by small genomes. Ancestral genome size reconstruction revealed that the most recent common ancestor (MRCA) for the family had a relatively small genome (1C = 5.37 pg). Chromosome losses and polyploidy are recovered as the main evolutionary mechanisms generating chromosome number change. • Genome evolution in Melanthiaceae has been characterized by a trend towards genome size reduction, with just one episode of dramatic DNA accumulation in Parideae. Such extreme contrasting profiles of genome size evolution illustrate the key role of transposable elements and chromosome rearrangements in driving the evolution of plant genomes. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
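As a rough worked conversion of the ancestral C-value reported above into base pairs, the sketch below uses the commonly cited factor of 1 pg ≈ 978 Mbp. The factor and the helper name `pg_to_mbp` are not part of the abstract; they are assumptions added here for illustration only.

```python
# Commonly cited conversion factor: 1 pg of DNA ~ 0.978e9 bp.
# This factor is an assumption; the abstract itself reports only picograms.
MBP_PER_PG = 978.0


def pg_to_mbp(c_value_pg):
    """Convert a nuclear DNA content in picograms to megabase pairs."""
    return c_value_pg * MBP_PER_PG


# Reconstructed MRCA genome size from the abstract: 1C = 5.37 pg.
ancestral_mbp = pg_to_mbp(5.37)
print(f"1C = 5.37 pg is roughly {ancestral_mbp / 1000:.2f} Gbp")
```

Under that assumed factor, the "relatively small" ancestral genome is still on the order of 5 Gbp, which gives a sense of scale for the giant genomes in Parideae.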
Lu, Bingxin; Leong, Hon Wai
2016-02-01
Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.
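The composition-bias idea behind this kind of method can be sketched as follows. This is a minimal illustration, not GI-SVM itself: it builds k-mer frequency vectors for sliding windows and, as a stand-in for the one-class SVM, scores each window by its Euclidean distance from the genome-wide profile. The function names, window size, step and k are all hypothetical choices.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k):
    """Return the normalized k-mer frequency vector of seq over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def window_scores(genome, k=2, window=200, step=100):
    """Score each window by its distance from the genome-wide k-mer profile.

    A large distance marks a window whose composition is atypical for the
    genome. This simple distance replaces the one-class SVM used by GI-SVM.
    """
    background = kmer_profile(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = kmer_profile(genome[start:start + window], k)
        scores.append((start, math.dist(profile, background)))
    return scores


# Toy genome: an AT-rich host with a 200 bp GC-rich insert at position 400.
toy_genome = "AT" * 200 + "GC" * 100 + "AT" * 300
top_start, top_score = max(window_scores(toy_genome), key=lambda s: s[1])
print(top_start, round(top_score, 3))
```

In a real setting the window vectors would be passed to a one-class classifier rather than a plain distance, and the window size and k would need tuning per genome, as the abstract notes.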
Fritsche, Lars G.; Igl, Wilmar; Cooke Bailey, Jessica N.; Grassmann, Felix; Sengupta, Sebanti; Bragg-Gresham, Jennifer L.; Burdon, Kathryn P.; Hebbring, Scott J.; Wen, Cindy; Gorski, Mathias; Kim, Ivana K.; Cho, David; Zack, Donald; Souied, Eric; Scholl, Hendrik P. N.; Bala, Elisa; Lee, Kristine E.; Hunter, David J.; Sardell, Rebecca J.; Mitchell, Paul; Merriam, Joanna E.; Cipriani, Valentina; Hoffman, Joshua D.; Schick, Tina; Lechanteur, Yara T. E.; Guymer, Robyn H.; Johnson, Matthew P.; Jiang, Yingda; Stanton, Chloe M.; Buitendijk, Gabriëlle H. S.; Zhan, Xiaowei; Kwong, Alan M.; Boleda, Alexis; Brooks, Matthew; Gieser, Linn; Ratnapriya, Rinki; Branham, Kari E.; Foerster, Johanna R.; Heckenlively, John R.; Othman, Mohammad I.; Vote, Brendan J.; Liang, Helena Hai; Souzeau, Emmanuelle; McAllister, Ian L.; Isaacs, Timothy; Hall, Janette; Lake, Stewart; Mackey, David A.; Constable, Ian J.; Craig, Jamie E.; Kitchner, Terrie E.; Yang, Zhenglin; Su, Zhiguang; Luo, Hongrong; Chen, Daniel; Ouyang, Hong; Flagg, Ken; Lin, Danni; Mao, Guanping; Ferreyra, Henry; Stark, Klaus; von Strachwitz, Claudia N.; Wolf, Armin; Brandl, Caroline; Rudolph, Guenther; Olden, Matthias; Morrison, Margaux A.; Morgan, Denise J.; Schu, Matthew; Ahn, Jeeyun; Silvestri, Giuliana; Tsironi, Evangelia E.; Park, Kyu Hyung; Farrer, Lindsay A.; Orlin, Anton; Brucker, Alexander; Li, Mingyao; Curcio, Christine; Mohand-Saïd, Saddek; Sahel, José-Alain; Audo, Isabelle; Benchaboune, Mustapha; Cree, Angela J.; Rennie, Christina A.; Goverdhan, Srinivas V.; Grunin, Michelle; Hagbi-Levi, Shira; Campochiaro, Peter; Katsanis, Nicholas; Holz, Frank G.; Blond, Frédéric; Blanché, Hélène; Deleuze, Jean-François; Igo, Robert P.; Truitt, Barbara; Peachey, Neal S.; Meuer, Stacy M.; Myers, Chelsea E.; Moore, Emily L.; Klein, Ronald; Hauser, Michael A.; Postel, Eric A.; Courtenay, Monique D.; Schwartz, Stephen G.; Kovach, Jaclyn L.; Scott, William K.; Liew, Gerald; Tan, Ava G.; Gopinath, Bamini; Merriam, John C.; Smith, R. Theodore; Khan, Jane C.; Shahid, Humma; Moore, Anthony T.; McGrath, J. Allie; Laux, Reneé; Brantley, Milam A.; Agarwal, Anita; Ersoy, Lebriz; Caramoy, Albert; Langmann, Thomas; Saksens, Nicole T. M.; de Jong, Eiko K.; Hoyng, Carel B.; Cain, Melinda S.; Richardson, Andrea J.; Martin, Tammy M.; Blangero, John; Weeks, Daniel E.; Dhillon, Bal; van Duijn, Cornelia M.; Doheny, Kimberly F.; Romm, Jane; Klaver, Caroline C. W.; Hayward, Caroline; Gorin, Michael B.; Klein, Michael L.; Baird, Paul N.; den Hollander, Anneke I.; Fauser, Sascha; Yates, John R. W.; Allikmets, Rando; Wang, Jie Jin; Schaumberg, Debra A.; Klein, Barbara E. K.; Hagstrom, Stephanie A.; Chowers, Itay; Lotery, Andrew J.; Léveillard, Thierry; Zhang, Kang; Brilliant, Murray H.; Hewitt, Alex W.; Swaroop, Anand; Chew, Emily Y.; Pericak-Vance, Margaret A.; DeAngelis, Margaret; Stambolian, Dwight; Haines, Jonathan L.; Iyengar, Sudha K.; Weber, Bernhard H. F.; Abecasis, Gonçalo R.; Heid, Iris M.
2016-01-01
Advanced age-related macular degeneration (AMD) is the leading cause of blindness in the elderly, with limited therapeutic options. Here, we report on a study of >12 million variants, including 163,714 directly genotyped, mostly rare, protein-altering variants. Analyzing 16,144 patients and 17,832 controls, we identify 52 independently associated common and rare variants (P < 5×10⁻⁸) distributed across 34 loci. While wet and dry AMD subtypes exhibit predominantly shared genetics, we identify the first signal specific to wet AMD, near MMP9 (difference-P = 4.1×10⁻¹⁰). Very rare coding variants (frequency < 0.1%) in CFH, CFI, and TIMP3 suggest causal roles for these genes, as does a splice variant in SLC16A8. Our results support the hypothesis that rare coding variants can pinpoint causal genes within known genetic loci and illustrate that applying the approach systematically to detect new loci requires extremely large sample sizes. PMID:26691988
NASA Astrophysics Data System (ADS)
Sardanyés, Josep; Simó, Carles; Martínez, Regina; Solé, Ricard V.; Elena, Santiago F.
2014-04-01
The distribution of mutational fitness effects (DMFE) is crucial to the evolutionary fate of quasispecies. In this article we analyze the effect of the DMFE on the dynamics of a large quasispecies by means of a phenotypic version of the classic Eigen model that incorporates beneficial, neutral, deleterious, and lethal mutations. By parameterizing the model with available experimental data on the DMFE of Vesicular stomatitis virus (VSV) and Tobacco etch virus (TEV), we found that increasing mutation does not push the entire viral quasispecies towards deleterious or lethal regions of the phenotypic sequence space. The probability of finding regions in the parameter space of the general model that result in a quasispecies composed only of lethal phenotypes is extremely small, both at equilibrium and in transient times. The implications of our findings can be extended to other scenarios, such as lethal mutagenesis or genomically unstable cancer, where increased mutagenesis has been suggested as a potential therapy.
Borman, Andrew M; Fraser, Mark; Linton, Christopher J; Palmer, Michael D; Johnson, Elizabeth M
2010-06-01
Here, we present a significantly improved version of our previously published method for the extraction of fungal genomic DNA from pure cultures using Whatman FTA filter paper matrix technology. This modified protocol is extremely rapid, significantly more cost effective than our original method, and importantly, substantially reduces the problem of potential cross-contamination between sequential filters when employing FTA technology.
Extremely Low Genomic Diversity of Rickettsia japonica Distributed in Japan.
Akter, Arzuba; Ooka, Tadasuke; Gotoh, Yasuhiro; Yamamoto, Seigo; Fujita, Hiromi; Terasoma, Fumio; Kida, Kouji; Taira, Masakatsu; Nakadouzono, Fumiko; Gokuden, Mutsuyo; Hirano, Manabu; Miyashiro, Mamoru; Inari, Kouichi; Shimazu, Yukie; Tabara, Kenji; Toyoda, Atsushi; Yoshimura, Dai; Itoh, Takehiko; Kitano, Tomokazu; Sato, Mitsuhiko P; Katsura, Keisuke; Mondal, Shakhinur Islam; Ogura, Yoshitoshi; Ando, Shuji; Hayashi, Tetsuya
2017-01-01
Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as "spotted fevers". The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.
Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N
2014-07-01
Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
The promise of discovering population-specific disease-associated genes in South Asia.
Nakatsuka, Nathan; Moorjani, Priya; Rai, Niraj; Sarkar, Biswanath; Tandon, Arti; Patterson, Nick; Bhavani, Gandham SriLakshmi; Girisha, Katta Mohan; Mustak, Mohammed S; Srinivasan, Sudha; Kaushik, Amit; Vahab, Saadi Abdul; Jagadeesh, Sujatha M; Satyamoorthy, Kapaettu; Singh, Lalji; Reich, David; Thangaraj, Kumarasamy
2017-09-01
The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identified 81 unique groups, 14 of which had estimated census sizes of more than 1 million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identified multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an underappreciated opportunity for decreasing disease burden among South Asians through discovery of and testing for recessive disease-associated genes.
The promise of disease gene discovery in South Asia
Nakatsuka, Nathan; Moorjani, Priya; Rai, Niraj; Sarkar, Biswanath; Tandon, Arti; Patterson, Nick; Bhavani, Gandham SriLakshmi; Girisha, Katta Mohan; Mustak, Mohammed S; Srinivasan, Sudha; Kaushik, Amit; Vahab, Saadi Abdul; Jagadeesh, Sujatha M.; Satyamoorthy, Kapaettu; Singh, Lalji; Reich, David; Thangaraj, Kumarasamy
2017-01-01
The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population, but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identify 81 unique groups, of which 14 have estimated census sizes of more than a million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identify multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an under-appreciated opportunity for reducing disease burden among South Asians through the discovery of and testing for recessive disease genes. PMID:28714977
Orlando, Ludovic
2014-06-01
By combining state-of-the-art approaches in ancient genomics, Meyer and co-workers have reconstructed the mitochondrial sequence of an archaic hominin that lived at Sierra de Atapuerca, Spain about 400,000 years ago. This achievement follows recent advances in molecular anthropology that delivered the genome sequence of younger archaic hominins, such as Neanderthals and Denisovans. Molecular phylogenetic reconstructions placed the Atapuercan as a sister group to Denisovans, although its morphology suggested closer affinities with Neanderthals. In addition to possibly challenging our interpretation of the fossil record, this study confirms that genomic information can be recovered from extremely damaged DNA molecules, even in the presence of significant levels of human contamination. Together with the recent characterization of a 700,000-year-old horse genome, this study opens the Middle Pleistocene to genomics, thereby extending the scope of ancient DNA to the last million years. © 2014 WILEY Periodicals, Inc.
Whole-Genome Duplication and the Functional Diversification of Teleost Fish Hemoglobins
Opazo, Juan C.; Butts, G. Tyler; Nery, Mariana F.; Storz, Jay F.; Hoffmann, Federico G.
2013-01-01
Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood–gas transport traits, which reflects the myriad solutions that have evolved to maintain tissue O2 delivery in the face of changing metabolic demands and environmental O2 availability during different ontogenetic stages. During the course of development, regulatory changes in blood–O2 transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular O2-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and in free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results were then integrated with available experimental data on functional properties and developmental patterns of stage-specific gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order Lepisosteiformes) and teleosts. The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and deletion events. Phylogenetic reconstructions revealed that paralogous genes convergently evolved similar functional properties in different teleost lineages. Consistent with other recent studies of globin gene family evolution in vertebrates, our results revealed evidence for repeated evolutionary transitions in the developmental regulation of Hb synthesis. PMID:22949522
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
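To give a feel for what the coverage figures above mean in sequencing effort, the sketch below applies the standard relation coverage = (number of reads × read length) / genome size. The 100 bp read length and the helper name `reads_needed` are assumptions for illustration; the abstract does not state the read length used.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Smallest read count N such that N * read_length / genome_size >= coverage."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


GENOME_BP = 5_930_000  # Pph 1448A genome size from the abstract (5.93 Mbp)
READ_LEN = 100         # hypothetical read length, not given in the abstract

print(reads_needed(GENOME_BP, 100, READ_LEN))  # reads for 100x coverage
print(reads_needed(GENOME_BP, 2, READ_LEN))    # reads for 2x coverage
print(reads_needed(GENOME_BP, 5, READ_LEN))    # reads for 5x coverage
```

Under these assumptions the low-coverage library needs roughly 20 to 50 times fewer reads than the 100x library, which is consistent with the abstract's point that a small additional data set can suffice.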
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shih, Patrick
2012-03-22
Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
Shih, Patrick
2018-01-10
Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S
2017-05-22
Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high-density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana), which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 Mb) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large-scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved, adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.
Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.
Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark
2017-04-01
The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
High-Up: A Remote Reservoir of Microbial Extremophiles in Central Andean Wetlands.
Albarracín, Virginia H; Kurth, Daniel; Ordoñez, Omar F; Belfiore, Carolina; Luccini, Eduardo; Salum, Graciela M; Piacentini, Ruben D; Farías, María E
2015-01-01
The Central Andes region displays unexplored ecosystems of shallow lakes and salt flats at mean altitudes of 3700 m. Being isolated and hostile, these so-called "High-Altitude Andean Lakes" (HAAL) are pristine and have been exposed to little human influence. HAAL proved to be a rich source of microbes showing interesting adaptations to life in extreme settings (poly-extremophiles) such as alkalinity, high concentrations of arsenic and dissolved salts, intense dryness, large daily ambient thermal amplitude, and extreme solar radiation levels. This work reviews HAAL microbiodiversity, taking into account different microbial niches, such as plankton, benthos, microbial mats and microbialites. The modern stromatolites and other microbialites discovered recently at HAAL are highlighted, as they provide unique modern, though quite imperfect, analogs of environments proxy for an earlier time in Earth's history (volcanic setting and profuse hydrothermal activity, low atmospheric O2 pressure, thin ozone layer and high UV exposure). Likewise, we stress the importance of HAAL microbes as model poly-extremophiles in the study of the molecular mechanisms underlying their resistance ability against UV and toxic or deleterious chemicals using genome mining and functional genomics. In future research directions, it will be necessary to exploit the full potential of HAAL poly-extremophiles in terms of their biotechnological applications. Current projects heading this way have yielded detailed molecular information and functional proof on novel extremoenzymes: i.e., DNA repair enzymes and arsenic efflux pumps for which medical and bioremediation applications, respectively, are envisaged. But still, much effort is required to unravel novel functions for this and other molecules that dwell in a unique biological treasure despite its being hidden high up, in the remote Andes.
High-Up: A Remote Reservoir of Microbial Extremophiles in Central Andean Wetlands
Albarracín, Virginia H.; Kurth, Daniel; Ordoñez, Omar F.; Belfiore, Carolina; Luccini, Eduardo; Salum, Graciela M.; Piacentini, Ruben D.; Farías, María E.
2015-01-01
The Central Andes region displays unexplored ecosystems of shallow lakes and salt flats at mean altitudes of 3700 m. Being isolated and hostile, these so-called “High-Altitude Andean Lakes” (HAAL) are pristine and have been exposed to little human influence. HAAL proved to be a rich source of microbes showing interesting adaptations to life in extreme settings (poly-extremophiles) such as alkalinity, high concentrations of arsenic and dissolved salts, intense dryness, large daily ambient thermal amplitude, and extreme solar radiation levels. This work reviews HAAL microbiodiversity, taking into account different microbial niches, such as plankton, benthos, microbial mats and microbialites. The modern stromatolites and other microbialites discovered recently at HAAL are highlighted, as they provide unique modern—though quite imperfect—analogs of environments proxy for an earlier time in Earth's history (volcanic setting and profuse hydrothermal activity, low atmospheric O2 pressure, thin ozone layer and high UV exposure). Likewise, we stress the importance of HAAL microbes as model poly-extremophiles in the study of the molecular mechanisms underlying their resistance ability against UV and toxic or deleterious chemicals using genome mining and functional genomics. In future research directions, it will be necessary to exploit the full potential of HAAL poly-extremophiles in terms of their biotechnological applications. Current projects heading this way have yielded detailed molecular information and functional proof on novel extremoenzymes: i.e., DNA repair enzymes and arsenic efflux pumps for which medical and bioremediation applications, respectively, are envisaged. But still, much effort is required to unravel novel functions for this and other molecules that dwell in a unique biological treasure despite its being hidden high up, in the remote Andes. PMID:26733008
Extreme Value Analysis of hydro meteorological extremes in the ClimEx Large-Ensemble
NASA Astrophysics Data System (ADS)
Wood, R. R.; Martel, J. L.; Willkofer, F.; von Trentini, F.; Schmid, F. J.; Leduc, M.; Frigon, A.; Ludwig, R.
2017-12-01
Many studies show an increase in the magnitude and frequency of hydrological extreme events in the course of climate change. However, the contribution of natural variability to the magnitude and frequency of hydrological extreme events is not yet settled. A reliable estimate of extreme events is of great interest for water management and public safety. In the course of the ClimEx Project (www.climex-project.org), a new single-model large ensemble was created by dynamically downscaling the CanESM2 large ensemble with the Canadian Regional Climate Model version 5 (CRCM5) for a European domain and a northeastern North American domain. The ClimEx 50-Member Large-Ensemble (CRCM5 driven by the CanESM2 Large-Ensemble) makes a thorough analysis of natural variability in extreme events possible. Are the current extreme value statistical methods able to account for natural variability? How large is the natural variability for, e.g., a 1-in-100-year return period derived from a 50-Member Large-Ensemble for Europe and northeastern North America? These questions will be addressed by fitting generalized extreme value (GEV) distributions to the ClimEx Large-Ensemble. Return levels (5-, 10-, 20-, 30-, 60- and 100-year) based on time series of various lengths (20, 30, 50, 100 and 1500 years) will be analyzed for the maximum one-day precipitation (RX1d), the maximum three-hourly precipitation (RX3h) and the streamflow of selected catchments in Europe. The long time series of the ClimEx Ensemble (7500 years) allows us to give a first reliable estimate of the magnitude and frequency of certain extreme events.
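For readers unfamiliar with return levels, the sketch below shows how a T-year return level follows from fitted GEV parameters, using the standard quantile formula z_T = mu + (sigma / xi) * ((-ln(1 - 1/T))^(-xi) - 1), with the Gumbel limit when xi is zero. The parameter values and the function name `gev_return_level` are hypothetical; the abstract does not report fitted parameters or the fitting procedure.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once every `return_period` block maxima.

    mu, sigma, xi are the GEV location, scale and shape parameters.
    """
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    y = -math.log(1.0 - 1.0 / return_period)  # -ln(1 - p) with p = 1/T
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(y)        # Gumbel limit (xi -> 0)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Hypothetical parameters for annual maximum one-day precipitation (mm).
mu, sigma, xi = 40.0, 10.0, 0.1
for period in (5, 10, 20, 30, 60, 100):  # return periods named in the abstract
    print(period, round(gev_return_level(mu, sigma, xi, period), 1))
```

Refitting such parameters on many short sub-series drawn from a long ensemble, and comparing the resulting return levels, is one way the spread due to natural variability could be quantified.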
Genome sequencing of a single tardigrade Hypsibius dujardini individual
Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru
2016-01-01
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies. PMID:27529330
The Power of CRISPR-Cas9-Induced Genome Editing to Speed Up Plant Breeding
Wang, Wenqin; Le, Hien T. T.
2016-01-01
Genome editing with engineered nucleases enabling site-directed sequence modifications bears a great potential for advanced plant breeding and crop protection. Remarkably, the RNA-guided endonuclease technology (RGEN) based on the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein 9 (Cas9) is an extremely powerful and easy tool that revolutionizes both basic research and plant breeding. Here, we review the major technical advances and recent applications of the CRISPR-Cas9 system for manipulation of model and crop plant genomes. We also discuss the future prospects of this technology in molecular plant breeding. PMID:28097123
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris
2010-07-15
The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
diCenzo, George C; Finan, Turlough M
2018-01-01
The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.
Nishino, Jo; Kochi, Yuta; Shigemizu, Daichi; Kato, Mamoru; Ikari, Katsunori; Ochi, Hidenori; Noma, Hisashi; Matsui, Kota; Morizono, Takashi; Boroevich, Keith A.; Tsunoda, Tatsuhiko; Matsui, Shigeyuki
2018-01-01
Genome-wide association studies (GWAS) suggest that the genetic architecture of complex diseases consists of unexpectedly numerous variants with small effect sizes. However, the polygenic architectures of many diseases have not been well characterized due to lack of simple and fast methods for unbiased estimation of the underlying proportion of disease-associated variants and their effect-size distribution. Applying empirical Bayes estimation of semi-parametric hierarchical mixture models to GWAS summary statistics, we confirmed that schizophrenia was extremely polygenic [~40% of independent genome-wide SNPs are risk variants, most within odds ratio (OR = 1.03)], whereas rheumatoid arthritis was less polygenic (~4 to 8% risk variants, significant portion reaching OR = 1.05 to 1.1). For rheumatoid arthritis, stratified estimations revealed that expression quantitative loci in blood explained large genetic variance, and low- and high-frequency derived alleles were prone to be risk and protective, respectively, suggesting a predominance of deleterious-risk and advantageous-protective mutations. Despite genetic correlation, effect-size distributions for schizophrenia and bipolar disorder differed across allele frequency. These analyses distinguished disease polygenic architectures and provided clues for etiological differences in complex diseases. PMID:29740473
Kiselev, O I; Vasin, A V; Shevyryova, M P; Deeva, E G; Sivak, K V; Egorov, V V; Tsvetkov, V B; Egorov, A Yu; Romanovskaya-Romanko, E A; Stepanova, L A; Komissarov, A B; Tsybalova, L M; Ignatjev, G M
2015-01-01
The Ebola hemorrhagic fever (EHF) epidemic currently ongoing in West Africa is not the first of the numerous epidemics on the continent. Yet it appears to be the worst EHF outbreak caused by Ebola virus Zaire since 1976 in terms of its extremely large scale and rapid spread in the population. Experiments to study the agent have continued for more than 20 years. The EHF virus has a relatively simple genome with seven genes and an additional reading frame resulting from RNA editing. Despite its relatively low genetic capacity, the virus can be regarded as a benchmark for pathogenicity, with a highly effective ability to evade the host immune response. The EHF virus has similarities with retroviruses, but belongs to the (-)RNA viruses of nonretroviral origin. Genetic elements of the virus, NIRV, were detected in animal and human genomes. EHF virus glycoprotein (GP) is a class I fusion protein and shows more similarities than distinctions in tertiary structure with SIV and HIV gp41 proteins and even influenza virus hemagglutinin. EHF is an unusual infectious disease, and studying the molecular basis of its pathogenesis may contribute to new findings in the therapy of severe conditions leading to a fatal outcome.
2011-01-01
Background 'Selection signatures' delimit regions of the genome that are, or have been, functionally important and have therefore been under either natural or artificial selection. In this study, two different and complementary methods--integrated Haplotype Homozygosity Score (|iHS|) and population differentiation index (FST)--were applied to identify traces of decades of intensive artificial selection for traits of economic importance in modern cattle. Results We scanned the genome of a diverse set of dairy and beef breeds from Germany, Canada and Australia genotyped with a 50 K SNP panel. Across breeds, a total of 109 extreme |iHS| values exceeded the empirical threshold level of 5% with 19, 27, 9, 10 and 17 outliers in Holstein, Brown Swiss, Australian Angus, Hereford and Simmental, respectively. Annotating the regions harboring clustered |iHS| signals revealed a panel of interesting candidate genes like SPATA17, MGAT1, PGRMC2 and ACTC1, COL23A1, MATN2, respectively, in the context of reproduction and muscle formation. In a further step, a new Bayesian FST-based approach was applied with a set of geographically separated populations including Holstein, Brown Swiss, Simmental, North American Angus and Piedmontese for detecting differentiated loci. In total, 127 regions exceeding the 2.5 per cent threshold of the empirical posterior distribution were identified as extremely differentiated. In a substantial number (56 out of 127 cases) the extreme FST values were found to be positioned in poor gene content regions which deviated significantly (p < 0.05) from the expectation assuming a random distribution. However, significant FST values were found in regions of some relevant genes such as SMCP and FGF1. Conclusions Overall, 236 regions putatively subject to recent positive selection in the cattle genome were detected. Both |iHS| and FST suggested selection in the vicinity of the Sialic acid binding Ig-like lectin 5 gene on BTA18. 
This region was recently reported to be a major QTL with strong effects on productive life and fertility traits in Holstein cattle. We conclude that high-resolution genome scans of selection signatures can be used to identify genomic regions contributing to within- and inter-breed phenotypic variation. PMID:21679429
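To make the population differentiation index concrete, here is a minimal sketch of Wright's basic FST for a single biallelic SNP, computed from the allele frequency in each population. This is the textbook definition with equal population weights, not the Bayesian FST-based approach used in the study above; the function name is hypothetical.

```python
def wright_fst(allele_freqs):
    """Wright's FST for one biallelic locus.

    allele_freqs: frequency of the same allele in each population,
    with all populations weighted equally (an assumption of this sketch).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    n = len(allele_freqs)
    # Mean expected heterozygosity within populations (H_S).
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = sum(allele_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Locus is fixed for the same allele everywhere: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total
```

Values near 0 indicate similar allele frequencies across breeds, while values near 1 indicate nearly fixed differences. A genome scan would compute this per SNP (or per window) and flag the upper tail of the empirical distribution as candidate regions.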
Castro, Wendel de Oliveira; Torres-Ballesteros, Adriana Maria; Nakayama, Cristina Rossi; Melo, Itamar Soares; Pellizari, Vivian Helena; Silva, Artur; Ramos, Rommel Thiago Jucá
2014-08-14
Organisms in the Haloferax genus are extreme halophiles that grow in environments with pH values between 4 and 12, and temperatures between 0°C and 60°C. In the present study, a draft of the first Haloferax sp. strain ATB1 genome isolated from the region of Cariri (in Paraíba State, Brazil) is presented. Copyright © 2014 Castro et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalamorz, Falk; Keis, Stefanie; Stanton, Jo-Ann
The genes and molecular machines that allow for a thermoalkaliphilic lifestyle have not been defined. To address this goal, we report on the improved high-quality draft genome sequence of Caldalkalibacillus thermarum strain TA2.A1, an obligately aerobic bacterium that grows optimally at pH 9.5 and 65 to 70 °C on a wide variety of carbon and energy sources.
Koo, Hyunmin; Strope, Bailey M; Kim, Eddy H; Shabani, Adel M; Kumar, Ranjit; Crowley, Michael R; Andersen, Dale T; Bej, Asim K
2016-01-21
Janthinobacterium sp. Ant5-2-1, isolated from the Schirmacher Oasis of East Antarctica, produces a purple-violet pigment, manifests diverse energy metabolism abilities, and tolerates cold, ultraviolet radiation, and other environmental stressors. We report here the 6.19-Mb draft genome of strain Ant5-2-1, which will help understand its survival mechanisms in extreme Antarctic ecosystems. Copyright © 2016 Koo et al.
Ptacek, Travis; Crowley, Michael; Swain, Ashit K.; Osborne, John D.; Bej, Asim K.; Andersen, Dale T.
2014-01-01
Hymenobacter sp. IS2118, isolated from a freshwater lake in Schirmacher Oasis, Antarctica, produces extracellular polymeric substance (EPS) and manifests tolerance to cold, UV radiation (UVR), and oxidative stress. We report the 5.26-Mb draft genome of strain IS2118, which will help us to understand its adaptation and survival mechanisms in Antarctic extreme ecosystems. PMID:25103756
Metcalfe, Cushla J; Filée, Jonathan; Germon, Isabelle; Joss, Jean; Casane, Didier
2012-11-01
Haploid genomes greater than 25,000 Mb are rare; within the animals, only the lungfish and some of the salamanders and crustaceans are known to have genomes this large. There is very little data on the structure of genomes this size. It is known, however, that for animal genomes up to 3,000 Mb, there is in general a good correlation between genome size and the percent of the genome composed of repetitive sequence and that this repetitive component is highly dynamic. In this study, we sampled the Australian lungfish genome using three mini-genomic libraries and found that with very little sequence, the results converged on an estimate of 40% of the genome being composed of recognizable transposable elements (TEs), chiefly from the CR1 and L2 long interspersed nuclear element clades. We further characterized the CR1 and L2 elements in the lungfish genome and show that although most CR1 elements probably represent recent amplifications, the L2 elements are more diverse and are more likely the result of a series of amplifications. We suggest that our sampling method has probably underestimated the recognizable TE content. However, on the basis of the most likely sources of error, we suggest that this very large genome is not largely composed of recently amplified, undetected TEs but may instead include a large component of older degenerate TEs. Based on these estimates, and on Thomson's (Thomson K. 1972. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J Exp Zool. 180:363-372) inference that in the lineage leading to the extant Australian lungfish, there was a massive increase in genome size between 350 and 200 mya, after which the size of the genome changed little, we speculate that the very large Australian lungfish genome may be the result of a massive amplification of TEs followed by a long period with a very low rate of sequence removal and some ongoing TE activity.
Ullrich, Sophie R.; González, Carolina; Poehlein, Anja; Tischler, Judith S.; Daniel, Rolf; Schlömann, Michael; Holmes, David S.; Mühling, Martin
2016-01-01
Acid mine drainage (AMD), associated with active and abandoned mining sites, is a habitat for acidophilic microorganisms that gain energy from the oxidation of reduced sulfur compounds and ferrous iron and that thrive at pH below 4. Members of the recently proposed genus “Ferrovum” are the first acidophilic iron oxidizers to be described within the Betaproteobacteria. Although they have been detected as typical community members in AMD habitats worldwide, knowledge of their phylogenetic and metabolic diversity is scarce. Genomics approaches appear to be most promising in addressing this lacuna since isolation and cultivation of “Ferrovum” has proven to be extremely difficult and has so far only been successful for the designated type strain “Ferrovum myxofaciens” P3G. In this study, the genomes of two novel strains of “Ferrovum” (PN-J185 and Z-31) derived from water samples of a mine water treatment plant were sequenced. These genomes were compared with those of “Ferrovum” sp. JA12 that also originated from the mine water treatment plant, and of the type strain (P3G). Phylogenomic scrutiny suggests that the four strains represent three “Ferrovum” species that cluster in two groups (1 and 2). Comprehensive analysis of their predicted metabolic pathways revealed that these groups harbor characteristic metabolic profiles, notably with respect to motility, chemotaxis, nitrogen metabolism, biofilm formation and their potential strategies to cope with the acidic environment. For example, while the “F. myxofaciens” strains (group 1) appear to be motile and diazotrophic, the non-motile group 2 strains have the predicted potential to use a greater variety of fixed nitrogen sources. 
Furthermore, analysis of their genome synteny provides first insights into their genome evolution, suggesting that horizontal gene transfer and genome reduction in the group 2 strains by loss of genes encoding complete metabolic pathways or physiological features contributed to the observed diversification. PMID:27303384
Complete genome sequence of Brachyspira murdochii type strain (56-150T)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pati, Amrita; Sikorski, Johannes; Gronow, Sabine
2010-01-01
Brachyspira murdochii Stanton et al. 1992 is a non-pathogenic but host-associated spirochete of the family Brachyspiraceae. Initially isolated from the intestinal content of a healthy swine, the group B spirochaetes were first described under the basonym Serpulina murdochii. Members of the family Brachyspiraceae are of great phylogenetic interest because of the extremely isolated location of this family within the phylum Spirochaetes. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a type strain of a member of the family Brachyspiraceae and only the second genome sequence from a member of the genus Brachyspira. The 3,241,804 bp long genome with its 2,893 protein-coding and 40 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
Complete genome sequence of Halogeometricum borinquense type strain (PR3T)
Malfatti, Stephanie; Tindall, Brian J.; Schneider, Susanne; Fähnrich, Regine; Lapidus, Alla; LaButtii, Kurt; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Chen, Feng; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Anderson, Iain; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; D’haeseleer, Patrik; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Chain, Patrick
2009-01-01
Halogeometricum borinquense Montalvo-Rodríguez et al. 1998 is the type species of the genus, and is of phylogenetic interest because of its distinct location between the halobacterial genera Haloquadratum and Halosarcina. H. borinquense requires extremely high salt (NaCl) concentrations for growth. It can not only grow aerobically but also anaerobically using nitrate as electron acceptor. The strain described in this report is a free-living, motile, pleomorphic, euryarchaeon, which was originally isolated from the solar salterns of Cabo Rojo, Puerto Rico. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the halobacterial genus Halogeometricum, and this 3,944,467 bp long six replicon genome with its 3937 protein-coding and 57 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304651
Complete genome sequence of Halogeometricum borinquense type strain (PR 3 T)
Malfatti, Stephanie; Tindall, Brian J.; Schneider, Susanne; ...
2009-09-29
Halogeometricum borinquense Montalvo-Rodríguez et al. 1998 is the type species of the genus, and is of phylogenetic interest because of its distinct location between the halobacterial genera Haloquadratum and Halosarcina. H. borinquense requires extremely high salt (NaCl) concentrations for growth. It can not only grow aerobically but also anaerobically using nitrate as electron acceptor. The strain described in this report is a free-living, motile, pleomorphic, euryarchaeon, which was originally isolated from the solar salterns of Cabo Rojo, Puerto Rico. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the halobacterial genus Halogeometricum, and this 3,944,467 bp long six replicon genome with its 3937 protein-coding and 57 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Sex Determination, Sex Chromosomes, and Karyotype Evolution in Insects.
Blackmon, Heath; Ross, Laura; Bachtrog, Doris
2017-01-01
Insects harbor a tremendous diversity of sex determining mechanisms both within and between groups. For example, in some orders such as Hymenoptera, all members are haplodiploid, whereas Diptera contain species with homomorphic as well as male and female heterogametic sex chromosome systems or paternal genome elimination. We have established a large database on karyotypes and sex chromosomes in insects, containing information on over 13000 species covering 29 orders of insects. This database constitutes a unique starting point to report phylogenetic patterns on the distribution of sex determination mechanisms, sex chromosomes, and karyotypes among insects and allows us to test general theories on the evolutionary dynamics of karyotypes, sex chromosomes, and sex determination systems in a comparative framework. Phylogenetic analysis reveals that male heterogamety is the ancestral mode of sex determination in insects, and transitions to female heterogamety are extremely rare. Many insect orders harbor species with complex sex chromosomes, and gains and losses of the sex-limited chromosome are frequent in some groups. Haplodiploidy originated several times within insects, and parthenogenesis is rare but evolves frequently. Providing a single source to electronically access data previously distributed among more than 500 articles and books will not only accelerate analyses of the assembled data, but also provide a unique resource to guide research on which taxa are likely to be informative to address specific questions, for example, for genome sequencing projects or large-scale comparative studies. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture
Darrow, Emily M.; Huntley, Miriam H.; Dudchenko, Olga; Stamenova, Elena K.; Durand, Neva C.; Sun, Zhuo; Huang, Su-Chen; Sanborn, Adrian L.; Machol, Ido; Shamim, Muhammad; Seberg, Andrew P.; Lander, Eric S.; Chadwick, Brian P.; Aiden, Erez Lieberman
2016-01-01
During interphase, the inactive X chromosome (Xi) is largely transcriptionally silent and adopts an unusual 3D configuration known as the “Barr body.” Despite the importance of X chromosome inactivation, little is known about this 3D conformation. We recently showed that in humans the Xi chromosome exhibits three structural features, two of which are not shared by other chromosomes. First, like the chromosomes of many species, Xi forms compartments. Second, Xi is partitioned into two huge intervals, called “superdomains,” such that pairs of loci in the same superdomain tend to colocalize. The boundary between the superdomains lies near DXZ4, a macrosatellite repeat whose Xi allele extensively binds the protein CCCTC-binding factor. Third, Xi exhibits extremely large loops, up to 77 megabases long, called “superloops.” DXZ4 lies at the anchor of several superloops. Here, we combine 3D mapping, microscopy, and genome editing to study the structure of Xi, focusing on the role of DXZ4. We show that superloops and superdomains are conserved across eutherian mammals. By analyzing ligation events involving three or more loci, we demonstrate that DXZ4 and other superloop anchors tend to colocate simultaneously. Finally, we show that deleting DXZ4 on Xi leads to the disappearance of superdomains and superloops, changes in compartmentalization patterns, and changes in the distribution of chromatin marks. Thus, DXZ4 is essential for proper Xi packaging. PMID:27432957
2013-01-01
Background We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context, and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing. Results Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs, and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented. Conclusions Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders. PMID:23537068
Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus.
Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark; Arakawa, Kazuharu
2017-07-01
Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda.
Kumwenda, Benjamin; Litthauer, Derek; Reva, Oleg
2014-09-25
Bacteria of the genus Thermus inhabit both man-made and natural thermal environments. Several Thermus species have shown biotechnological potential, such as reduction of heavy metals, which is essential for eradicating heavy metal pollution; removal of organic contaminants from water; opening of clogged pipes; and control of global warming, among many others. Enzymes from thermophilic bacteria have exhibited higher activity and stability than synthetic enzymes or enzymes from mesophilic organisms. Using Meiothermus silvanus DSM 9946 as a reference genome, a high level of coordinated rearrangements was observed in extremely thermophilic Thermus, which may imply the existence of as yet unknown evolutionary forces controlling adaptive reorganization of whole genomes of thermo-extremophiles. However, no remarkable differences were observed across species in the distribution of functionally related genes on the chromosome, suggesting constraints imposed by metabolic networks. The metabolic network exhibits evolutionary pressures similar to the levels of rearrangement as measured by the cross-clustering index. Using donor-recipient stratigraphic analysis, intensive gene exchanges were observed from Meiothermus species and some unknown sources to Thermus species, confirming a well-established DNA uptake mechanism as previously proposed. Global genome rearrangements were found to play an important role in the evolution of Thermus bacteria at both the genomic and metabolic network levels. A relatively higher level of rearrangements was observed in extremely thermophilic Thermus strains in comparison to the thermo-tolerant Thermus scotoductus. Rearrangements did not significantly disrupt operons and functionally related genes. Thermus species appeared to have a developed capability for acquiring DNA through horizontal gene transfer, as shown by the donor-recipient stratigraphic analysis.
RNA-Seq reveals complex genetic response to Deepwater Horizon oil release in Fundulus grandis.
Garcia, Tzintzuni I; Shen, Yingjia; Crawford, Douglas; Oleksiak, Marjorie F; Whitehead, Andrew; Walter, Ronald B
2012-09-12
The release of oil resulting from the blowout of the Deepwater Horizon (DH) drilling platform was one of the largest in history, discharging more than 189 million gallons of oil, and was subject to widespread application of oil dispersants. This event impacted a wide range of ecological habitats with a complex mix of pollutants whose biological impact is not yet fully understood. To better understand the effects on a vertebrate genome, we studied gene expression in the salt marsh minnow Fundulus grandis, which is local to the northern coast of the Gulf of Mexico and is a sister species of the ecotoxicological model Fundulus heteroclitus. To assess genomic changes, we quantified mRNA expression using high-throughput sequencing technologies (RNA-Seq) in F. grandis populations in the marshes and estuaries impacted by the DH oil release. This application of RNA-Seq to a non-model, wild, and ecologically significant organism is an important evaluation of the technology to quickly assess similar events in the future. Our de novo assembly of RNA-Seq data produced a large set of sequences which included many duplicates and fragments. In many cases several of these could be associated with a common reference sequence using BLAST to query a reference database. This reduced the set of significant genes to 1,070 down-regulated and 1,251 up-regulated genes. These genes indicate a broad and complex genomic response to DH oil exposure including the expected AHR-mediated response and CYP genes. In addition, a response to hypoxic conditions and an immune response are also indicated. Several genes in the choriogenin family were down-regulated in the exposed group, a response that is consistent with AH exposure. These analyses are in agreement with oligonucleotide-based microarray analyses, and describe only a subset of significant genes with aberrant regulation in the exposed set. 
RNA-Seq may be successfully applied to feral and extremely polymorphic organisms that do not have an underlying genome sequence assembly to address timely environmental problems. Additionally, the observed changes in a large set of transcript expression levels are indicative of a complex response to the varied petroleum components to which the fish were exposed.
Repar, Jelena; Warnecke, Tobias
2017-01-01
Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin–terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus–Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. PMID:28407093
Hargreaves, Adam D; Zhou, Long; Christensen, Josef; Marlétaz, Ferdinand; Liu, Shiping; Li, Fang; Jansen, Peter Gildsig; Spiga, Enrico; Hansen, Matilde Thye; Pedersen, Signe Vendelbo Horn; Biswas, Shameek; Serikawa, Kyle; Fox, Brian A; Taylor, William R; Mulley, John Frederick; Zhang, Guojie; Heller, R Scott; Holland, Peter W H
2017-07-18
The sand rat Psammomys obesus is a gerbil species native to deserts of North Africa and the Middle East, and is constrained in its ecology because high carbohydrate diets induce obesity and type II diabetes that, in extreme cases, can lead to pancreatic failure and death. We report the sequencing of the sand rat genome and discovery of an unusual, extensive, and mutationally biased GC-rich genomic domain. This highly divergent genomic region encompasses several functionally essential genes, and spans the ParaHox cluster which includes the insulin-regulating homeobox gene Pdx1. The sequence of sand rat Pdx1 has been grossly affected by GC-biased mutation, leading to the highest divergence observed for this gene across the Bilateria. In addition to genomic insights into restricted caloric intake in a desert species, the discovery of a localized chromosomal region subject to elevated mutation suggests that mutational heterogeneity within genomes could influence the course of evolution.
Wu, Chen; Twort, Victoria G; Crowhurst, Ross N; Newcomb, Richard D; Buckley, Thomas R
2017-11-16
Stick insects (Phasmatodea) have a high incidence of parthenogenesis and other alternative reproductive strategies, yet the genetic basis of reproduction is poorly understood. Phasmatodea includes nearly 3000 species, yet only the genome of Timema cristinae has been published to date. Clitarchus hookeri is a geographical parthenogenetic stick insect distributed across New Zealand. Sexual reproduction dominates in northern habitats but is replaced by parthenogenesis in the south. Here, we present a de novo genome assembly of a female C. hookeri and use it to detect candidate genes associated with gamete production and development in females and males. We also explore the factors underlying large genome size in stick insects. The C. hookeri genome assembly was 4.2 Gb, similar to the flow cytometry estimate, making it the second largest insect genome sequenced and assembled to date. Like the large genome of Locusta migratoria, the genome of C. hookeri is also highly repetitive and the predicted gene models are much longer than those from most other sequenced insect genomes, largely due to longer introns. Miniature inverted repeat transposable elements (MITEs), absent in the much smaller T. cristinae genome, are the most abundant repeat type in the C. hookeri genome assembly. Mapping RNA-Seq reads from female and male gonadal transcriptomes onto the genome assembly resulted in the identification of 39,940 gene loci, 15.8% and 37.6% of which showed female-biased and male-biased expression, respectively. The genes that were over-expressed in females were mostly associated with molecular transportation, developmental process, oocyte growth and reproductive process, whereas the male-biased genes were enriched in rhythmic process, molecular transducer activity and synapse. Several genes involved in the juvenile hormone synthesis pathway were also identified. The evolution of large insect genomes such as the L. migratoria and C. hookeri genomes is most likely due to the accumulation of repetitive regions and intron elongation. MITEs contributed significantly to the growth of C. hookeri genome size yet are surprisingly absent from the T. cristinae genome. Sex-biased genes identified from gonadal tissues, including genes involved in juvenile hormone synthesis, provide interesting candidates for the further study of flexible reproduction in stick insects.
Rodrigues, Debora F; Ivanova, Natalia; He, Zhili; Huebner, Marianne; Zhou, Jizhong; Tiedje, James M
2008-01-01
Background Many microorganisms have a wide temperature growth range and versatility to tolerate large thermal fluctuations in diverse environments, however not many have been fully explored over their entire growth temperature range through a holistic view of its physiology, genome, and transcriptome. We used Exiguobacterium sibiricum strain 255-15, a psychrotrophic bacterium from 3 million year old Siberian permafrost that grows from -5°C to 39°C to study its thermal adaptation. Results The E. sibiricum genome has one chromosome and two small plasmids with a total of 3,015 protein-encoding genes (CDS), and a GC content of 47.7%. The genome and transcriptome analysis along with the organism's known physiology was used to better understand its thermal adaptation. A total of 27%, 3.2%, and 5.2% of E. sibiricum CDS spotted on the DNA microarray detected differentially expressed genes in cells grown at -2.5°C, 10°C, and 39°C, respectively, when compared to cells grown at 28°C. The hypothetical and unknown genes represented 10.6%, 0.89%, and 2.3% of the CDS differentially expressed when grown at -2.5°C, 10°C, and 39°C versus 28°C, respectively. Conclusion The results show that E. sibiricum is constitutively adapted to cold temperatures stressful to mesophiles since little differential gene expression was observed between 4°C and 28°C, but at the extremities of its Arrhenius growth profile, namely -2.5°C and 39°C, several physiological and metabolic adaptations associated with stress responses were observed. PMID:19019206
2013-01-01
Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002
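The logic of calling A-to-I editing from RNA-seq alone can be illustrated with a small sketch. Because inosine is read as guanosine in cDNA, an edited site appears as an A-to-G mismatch against the reference for plus-strand transcripts (T-to-C for minus-strand transcripts), and a constitutive site should recur across samples. The function names, thresholds and toy counts below are hypothetical and are not the parameters used in the study.

```python
# Illustrative sketch only: names and thresholds are assumptions, not the study's method.

def is_a_to_i_candidate(ref_base, read_base, strand):
    """True if a reference/read mismatch is consistent with A-to-I editing."""
    ref_base, read_base = ref_base.upper(), read_base.upper()
    if strand == "+":
        return (ref_base, read_base) == ("A", "G")
    if strand == "-":
        # On the minus strand the same event is seen as T-to-C on the reference.
        return (ref_base, read_base) == ("T", "C")
    raise ValueError("strand must be '+' or '-'")


def recurrent_sites(observations, min_samples=2, min_frequency=0.1):
    """Keep sites whose editing frequency passes the threshold in enough samples.

    observations maps a site label to a list of (edited_reads, total_reads)
    tuples, one tuple per sample.
    """
    kept = []
    for site, per_sample in observations.items():
        supporting = sum(
            1
            for edited, total in per_sample
            if total > 0 and edited / total >= min_frequency
        )
        if supporting >= min_samples:
            kept.append(site)
    return sorted(kept)


toy_counts = {
    "chr1:1000": [(8, 10), (9, 12), (0, 15)],  # edited in two of three samples
    "chr1:2000": [(1, 50), (0, 40), (0, 30)],  # looks like sequencing noise
}
print(recurrent_sites(toy_counts))
```

Requiring recurrence across samples is one simple way to separate constitutive editing from individual-specific genome variants when no matched gDNA is available; a real pipeline would also need the mapping and error filters that the abstract mentions.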
Gonzalez, Michael V.; Mousel, Michelle R.; Herndon, David R.; Jiang, Yu; Dalrymple, Brian P.; Reynolds, James O.; Johnson, Wendell C.; Herrmann-Hoesing, Lynn M.; White, Stephen N.
2013-01-01
A genome-wide association study (GWAS) was performed to investigate seven red blood cell (RBC) phenotypes in over 500 domestic sheep (Ovis aries) from three breeds (Columbia, Polypay, and Rambouillet). A single nucleotide polymorphism (SNP) showed genome-wide significant association with increased mean corpuscular hemoglobin concentration (MCHC, P = 6.2×10^-14) and genome-wide suggestive association with decreased mean corpuscular volume (MCV, P = 2.5×10^-6). The ovine HapMap project found the same genomic region and the same peak SNP has been under extreme historical selective pressure, demonstrating the importance of this region for survival, reproduction, and/or artificially selected traits. We observed a large (>50 kb) variant haplotype sequence containing a full-length divergent artiodactyl MYADM-like repeat in strong linkage disequilibrium with the associated SNP. MYADM gene family members play roles in membrane organization and formation in myeloid cells. However, to our knowledge, no member of the MYADM gene family has been identified in development of morphologically variant RBCs. The specific RBC differences may be indicative of alterations in morphology. Additionally, erythrocytes with altered morphological structure often exhibit increased structural fragility, leading to increased RBC turnover and energy expenditure. The divergent artiodactyl MYADM-like repeat was also associated with increased ewe lifetime kilograms of lamb weaned (P = 2×10^-4). This suggests selection for normal RBCs might increase lamb weights, although further validation is required before implementation in marker-assisted selection. These results provide clues to explain the strong selection on the artiodactyl MYADM-like repeat locus in sheep, and suggest MYADM family members may be important for RBC morphology in other mammals. PMID:24023702
Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter
2017-01-01
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Zhang, Han; Rokas, Antonis; Slot, Jason C
2012-01-01
Dermatophyte fungi of the family Arthrodermataceae (Eurotiomycetes) colonize keratinized tissue, such as skin, frequently causing superficial mycoses in humans and other mammals, reptiles, and birds. Competition with native microflora likely underlies the propensity of these dermatophytes to produce a diversity of antibiotics and compounds for scavenging iron, which is extremely scarce, as well as the presence of an unusually large number of putative secondary metabolism gene clusters, most of which contain non-ribosomal peptide synthetases (NRPS), in their genomes. To better understand the historical origins and diversification of NRPS-containing gene clusters we examined the evolution of a variable locus (VL) that exists in one of three alternative conformations among the genomes of seven dermatophyte species. The first conformation of the VL (termed VLA) contains only 539 base pairs of sequence and lacks protein-coding genes, whereas the other two conformations (termed VLB and VLC) span 36 Kb and 27 Kb and contain 12 and 10 genes, respectively. Interestingly, both VLB and VLC appear to contain distinct secondary metabolism gene clusters; VLB contains a NRPS gene as well as four porphyrin metabolism genes never found to be physically linked in the genomes of 128 other fungal species, whereas VLC also contains a NRPS gene as well as several others typically found associated with secondary metabolism gene clusters. Phylogenetic evidence suggests that the VL locus was present in the ancestor of all seven species achieving its present distribution through subsequent differential losses or retentions of specific conformations. We propose that the existence of variable loci, similar to the one we studied, in fungal genomes could potentially explain the dramatic differences in secondary metabolic diversity between closely related species of filamentous fungi, and contribute to host adaptation and the generation of metabolic diversity.
Gill, Jason J.; Summer, Elizabeth J.; Russell, William K.; Cologna, Stephanie M.; Carlile, Thomas M.; Fuller, Alicia C.; Kitsopoulos, Kate; Mebane, Leslie M.; Parkinson, Brandi N.; Sullivan, David; Carmody, Lisa A.; Gonzalez, Carlos F.; LiPuma, John J.; Young, Ry
2011-01-01
Within the Burkholderia cepacia complex, B. cenocepacia is the most common species associated with aggressive infections in the lungs of cystic fibrosis patients, causing disease that is often refractive to treatment by antibiotics. Phage therapy may be a potential alternative form of treatment for these infections. Here we describe the genome of the previously described therapeutic B. cenocepacia podophage BcepIL02 and its close relative, Bcep22. Phage Bcep22 was found to contain a circularly permuted genome of 63,882 bp containing 77 genes; BcepIL02 was found to be 62,714 bp and contains 76 predicted genes. Major virion-associated proteins were identified by proteomic analysis. We propose that these phages comprise the founding members of a novel podophage lineage, the Bcep22-like phages. Among the interesting features of these phages are a series of tandemly repeated putative tail fiber genes that are similar to each other and also to one or more such genes in the other phages. Both phages also contain an extremely large (ca. 4,600-amino-acid), virion-associated, multidomain protein that accounts for over 20% of the phages' coding capacity, is widely distributed among other bacterial and phage genomes, and may be involved in facilitating DNA entry in both phage and other mobile DNA elements. The phages, which were previously presumed to be virulent, show evidence of a temperate lifestyle but are apparently unable to form stable lysogens in their hosts. This ambiguity complicates determination of a phage lifestyle, a key consideration in the selection of therapeutic phages. PMID:21804006
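The statement that one protein accounts for over 20% of the phages' coding capacity can be checked with back-of-the-envelope arithmetic. The sketch below uses the figures quoted in the abstract (ca. 4,600 amino acids; genomes of 63,882 bp and 62,714 bp) and, as a simplifying assumption, takes the whole genome length as the denominator. The true coding capacity is somewhat smaller than the genome, so the real fraction would be slightly higher. The function name is hypothetical.

```python
def gene_fraction_of_genome(protein_length_aa, genome_length_bp):
    """Fraction of a genome occupied by a gene encoding the given protein.

    Assumes 3 bp per amino acid and ignores the stop codon.
    """
    gene_length_bp = protein_length_aa * 3
    return gene_length_bp / genome_length_bp


for name, genome_bp in [("Bcep22", 63_882), ("BcepIL02", 62_714)]:
    fraction = gene_fraction_of_genome(4_600, genome_bp)
    print(f"{name}: {fraction:.1%} of the genome")
```

Under these assumptions the gene spans about 13,800 bp, a little more than a fifth of either genome, which is consistent with the figure given in the abstract.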
Home - The Cancer Genome Atlas - Cancer Genome - TCGA
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
GDC 2: Compression of large collections of genomes
Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin
2015-01-01
The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, processing the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
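The quoted figures can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and assumes that the 200 MB/s throughput refers to uncompressed input; neither assumption is stated in the abstract, and the variable names are illustrative only.

```python
MB = 10**6
TB = 10**12

uncompressed_bytes = 6.7 * TB   # about 6.7 TB of FASTA files
compressed_bytes = 700 * MB     # about 700 MB after compression
genomes = 1092
throughput_bytes_per_s = 200 * MB

# Compression ratio implied by the two sizes.
ratio = uncompressed_bytes / compressed_bytes

# Average compressed footprint of one diploid genome.
per_genome_mb = compressed_bytes / genomes / MB

# Time to stream the whole collection once at the quoted speed.
hours = uncompressed_bytes / throughput_bytes_per_s / 3600

print(f"compression ratio: about {ratio:,.0f}x")
print(f"compressed size per genome: about {per_genome_mb:.2f} MB")
print(f"one pass over the input: about {hours:.1f} hours")
```

With these assumptions the ratio comes out slightly above 9,500, matching the "about 9,500 times" claim, and each genome costs well under 1 MB once the collection is compressed.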
Radiation-induced genomic instability: radiation quality and dose response
NASA Technical Reports Server (NTRS)
Smith, Leslie E.; Nagar, Shruti; Kim, Grace J.; Morgan, William F.
2003-01-01
Genomic instability is a term used to describe a phenomenon that results in the accumulation of multiple changes required to convert a stable genome of a normal cell to an unstable genome characteristic of a tumor. There has been considerable recent debate concerning the importance of genomic instability in human cancer and its temporal occurrence in the carcinogenic process. Radiation is capable of inducing genomic instability in mammalian cells and instability is thought to be the driving force responsible for radiation carcinogenesis. Genomic instability is characterized by a large collection of diverse endpoints that include large-scale chromosomal rearrangements and aberrations, amplification of genetic material, aneuploidy, micronucleus formation, microsatellite instability, and gene mutation. The capacity of radiation to induce genomic instability depends to a large extent on radiation quality or linear energy transfer (LET) and dose. There appears to be a low dose threshold effect with low LET, beyond which no additional genomic instability is induced. Low doses of both high and low LET radiation are capable of inducing this phenomenon. This report reviews data concerning dose rate effects of high and low LET radiation and their capacity to induce genomic instability assayed by chromosomal aberrations, delayed lethal mutations, micronuclei and apoptosis.
Jordan, Rebecca; Dillon, Shannon K; Prober, Suzanne M; Hoffmann, Ary A
2016-12-01
In order to contribute to evolutionary resilience and adaptive potential in highly modified landscapes, revegetated areas should ideally reflect levels of genetic diversity within and across natural stands. Landscape genomic analyses enable such diversity patterns to be characterized at genome and chromosomal levels. Landscape-wide patterns of genomic diversity were assessed in Eucalyptus microcarpa, a dominant tree species widely used in revegetation in Southeastern Australia. Trees from small and large patches within large remnants, small isolated remnants and revegetation sites were assessed across the now highly fragmented distribution of this species using the DArTseq genomic approach. Genomic diversity was similar within all three types of remnant patches analysed, although often significantly but only slightly lower in revegetation sites compared with natural remnants. Differences in diversity between stand types varied across chromosomes. Genomic differentiation was higher between small, isolated remnants, and among revegetated sites compared with natural stands. We conclude that small remnants and revegetated sites of our E. microcarpa samples largely but not completely capture patterns in genomic diversity across the landscape. Genomic approaches provide a powerful tool for assessing restoration efforts across the landscape.
Patterns of genome size variation in snapping shrimp.
Jeffery, Nicholas W; Hultgren, Kristin; Chak, Solomon Tin Chi; Gregory, T Ryan; Rubenstein, Dustin R
2016-06-01
Although crustaceans vary extensively in genome size, little is known about how genome size may affect the ecology and evolution of species in this diverse group, in part due to the lack of large genome size datasets. Here we investigate interspecific, intraspecific, and intracolony variation in genome size in 39 species of Synalpheus shrimps, representing one of the largest genome size datasets for a single genus within crustaceans. We find that genome size ranges approximately 4-fold across Synalpheus with little phylogenetic signal, and is not related to body size. In a subset of these species, genome size is related to chromosome size, but not to chromosome number, suggesting that despite large genomes, these species are not polyploid. Interestingly, there appears to be 35% intraspecific genome size variation in Synalpheus idios among geographic regions, and up to 30% variation in Synalpheus duffyi genome size within the same colony.
WheatGenome.info: an integrated database and portal for wheat genome information.
Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David
2012-02-01
Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.
Aberrant expression of long noncoding RNAs in autistic brain.
Ziats, Mark N; Rennert, Owen M
2013-03-01
The autism spectrum disorders (ASD) have a significant hereditary component, but the implicated genetic loci are heterogeneous and complex. Consequently, there is a gap in understanding how diverse genomic aberrations all result in one clinical ASD phenotype. Gene expression studies from autism brain tissue have demonstrated that aberrantly expressed protein-coding genes may converge onto common molecular pathways, potentially reconciling the strong heritability and shared clinical phenotypes with the genomic heterogeneity of the disorder. However, the regulation of gene expression is extremely complex and governed by many mechanisms, including noncoding RNAs. Yet no study in ASD brain tissue has assessed for changes in regulatory long noncoding RNAs (lncRNAs), which represent a large proportion of the human transcriptome, and actively modulate mRNA expression. To assess if aberrant expression of lncRNAs may play a role in the molecular pathogenesis of ASD, we profiled over 33,000 annotated lncRNAs and 30,000 mRNA transcripts from postmortem brain tissue of autistic and control prefrontal cortex and cerebellum by microarray. We detected over 200 differentially expressed lncRNAs in ASD, which were enriched for genomic regions containing genes related to neurodevelopment and psychiatric disease. Additionally, comparison of differences in expression of mRNAs between prefrontal cortex and cerebellum within individual donors showed ASD brains had more transcriptional homogeneity. Moreover, this was also true of the lncRNA transcriptome. Our results suggest that further investigation of lncRNA expression in autistic brain may further elucidate the molecular pathogenesis of this disorder.
Population genomics of the endangered giant Galápagos tortoise.
Loire, Etienne; Chiari, Ylenia; Bernard, Aurélien; Cahais, Vincent; Romiguier, Jonathan; Nabholz, Benoît; Lourenço, Joao Miguel; Galtier, Nicolas
2013-12-16
The giant Galápagos tortoise, Chelonoidis nigra, is a large-sized terrestrial chelonian of high patrimonial interest. The species recently colonized a small continental archipelago, the Galápagos Islands, where it has been facing novel environmental conditions and limited resource availability. To explore the genomic consequences of this ecological shift, we analyze the transcriptomic variability of five individuals of C. nigra, and compare it to similar data obtained from several continental species of turtles. Having clarified the timing of divergence in the Chelonoidis genus, we report in C. nigra a very low level of genetic polymorphism, signatures of a weakened efficacy of purifying selection, and an elevated mutation load in coding and regulatory sequences. These results are consistent with the hypothesis of an extremely low long-term effective population size in this insular species. Functional evolutionary analyses reveal a reduced diversity of immunity genes in C. nigra, in line with the hypothesis of attenuated pathogen diversity in islands, and an increased selective pressure on genes involved in response to stress, potentially related to the climatic instability of its environment and its elongated lifespan. Finally, we detect no population structure or homozygosity excess in our five-individual sample. These results enlighten the molecular evolution of an endangered taxon in a stressful environment and point to island endemic species as a promising model for the study of the deleterious effects on genome evolution of a reduced long-term population size.
The genome sequence of ectromelia virus Naval and Cornell isolates from outbreaks in North America.
Mavian, Carla; López-Bueno, Alberto; Bryant, Neil A; Seeger, Kathy; Quail, Michael A; Harris, David; Barrell, Bart; Alcami, Antonio
2014-08-01
Ectromelia virus (ECTV) is the causative agent of mousepox, a disease of laboratory mouse colonies and an excellent model for human smallpox. We report the genome sequence of two isolates from outbreaks in laboratory mouse colonies in the USA in 1995 and 1999: ECTV-Naval and ECTV-Cornell, respectively. The genome of ECTV-Naval and ECTV-Cornell was sequenced by the 454-Roche technology. The ECTV-Naval genome was also sequenced by the Sanger and Illumina technologies in order to evaluate these technologies for poxvirus genome sequencing. Genomic comparisons revealed that ECTV-Naval and ECTV-Cornell correspond to the same virus isolated from independent outbreaks. Both ECTV-Naval and ECTV-Cornell are extremely virulent in susceptible BALB/c mice, similar to ECTV-Moscow. This is consistent with the ECTV-Naval genome sharing 98.2% DNA sequence identity with that of ECTV-Moscow, and indicates that the genetic differences with ECTV-Moscow do not affect the virulence of ECTV-Naval in the mousepox model of footpad infection.
2012-01-01
Background Carcass fatness is an important trait in most pig breeding programs. Following market requests, breeding plans for fresh pork consumption are usually designed to reduce carcass fat content and increase lean meat deposition. However, the Italian pig industry is mainly devoted to the production of Protected Designation of Origin dry cured hams: pigs are slaughtered at around 160 kg of live weight and the breeding goal aims at maintaining fat coverage, measured as backfat thickness, to avoid excessive desiccation of the hams. This objective has shaped the genetic pool of Italian heavy pig breeds for a few decades. In this study we applied a selective genotyping approach within a population of ~12,000 performance-tested Italian Large White pigs. Within this population, we selectively genotyped 304 pigs with extreme and divergent backfat thickness estimated breeding values using the Illumina PorcineSNP60 BeadChip and performed a genome-wide association study to identify loci associated with this trait. Results We identified 4 single nucleotide polymorphisms with P≤5.0E-07 and additional 119 ones with 5.0E-07
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodwin, Stephen; McCorison, Cassandra B.; Cavaletto, Jessica R.
Fungi in the class Dothideomycetes often live in extreme environments or have unusual physiology. One of these, the wine cellar mold Zasmidium cellare, produces thick curtains of mycelial growth in cellars with high humidity, and its ability to metabolize volatile organic compounds including alcohols, esters and formaldehyde is thought to improve air quality. It grows slowly but appears to outcompete ordinarily faster-growing species under anaerobic conditions. Whether these abilities have affected its mitochondrial genome is not known. To fill this gap, its mitochondrial genome was assembled as part of a whole-genome shotgun-sequencing project. The circular-mapping mitochondrial genome of Z. cellare, at only 23,743 bp, is the smallest yet reported for a filamentous fungus. It contains the complete set of 14 protein-coding genes seen typically in other filamentous fungi, along with genes for large and small ribosomal RNA subunits, 25 predicted tRNA genes capable of decoding all 20 amino acids, and a single open reading frame potentially coding for a protein of unknown function. The Z. cellare mitochondrial genome had genes encoded on both strands with a single change of direction, different from most other fungi but consistent with the Dothideomycetes. The high synteny among mitochondrial genomes of fungi in the Eurotiomycetes broke down almost completely in the Dothideomycetes. Only a low level of microsynteny was observed among protein-coding and tRNA genes in comparison with Mycosphaerella graminicola (synonym Zymoseptoria tritici), the only other fungus in the order Capnodiales with a sequenced mitochondrial genome, involving the three gene pairs atp8-atp9, nad2-nad3, and nad4L-nad5. However, even this low level of microsynteny did not extend to other fungi in the Dothideomycetes and Eurotiomycetes. Phylogenetic analysis of concatenated protein-coding genes confirmed the relationship between Z. cellare and M. graminicola in the Capnodiales, although conclusions were limited due to low sampling density. Other than its small size, the only unusual feature of the Z. cellare mitochondrial genome was two copies of a 110-bp sequence that were duplicated, inverted and separated by approximately 1 kb. This inverted-repeat sequence confused the assembly program but appears to have no functional significance. The small size of the Z. cellare mitochondrial genome was due to slightly smaller genes, lack of introns and non-essential genes, reduced intergenic spaces and very few ORFs relative to other fungi rather than a loss of essential genes. Whether this reduction facilitates its unusual biology remains unknown.
Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L
2007-01-01
Background As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprising 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page Conclusion Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. 
The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168
USDA-ARS's Scientific Manuscript database
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...
Koo, Hyunmin; Ptacek, Travis; Crowley, Michael; Swain, Ashit K; Osborne, John D; Bej, Asim K; Andersen, Dale T
2014-08-07
Hymenobacter sp. IS2118, isolated from a freshwater lake in Schirmacher Oasis, Antarctica, produces extracellular polymeric substance (EPS) and manifests tolerance to cold, UV radiation (UVR), and oxidative stress. We report the 5.26-Mb draft genome of strain IS2118, which will help us to understand its adaptation and survival mechanisms in Antarctic extreme ecosystems. Copyright © 2014 Koo et al.
Evidence of Molecular Adaptation to Extreme Environments and Applicability to Space Environments
NASA Astrophysics Data System (ADS)
Filipovic, M. D.; Ognjanovic, S.; Ognjanovic, M.
2008-06-01
This is an initial investigation of gene signatures responsible for adapting microscopic life to extreme Earth environments. We present preliminary results: identifying the clusters of orthologous groups (COGs) common to several hyperthermophiles and excluding those also found in a mesophile (non-hyperthermophile), Escherichia coli K12, should yield a group of proteins possibly involved in adaptation to life under extreme temperatures. Comparative genome analyses represent a powerful tool in the discovery of novel genes responsible for adaptation to specific extreme environments. Methanogens stand out as the only group of organisms that have species capable of growth at 0 °C (Methanogenium frigidum (M. frigidum) and Methanococcoides burtonii (M. burtonii)) and 110 °C (Methanopyrus kandleri (M. kandleri)). Although not all the components of heat adaptation can be attributed to novel genes, the chaperones known as heat shock proteins stabilize the enzymes under elevated temperature. However, the highly conserved chaperones found in bacteria and eukaryotes are not present in hyperthermophilic Archaea; rather, these have a unique chaperone, TF55. Our aim was to use software which we specifically developed for extremophile genome comparative analyses in order to search for additional novel genes involved in hyperthermophile adaptation. The following hyperthermophile genomes incorporated in this software were used for these studies: Methanocaldococcus jannaschii (M. jannaschii), M. kandleri, Archaeoglobus fulgidus (A. fulgidus) and three species of Pyrococcus. Common genes were annotated and grouped according to their roles in cellular processes where such information was available, and proteins not previously implicated in the heat adaptation of hyperthermophiles were identified. Additional experimental data are needed in order to learn more about these proteins. To address non-gene-based components of thermal adaptation, all sequenced extremophiles were analysed for their GC contents and amino acid hydrophobicity. Finally, we develop a prediction model for optimal growth temperature.
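The COG filtering step described in this abstract amounts to a set intersection followed by a set difference. A minimal sketch follows, assuming each genome has already been reduced to a set of COG identifiers; the function name and the COG labels are hypothetical placeholders, not data or software from the study.

```python
from functools import reduce


def candidate_thermo_cogs(hyperthermophile_cogs, mesophile_cogs):
    """Return COGs present in every hyperthermophile but absent from the mesophile.

    hyperthermophile_cogs: dict mapping genome name -> set of COG identifiers
    mesophile_cogs: set of COG identifiers found in the mesophile
    """
    if not hyperthermophile_cogs:
        return set()
    # COGs shared by all hyperthermophile genomes
    shared = reduce(set.intersection,
                    (set(cogs) for cogs in hyperthermophile_cogs.values()))
    # Drop anything the mesophile also carries
    return shared - set(mesophile_cogs)


# Placeholder identifiers for illustration only (assumed, not real annotations)
genomes = {
    "M. jannaschii": {"COG_A", "COG_B", "COG_C", "COG_D"},
    "M. kandleri": {"COG_A", "COG_B", "COG_D"},
    "A. fulgidus": {"COG_A", "COG_B", "COG_D", "COG_E"},
}
e_coli_k12 = {"COG_A", "COG_E"}

candidates = candidate_thermo_cogs(genomes, e_coli_k12)
print(sorted(candidates))
```

With these placeholder sets, the groups shared by all three archaea are A, B and D, and removing those also found in the mesophile leaves B and D as candidates.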
Independent test assessment using the extreme value distribution theory.
Almeida, Marcio; Blondell, Lucy; Peralta, Juan M; Kent, Jack W; Jun, Goo; Teslovich, Tanya M; Fuchsberger, Christian; Wood, Andrew R; Manning, Alisa K; Frayling, Timothy M; Cingolani, Pablo E; Sladek, Robert; Dyer, Thomas D; Abecasis, Goncalo; Duggirala, Ravindranath; Blangero, John
2016-01-01
The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements biological relevance-driven hypothesis testing by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the need to develop alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
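The abstract does not spell out its estimator, so the following is only a sketch of one common way to turn null simulations into an effective number of tests. It assumes that the smallest p-value from a scan of b independent tests follows a Beta(1, b) distribution and estimates b by maximum likelihood; the function names, the simulated data and the total of 1,000,000 variants are illustrative assumptions.

```python
import math
import random


def estimate_effective_tests(min_p_values):
    """Maximum-likelihood estimate of b for Beta(1, b).

    Beta(1, b) is the distribution of the minimum of b independent
    uniform p-values, so b plays the role of the effective test count.
    """
    log_sum = sum(math.log1p(-p) for p in min_p_values)
    return -len(min_p_values) / log_sum


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


# Toy null simulation (assumed): 2000 scans of 100 truly independent tests,
# keeping only the smallest p-value from each scan.
rng = random.Random(0)
true_independent = 100
null_minima = [min(rng.random() for _ in range(true_independent))
               for _ in range(2000)]
m_eff = estimate_effective_tests(null_minima)

# Effect of the reported 15% ratio on a hypothetical scan of 1,000,000 variants
total_tests = 1_000_000
naive = bonferroni_threshold(0.05, total_tests)
relaxed = bonferroni_threshold(0.05, 0.15 * total_tests)
print(f"estimated effective tests: {m_eff:.1f}")
print(f"naive threshold: {naive:.2e}, relaxed threshold: {relaxed:.2e}")
```

In real data the genotypes are correlated, so the estimate falls below the raw test count; here the simulated tests are independent, so the estimate should land near 100. Under the 15% ratio, the per-test threshold is relaxed by a factor of roughly 6.7 relative to the naive correction.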
Librado, Pablo; Der Sarkissian, Clio; Ermini, Luca; Schubert, Mikkel; Jónsson, Hákon; Albrechtsen, Anders; Fumagalli, Matteo; Yang, Melinda A; Gamba, Cristina; Seguin-Orlando, Andaine; Mortensen, Cecilie D; Petersen, Bent; Hoover, Cindi A; Lorente-Galdos, Belen; Nedoluzhko, Artem; Boulygina, Eugenia; Tsygankova, Svetlana; Neuditschko, Markus; Jagannathan, Vidhya; Thèves, Catherine; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Sicheritz-Ponten, Thomas; Popov, Ruslan; Grigoriev, Semyon; Alekseev, Anatoly N; Rubin, Edward M; McCue, Molly; Rieder, Stefan; Leeb, Tosso; Tikhonov, Alexei; Crubézy, Eric; Slatkin, Montgomery; Marques-Bonet, Tomas; Nielsen, Rasmus; Willerslev, Eske; Kantanen, Juha; Prokhortchouk, Egor; Orlando, Ludovic
2015-12-15
Yakutia, Sakha Republic, in the Siberian Far East, represents one of the coldest places on Earth, with winter record temperatures dropping below -70 °C. Nevertheless, Yakutian horses survive all year round in the open air due to striking phenotypic adaptations, including compact body conformations, extremely hairy winter coats, and acute seasonal differences in metabolic activities. The evolutionary origins of Yakutian horses and the genetic basis of their adaptations remain, however, contentious. Here, we present the complete genomes of nine present-day Yakutian horses and two ancient specimens dating from the early 19th century and ∼5,200 y ago. By comparing these genomes with the genomes of two Late Pleistocene, 27 domesticated, and three wild Przewalski's horses, we find that contemporary Yakutian horses do not descend from the native horses that populated the region until the mid-Holocene, but were most likely introduced following the migration of the Yakut people a few centuries ago. Thus, they represent one of the fastest cases of adaptation to the extreme temperatures of the Arctic. We find cis-regulatory mutations to have contributed more than nonsynonymous changes to their adaptation, likely due to the comparatively limited standing variation within gene bodies at the time the population was founded. Genes involved in hair development, body size, and metabolic and hormone signaling pathways represent an essential part of the Yakutian horse adaptive genetic toolkit. Finally, we find evidence for convergent evolution with native human populations and woolly mammoths, suggesting that only a few evolutionary strategies are compatible with survival in extremely cold environments.
Life in hot acid: Pathway analyses in extremely thermoacidophilic archaea
Auernik, Kathryne S.; Cooper, Charlotte R.; Kelly, Robert M.
2013-01-01
SUMMARY The extremely thermoacidophilic archaea are a particularly intriguing group of microorganisms that must simultaneously cope with biologically extreme pHs (≤ 4) and temperatures (Topt ≥ 60°C) in their natural environments. Their expanding biotechnological significance relates to their role in biomining of base and precious metals and their unique mechanisms of survival in hot acid, at both the cellular and biomolecular levels. Recent developments, such as advances in understanding of heavy metal tolerance mechanisms, implementation of a genetic system, and discovery of a new carbon fixation pathway, have been facilitated by availability of genome sequence data and molecular genetic systems. As a result, new insights into the metabolic pathways and physiological features that define extreme thermoacidophily have been obtained, in some cases suggesting prospects for biotechnological opportunities. PMID:18760359
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
mySyntenyPortal: an application package to construct websites for synteny block analysis.
Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum
2018-06-05
Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) are essential in comparative genomics. However, they require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.
Raghavan, Avanthi; Neeli, Hemanth; Jin, Weijun; Badellino, Karen O.; Demissie, Serkalem; Manning, Alisa K.; DerOhannessian, Stephanie L.; Wolfe, Megan L.; Cupples, L. Adrienne; Li, Mingyao; Kathiresan, Sekar; Rader, Daniel J.
2011-01-01
Genome-wide association studies (GWAS) have successfully identified loci associated with quantitative traits, such as blood lipids. Deep resequencing studies are being utilized to catalogue the allelic spectrum at GWAS loci. The goal of these studies is to identify causative variants and missing heritability, including heritability due to low frequency and rare alleles with large phenotypic impact. Whereas rare variant efforts have primarily focused on nonsynonymous coding variants, we hypothesized that noncoding variants in these loci are also functionally important. Using the HDL-C gene LIPG as an example, we explored the effect of regulatory variants identified through resequencing of subjects at HDL-C extremes on gene expression, protein levels, and phenotype. Resequencing a portion of the LIPG promoter and 5′ UTR in human subjects with extreme HDL-C, we identified several rare variants in individuals from both extremes. Luciferase reporter assays were used to measure the effect of these rare variants on LIPG expression. Variants conferring opposing effects on gene expression were enriched in opposite extremes of the phenotypic distribution. Minor alleles of a common regulatory haplotype and noncoding GWAS SNPs were associated with reduced plasma levels of the LIPG gene product endothelial lipase (EL), consistent with its role in HDL-C catabolism. Additionally, we found that a common nonfunctional coding variant associated with HDL-C (rs2000813) is in linkage disequilibrium with a 5′ UTR variant (rs34474737) that decreases LIPG promoter activity. We attribute the observed association of the coding variant with plasma EL levels and HDL-C to the gene regulatory role of rs34474737. Taken together, the findings show that both rare and common noncoding regulatory variants are important contributors to the allelic spectrum in complex trait loci. PMID:22174694
Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate
Dehal, Paramvir; Boore, Jeffrey L
2005-01-01
The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish–tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of four-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage. PMID:16128622
Microdiversification in genome-streamlined ubiquitous freshwater Actinobacteria.
Neuenschwander, Stefan M; Ghai, Rohit; Pernthaler, Jakob; Salcher, Michaela M
2018-01-01
Actinobacteria of the acI lineage are the most abundant microbes in freshwater systems, but there are so far no pure living cultures of these organisms, possibly because of metabolic dependencies on other microbes. This, in turn, has hampered an in-depth assessment of the genomic basis for their success in the environment. Here we present genomes from 16 axenic cultures of acI Actinobacteria. The isolates were not only of minute cell size, but also among the most streamlined free-living microbes, with extremely small genome sizes (1.2-1.4 Mbp) and low genomic GC content. Genome reduction in these bacteria might have led to auxotrophy for various vitamins, amino acids and reduced sulphur sources, thus creating dependencies to co-occurring organisms (the 'Black Queen' hypothesis). Genome analyses, moreover, revealed a surprising degree of inter- and intraspecific diversity in metabolic pathways, especially of carbohydrate transport and metabolism, and mainly encoded in genomic islands. The striking genotype microdiversification of acI Actinobacteria might explain their global success in highly dynamic freshwater environments with complex seasonal patterns of allochthonous and autochthonous carbon sources. We propose a new order within Actinobacteria ('Candidatus Nanopelagicales') with two new genera ('Candidatus Nanopelagicus' and 'Candidatus Planktophila') and nine new species.
Large-Scale Meteorological Patterns Associated with Extreme Precipitation in the US Northeast
NASA Astrophysics Data System (ADS)
Agel, L. A.; Barlow, M. A.
2016-12-01
Patterns of daily large-scale circulation associated with Northeast US extreme precipitation are identified using both k-means clustering (KMC) and Self-Organizing Maps (SOM) applied to tropopause height. Tropopause height provides a compact representation of large-scale circulation patterns, as it is linked to mid-level circulation, low-level thermal contrasts and low-level diabatic heating. Extreme precipitation is defined as the top 1% of daily wet-day observations at 35 Northeast stations, 1979-2008. KMC is applied on extreme precipitation days only, while the SOM algorithm is applied to all days in order to place the extreme results into a larger context. Six tropopause patterns are identified on extreme days: a summertime tropopause ridge, a summertime shallow trough/ridge, a summertime shallow eastern US trough, a deeper wintertime eastern US trough, and two versions of a deep cold-weather trough located across the east-central US. Thirty SOM patterns for all days are identified. Results for all days show that 6 SOM patterns account for almost half of the extreme days, although extreme precipitation occurs in all SOM patterns. The same SOM patterns associated with extreme precipitation also routinely produce non-extreme precipitation; however, on extreme precipitation days the troughs, on average, are deeper and the downstream ridges more pronounced. Analysis of other fields associated with the large-scale patterns shows various degrees of anomalously strong upward motion during, and moisture transport preceding, extreme precipitation events.
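The "top 1% of daily wet-day observations" definition can be made concrete with a short sketch. The wet-day cutoff of 0.1 mm, the rounding rule and the function names below are assumptions chosen for illustration; the abstract does not state them.

```python
def extreme_threshold(daily_precip_mm, wet_day_min_mm=0.1, top_fraction=0.01):
    """Smallest precipitation amount that still falls in the top fraction of wet days.

    wet_day_min_mm is an assumed wet-day cutoff, not one given in the abstract.
    """
    wet = sorted(p for p in daily_precip_mm if p >= wet_day_min_mm)
    if not wet:
        raise ValueError("no wet days in the record")
    # Number of days in the extreme tail; at least one day is always kept
    n_extreme = max(1, round(len(wet) * top_fraction))
    return wet[-n_extreme]


def extreme_days(daily_precip_mm, **kwargs):
    """Indices of days whose precipitation meets or exceeds the extreme threshold."""
    threshold = extreme_threshold(daily_precip_mm, **kwargs)
    return [i for i, p in enumerate(daily_precip_mm) if p >= threshold]


# Synthetic station record (assumed): 100 dry days, then 200 wet days of 1..200 mm
series = [0.0] * 100 + [float(v) for v in range(1, 201)]
print(extreme_threshold(series))
print(extreme_days(series))
```

Dry days are excluded before ranking, so the threshold depends only on the wet-day distribution. The resulting day indices are what a clustering step such as k-means would then be applied to.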
The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome
Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea
2014-01-01
Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome. PMID:24671744
The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.
Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea
2014-04-01
Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.
EUPAN enables pan-genome studies of a large number of eukaryotic genomes.
Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun
2017-08-01
Pan-genome analyses are routinely carried out for bacteria to interpret the within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexities of their genomes. Here we propose EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to current re-sequencing projects that primarily focus on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C++. It is supported under Linux and is best run on a computer cluster with LSF and SLURM job scheduling systems. EUPAN together with its standard operating procedure (SOP) is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html. Contact: ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan
2014-01-01
To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized traits.
A parts list for fungal cellulosomes revealed by comparative genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haitjema, Charles H.; Gilmore, Sean P.; Henske, John K.
Cellulosomes are large, multi-protein complexes that tether plant biomass degrading enzymes together for improved hydrolysis. These complexes were first described in anaerobic bacteria where species specific dockerin domains mediate assembly of enzymes onto complementary cohesin motifs interspersed within non-catalytic protein scaffolds. The versatile protein assembly mechanism conferred by the bacterial cohesin-dockerin interaction is now a standard design principle for synthetic protein-scale pathways. For decades, analogous structures have been reported in the early branching anaerobic fungi, which are known to assemble by sequence divergent non-catalytic dockerin domains (NCDD). However, the enzyme components, modular assembly mechanism, and functional role of fungal cellulosomes remain unknown. Here, we describe the comprehensive set of proteins critical to fungal cellulosome assembly, including novel, conserved scaffolding proteins unique to the Neocallimastigomycota. High quality genomes of the anaerobic fungi Anaeromyces robustus, Neocallimastix californiae and Piromyces finnis were assembled with long-read, single molecule technology to overcome their repeat-richness and extremely low GC content. Genomic analysis coupled with proteomic validation revealed an average 320 NCDD-containing proteins per fungal strain that were overwhelmingly carbohydrate active enzymes (CAZymes), with 95 large fungal scaffoldins identified across 4 genera that contain a conserved amino acid sequence repeat that binds to NCDDs. Fungal dockerin and scaffoldin domains have no similarity to their bacterial counterparts, yet several catalytic domains originated via horizontal gene transfer with gut bacteria. Though many catalytic domains are shared with bacteria, the biocatalytic activity of anaerobic fungi is expanded by the inclusion of GH3, GH6, and GH45 enzymes in the enzyme complexes.
Collectively, these findings suggest that the fungal cellulosome is an evolutionarily chimeric structure: an independently evolved fungal complex that co-opted useful activities from bacterial neighbors within the gut microbiome.
Zerbini, Francesca; Zanella, Ilaria; Fraccascia, Davide; König, Enrico; Irene, Carmela; Frattini, Luca F; Tomasi, Michele; Fantappiè, Laura; Ganfini, Luisa; Caproni, Elena; Parri, Matteo; Grandi, Alberto; Grandi, Guido
2017-04-24
The exploitation of the CRISPR/Cas9 machinery coupled to lambda (λ) recombinase-mediated homologous recombination (recombineering) is becoming the method of choice for genome editing in E. coli. First proposed by Jiang and co-workers, the strategy has been subsequently fine-tuned by several authors who demonstrated, by using few selected loci, that the efficiency of mutagenesis (number of mutant colonies over total number of colonies analyzed) can be extremely high (up to 100%). However, from published data it is difficult to appreciate the robustness of the technology, defined as the number of successfully mutated loci over the total number of targeted loci. This information is particularly relevant in high-throughput genome editing, where repetition of experiments to rescue missing mutants would be impractical. This work describes a "brute force" validation activity, which culminated in the definition of a robust, simple and rapid protocol for single or multiple gene deletions. We first set up our own version of the CRISPR/Cas9 protocol and then we evaluated the mutagenesis efficiency by changing different parameters including sequence of guide RNAs, length and concentration of donor DNAs, and use of single stranded and double stranded donor DNAs. We then validated the optimized conditions targeting 78 "dispensable" genes. This work led to the definition of a protocol, featuring the use of double stranded synthetic donor DNAs, which guarantees mutagenesis efficiencies consistently higher than 10% and a robustness of 100%. The procedure can be applied also for simultaneous gene deletions. This work defines for the first time the robustness of a CRISPR/Cas9-based protocol based on a large sample size. 
Since the technical solutions here proposed can be applied to other similar procedures, the data could be of general interest for the scientific community working on bacterial genome editing and, in particular, for those involved in synthetic biology projects requiring high throughput procedures.
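The two metrics defined in the abstract above, mutagenesis efficiency (mutant colonies over colonies analyzed) and robustness (successfully mutated loci over targeted loci), can be written out as a short calculation. The sketch below is illustrative only: the function names, the locus labels, and the colony counts are hypothetical, and treating a locus as "successfully mutated" when at least one mutant colony is recovered is an assumption, not a criterion stated in the abstract.

```python
def efficiency(mutant_colonies: int, colonies_analyzed: int) -> float:
    """Mutagenesis efficiency at one locus: mutants / colonies analyzed."""
    if colonies_analyzed <= 0:
        raise ValueError("colonies_analyzed must be positive")
    return mutant_colonies / colonies_analyzed


def robustness(per_locus_counts: dict) -> float:
    """Fraction of targeted loci that yielded a mutant.

    Assumption: a locus counts as successfully mutated if at least one
    mutant colony was recovered for it.
    """
    if not per_locus_counts:
        raise ValueError("no loci targeted")
    mutated = sum(1 for mutants, _ in per_locus_counts.values() if mutants > 0)
    return mutated / len(per_locus_counts)


# Hypothetical screen: locus -> (mutant colonies, colonies analyzed).
screen = {"locusA": (3, 24), "locusB": (24, 24), "locusC": (0, 24)}

for locus, (mutants, total) in screen.items():
    print(f"{locus}: efficiency = {efficiency(mutants, total):.1%}")
print(f"robustness = {robustness(screen):.1%}")
```

The example shows why the two numbers answer different questions: a protocol can reach 100% efficiency at one locus while still failing outright at another, which lowers robustness.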
Navigating yeast genome maintenance with functional genomics.
Measday, Vivien; Stirling, Peter C
2016-03-01
Maintenance of genome integrity is a fundamental requirement of all organisms. To address this, organisms have evolved extremely faithful modes of replication, DNA repair and chromosome segregation to combat the deleterious effects of an unstable genome. Nonetheless, a small amount of genome instability is the driver of evolutionary change and adaptation, and thus a low level of instability is permitted in populations. While defects in genome maintenance almost invariably reduce fitness in the short term, they can create an environment where beneficial mutations are more likely to occur. The importance of this fact is clearest in the development of human cancer, where genome instability is a well-established enabling characteristic of carcinogenesis. This raises the crucial question: what are the cellular pathways that promote genome maintenance and what are their mechanisms? Work in model organisms, in particular the yeast Saccharomyces cerevisiae, has provided the global foundations of genome maintenance mechanisms in eukaryotes. The development of pioneering genomic tools in S. cerevisiae, such as the systematic creation of mutants in all nonessential and essential genes, has enabled whole-genome approaches to identifying genes with roles in genome maintenance. Here, we review the extensive whole-genome approaches taken in yeast, with an emphasis on functional genomic screens, to understand the genetic basis of genome instability, highlighting a range of genetic and cytological screening modalities. By revealing the biological pathways and processes regulating genome integrity, these analyses contribute to the systems-level map of the yeast cell and inform studies of human disease, especially cancer. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Evolution of precipitation extremes in two large ensembles of climate simulations
NASA Astrophysics Data System (ADS)
Martel, Jean-Luc; Mailhot, Alain; Talbot, Guillaume; Brissette, François; Ludwig, Ralf; Frigon, Anne; Leduc, Martin; Turcotte, Richard
2017-04-01
Recent studies project significant changes in the future distribution of precipitation extremes due to global warming. It is likely that extreme precipitation intensity will increase in a future climate and that extreme events will be more frequent. In this work, annual maxima daily precipitation series from the Canadian Earth System Model (CanESM2) 50-member large ensemble (spatial resolution of 2.8° x 2.8°) and the Community Earth System Model (CESM1) 40-member large ensemble (spatial resolution of 1° x 1°) are used to investigate extreme precipitation over the historical (1980-2010) and future (2070-2100) periods. The use of these ensembles results in 1,500 (30 years x 50 members) and 1,200 (30 years x 40 members) simulated years, respectively, over both the historical and future periods. These large datasets allow the computation of empirical daily extreme precipitation quantiles for large return periods. Using the CanESM2 and CESM1 large ensembles, extreme daily precipitation with return periods ranging from 2 to 100 years is computed in historical and future periods to assess the impact of climate change. Results indicate that daily precipitation extremes generally increase in the future over most land grid points and that these increases will also affect the 100-year extreme daily precipitation. Considering that many public infrastructures have lifespans exceeding 75 years, the increase in extremes has important implications for service levels of water infrastructures and public safety. Estimated increases in precipitation associated with very extreme precipitation events (e.g. 100 years) will drastically change the likelihood of flooding and its extent in future climate. These results, although interesting, need to be extended to sub-daily durations, relevant for urban flooding protection and urban infrastructure design (e.g. sewer networks, culverts). Models and simulations at finer spatial and temporal resolution are therefore needed.
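The pooling step described above (30 years x 50 members giving 1,500 simulated years at each grid point) and the empirical quantile for a given return period can be sketched as follows. This is a minimal illustration under stated assumptions: the annual maxima are synthetic Gumbel draws rather than CanESM2 or CESM1 output, the function name `empirical_return_level` is hypothetical, and taking the T-year level as the (1 - 1/T) quantile of the pooled sample is one common convention, not necessarily the estimator used in the study.

```python
import numpy as np


def empirical_return_level(annual_maxima, return_period):
    """Empirical T-year return level at one grid point.

    annual_maxima: array of shape (members, years) of annual maximum
    daily precipitation. All member-years are pooled, and the T-year
    level is read off as the (1 - 1/T) quantile of the pooled sample.
    """
    pooled = np.asarray(annual_maxima, dtype=float).ravel()
    return float(np.quantile(pooled, 1.0 - 1.0 / return_period))


rng = np.random.default_rng(0)
members, years = 50, 30

# Synthetic stand-ins for one grid point (mm/day); values are made up.
historical = rng.gumbel(loc=40.0, scale=10.0, size=(members, years))
future = rng.gumbel(loc=45.0, scale=12.0, size=(members, years))

print("pooled simulated years:", historical.size)
for T in (2, 20, 100):
    h = empirical_return_level(historical, T)
    f = empirical_return_level(future, T)
    print(f"{T:>3}-year level: historical {h:.1f}, future {f:.1f}")
```

With 1,500 pooled values, the 100-year level is estimated from roughly the 15 largest annual maxima, which is what makes an empirical estimate feasible without fitting an extreme-value distribution.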
Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping
2007-10-24
Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic issues. 
Although the whole mitochondrial DNA sequence based phylogeny is robust, it remains in conflict with phylogenetic relationships suggested by analysis of limited nuclear-encoded data, a situation that will require gathering more nuclear DNA sequence information.
Rusch, Douglas B; Halpern, Aaron L; Sutton, Granger; Heidelberg, Karla B; Williamson, Shannon; Yooseph, Shibu; Wu, Dongying; Eisen, Jonathan A; Hoffman, Jeff M; Remington, Karin; Beeson, Karen; Tran, Bao; Smith, Hamilton; Baden-Tillson, Holly; Stewart, Clare; Thorpe, Joyce; Freeman, Jason; Andrews-Pfannkoch, Cynthia; Venter, Joseph E; Li, Kelvin; Kravitz, Saul; Heidelberg, John F; Utterback, Terry; Rogers, Yu-Hui; Falcón, Luisa I; Souza, Valeria; Bonilla-Rosso, Germán; Eguiarte, Luis E; Karl, David M; Sathyendranath, Shubha; Platt, Trevor; Bermingham, Eldredge; Gallardo, Victor; Tamayo-Castillo, Giselle; Ferrari, Michael R; Strausberg, Robert L; Nealson, Kenneth; Friedman, Robert; Frazier, Marvin; Venter, J. Craig
2007-01-01
The world's oceans contain a complex mixture of micro-organisms that are for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS. PMID:17355176
Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping
2007-01-01
Background Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. Results This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Conclusion Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. 
It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic issues. Although the whole mitochondrial DNA sequence based phylogeny is robust, it remains in conflict with phylogenetic relationships suggested by analysis of limited nuclear-encoded data, a situation that will require gathering more nuclear DNA sequence information. PMID:17956639
Chechetkin, V R; Lobzin, V V
2017-08-07
The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.
Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H.; Hansen, Mark S. T.; Lawley, Cindy T.; Karlsson, Elinor K.; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Åke; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T.
2011-01-01
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease. PMID:22022279
Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H; Hansen, Mark S T; Lawley, Cindy T; Karlsson, Elinor K; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Ake; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T
2011-10-01
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus
Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R.; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D.; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark
2017-01-01
Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda. PMID:28749982
Zueva, Ksenia J; Lumme, Jaakko; Veselov, Alexey E; Kent, Matthew P; Primmer, Craig R
2018-06-01
Understanding the genomic basis of host-parasite adaptation is important for predicting the long-term viability of species and developing successful management practices. However, in wild populations, identifying specific signatures of parasite-driven selection often presents a challenge, as it is difficult to unravel the molecular signatures of selection driven by different, but correlated, environmental factors. Furthermore, separating parasite-mediated selection from similar signatures due to genetic drift and population history can also be difficult. Populations of Atlantic salmon (Salmo salar L.) from northern Europe have pronounced differences in their reactions to the parasitic flatworm Gyrodactylus salaris Malmberg 1957 and are therefore a good model to search for specific genomic regions underlying inter-population differences in pathogen response. We used a dense Atlantic salmon SNP array, along with extensive sampling of 43 salmon populations representing the two G. salaris response extremes (extremely susceptible vs resistant), to screen the salmon genome for signatures of directional selection while attempting to separate the parasite effect from other factors. After combining the results from two independent genome scan analyses, 57 candidate genes potentially under positive selection were identified, out of which 50 were functionally annotated. This candidate gene set was shown to be functionally enriched for lymph node development, focal adhesion genes and anti-viral response, which suggests that the regulation of both innate and acquired immunity might be an important mechanism for salmon response to G. salaris. Overall, our results offer insights into the apparently complex genetic basis of pathogen susceptibility in salmon and highlight methodological challenges for separating the effects of various environmental factors. Copyright © 2018 Elsevier B.V. All rights reserved.
Molecular mechanisms of phenotypic plasticity in social insects
USDA-ARS's Scientific Manuscript database
Polyphenism in insects, whereby a single genome expresses different phenotypes in response to environmental cues, is a fascinating biological phenomenon. Social insects are especially intriguing examples of phenotypic plasticity because division of labor results in the development of extreme morphol...
This proposal develops scalable R / Bioconductor software infrastructure and data resources to integrate complex, heterogeneous, and large cancer genomic experiments. The falling cost of genomic assays facilitates collection of multiple data types (e.g., gene and transcript expression, structural variation, copy number, methylation, and microRNA data) from a set of clinical specimens. Furthermore, substantial resources are now available from large consortium activities like The Cancer Genome Atlas (TCGA).
Lessons learnt on the analysis of large sequence data in animal genomics.
Biscarini, F; Cozzi, P; Orozco-Ter Wengel, P
2018-04-06
The 'omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact in the field of bioinformatics, stimulating unprecedented advancements in this discipline. This is usually looked at from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than does data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transferring of data a cumbersome operation. The time consumed by the processing and analysing of huge data sets may be at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop), and it can be very frustrating to track the error. We herein review the most relevant issues related to tackling these challenges and problems, from the perspective of animal genomics, and provide researchers that lack extensive computing experience with guidelines that will help when processing large genomic data sets. © 2018 Stichting International Foundation for Animal Genetics.
NASA Astrophysics Data System (ADS)
Martel, J. L.; Brissette, F.; Mailhot, A.; Wood, R. R.; Ludwig, R.; Frigon, A.; Leduc, M.; Turcotte, R.
2017-12-01
Recent studies indicate that the frequency and intensity of extreme precipitation will increase in future climate due to global warming. In this study, we compare annual maxima precipitation series from three large ensembles of climate simulations at various spatial and temporal resolutions. The first two are at the global scale: the Canadian Earth System Model (CanESM2) 50-member large ensemble (CanESM2-LE) at a 2.8° resolution and the Community Earth System Model (CESM1) 40-member large ensemble (CESM1-LE) at a 1° resolution. The third ensemble is at the regional scale over both Eastern North America and Europe: the Canadian Regional Climate Model (CRCM5) 50-member large ensemble (CRCM5-LE) at a 0.11° resolution, driven at its boundaries by the CanESM2-LE. The CRCM5-LE is a new ensemble arising from the ClimEx project (http://www.climex-project.org), a Québec-Bavaria collaboration. Using these three large ensembles, changes in extreme precipitation between the historical (1980-2010) and future (2070-2100) periods are investigated. This results in 1,500 (30 years x 50 members for CanESM2-LE and CRCM5-LE) and 1,200 (30 years x 40 members for CESM1-LE) simulated years over both the historical and future periods. Using these large datasets, the empirical daily (and sub-daily for CRCM5-LE) extreme precipitation quantiles for large return periods ranging from 2 to 100 years are computed. Results indicate that daily extreme precipitation will generally increase over most land grid points of both domains according to the three large ensembles. Regarding the CRCM5-LE, the increase in sub-daily extreme precipitation will be even larger than that found for daily extreme precipitation. Considering that many public infrastructures have lifespans exceeding 75 years, the increase in extremes has important implications for service levels of water infrastructures and public safety.
Huang, Zhenzhen; Duan, Huilong; Li, Haomin
2015-01-01
Large-scale human cancer genomics projects, such as TCGA, have generated large genomic datasets for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomic alterations that intervene in the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function, cellular component and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.
CNV discovery for milk composition traits in dairy cattle using whole genome resequencing.
Gao, Yahui; Jiang, Jianping; Yang, Shaohua; Hou, Yali; Liu, George E; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao
2017-03-29
Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage. The range of coverage depth per individual was 8.2-11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within the CNVRs. By Gene Ontology and KEGG pathway analyses, we found that genes were significantly enriched for specific biological functions related to protein and lipid metabolism, insulin/IGF pathway-protein kinase B signaling cascade, prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes, which were suggested as promising candidate genes for milk protein and fat traits.
Extreme-value dependence: An application to exchange rate markets
NASA Astrophysics Data System (ADS)
Fernandez, Viviana
2007-04-01
Extreme value theory (EVT) focuses on modeling the tail behavior of a loss distribution using only extreme values rather than the whole data set. For a sample of 10 countries with dirty/free float regimes, we investigate whether paired currencies exhibit a pattern of asymptotic dependence. That is, whether an extremely large appreciation or depreciation in the nominal exchange rate of one country might transmit to another. In general, after controlling for volatility clustering and inertia in returns, we do not find evidence of extreme-value dependence between paired exchange rates. However, for asymptotic-independent paired returns, we find that tail dependency of exchange rates is stronger under large appreciations than under large depreciations.
Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen
2015-01-01
The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952
Genome data from a sixteenth century pig illuminate modern breed relationships
Ramírez, O; Burgos-Paz, W; Casas, E; Ballester, M; Bianco, E; Olalde, I; Santpere, G; Novella, V; Gut, M; Lalueza-Fox, C; Saña, M; Pérez-Enciso, M
2015-01-01
Ancient DNA (aDNA) provides direct evidence of historical events that have modeled the genome of modern individuals. In livestock, resolving the differences between the effects of initial domestication and of subsequent modern breeding is not straightforward without aDNA data. Here, we have obtained shotgun genome sequence data from a sixteenth century pig from Northeastern Spain (Montsoriu castle); the ancient pig was obtained from an extremely well-preserved and diverse assemblage. In addition, we provide the sequence of three new modern genomes from an Iberian pig, a Spanish wild boar and a Guatemalan Creole pig. Comparison with both mitochondrial and autosomal genome data shows that the ancient pig is closely related to extant Iberian pigs and to European wild boar. Although the ancient sample was clearly domestic, admixture with wild boar also occurred, according to the D-statistics. The close relationship between Iberian, European wild boar and the ancient pig confirms that Asian introgression in modern Iberian pigs has not existed or has been negligible. In contrast, the Guatemalan Creole pig clusters apart from the Iberian pig genome, likely due to introgression from international breeds. PMID:25204303
Repar, Jelena; Warnecke, Tobias
2017-08-01
Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin-terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus-Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The Nature and Evolution of Genomic Diversity in the Mycobacterium tuberculosis Complex.
Brites, Daniela; Gagneux, Sebastien
2017-01-01
The Mycobacterium tuberculosis Complex (MTBC) consists of a clonal group of several mycobacterial lineages pathogenic to a range of different mammalian hosts. In this chapter, we discuss the origins and the evolutionary forces shaping the genomic diversity of the human-adapted MTBC. Advances in whole-genome sequencing have brought invaluable insights into the macro-evolution of the MTBC, the biogeographical distribution of the different MTBC lineages, and the phylogenetic relationships between these lineages. Moreover, micro-evolutionary processes are starting to be better understood, including those influencing bacterial mutation rates and those governing the fate of new mutations emerging within patients during treatment. Current genomic and epidemiological evidence reflects the fact that, through ecological specialization, the MTBC affecting humans became an obligate and extremely well-adapted human pathogen. Identifying the adaptive traits of human-adapted MTBC and unraveling the bacterial loci that interact with human genomic variation might help identify new targets for developing better vaccines and designing more effective treatments.
Juhas, Mario; Dimopoulou, Ioanna; Robinson, Esther; Elamin, Abdel; Harding, Rosalind; Hood, Derek; Crook, Derrick
2013-09-01
A significant part of horizontal gene transfer is facilitated by genomic islands. Haemophilus influenzae genomic island ICEHin1056 is an archetype of a genomic island that accounts for the pandemic spread of antibiotic resistance. ICEHin1056 has a modular structure and harbors modules involved in type IV secretion and integration. Previous studies have shown that ICEHin1056 encodes a functional type IV secretion system; however, other modules have not been characterized yet. Here we show that the module on the 5' extremity of ICEHin1056 consists of 15 genes that are well conserved in a number of related genomic islands. Furthermore, by disrupting six genes of the investigated module of ICEHin1056 by site-specific mutagenesis, we demonstrate that, in addition to the type IV secretion system module, the investigated module is also important for the successful conjugal transfer of ICEHin1056 from donor to recipient cells.
[Genome similarity of Baikal omul and sig].
Bychenko, O S; Sukhanova, L V; Ukolova, S S; Skvortsov, T A; Potapov, V K; Azhikina, T L; Sverdlov, E D
2009-01-01
Two members of the Baikal sig family, a lake sig (Coregonus lavaretus baicalensis Dybovsky) and omul (C. autumnalis migratorius Georgi), are close relatives that diverged from the same ancestor 10-20 thousand years ago. In this work, we studied genomic polymorphism of these two fish species. The method of subtraction hybridization (SH) did not reveal the presence of extended sequences in the sig genome and their absence in the omul genome. All the fragments found by SH corresponded to polymorphous noncoding genome regions varying in mononucleotide substitutions and short deletions. Many of them are mapped close to genes of the immune system and have regions identical to the Tc-1-like transposons abundant among fish, whose transcription activity may affect the expression of adjacent genes. Thus, we showed for the first time that genetic differences between Baikal sig family members are extremely small and cannot be revealed by the SH method. This is another endorsement of the hypothesis on the close relationship between Baikal sig and omul and their evolutionarily recent divergence from a common ancestor.
2011-01-01
Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome.
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
Jungreuthmayer, Christian; Ruckerbauer, David E.; Gerstl, Matthias P.; Hanscho, Michael; Zanghellini, Jürgen
2015-01-01
Despite the significant progress made in recent years, the computation of the complete set of elementary flux modes of large or even genome-scale metabolic networks is still impossible. We introduce a novel approach to speed up the calculation of elementary flux modes by including transcriptional regulatory information into the analysis of metabolic networks. Taking into account gene regulation dramatically reduces the solution space and allows the presented algorithm to continually eliminate biologically infeasible modes at an early stage of the computation procedure. Thereby, computational costs, such as runtime, memory usage, and disk space, are drastically reduced. Moreover, we show that the application of transcriptional rules identifies non-trivial system-wide effects on metabolism. Using the presented algorithm pushes the size of metabolic networks that can be studied by elementary flux modes to new and much higher limits without the loss of predictive quality. This makes unbiased, system-wide predictions in large scale metabolic networks possible without resorting to any optimization principle. PMID:26091045
GenomeDiagram: a python package for the visualization of large-scale genomic data.
Pritchard, Leighton; White, Jennifer A; Birch, Paul R J; Toth, Ian K
2006-03-01
We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.
Ninomiya, M; Takahashi, M; Shimosegawa, T; Okamoto, H
2007-01-01
Recently, we identified a novel human virus with a circular DNA genome of 3.2 kb, tentatively designated as torque teno midi virus (TTMDV), with a genomic organization resembling those of torque teno virus (TTV) of 3.8-3.9 kb and torque teno mini virus (TTMV) of 2.8-2.9 kb. To investigate the extent of genomic variability of TTMDV genomes, the full-length sequence was determined for 15 TTMDV isolates obtained from viremic individuals in Japan. The 15 TTMDV isolates comprised 3175-3230 bases and shared 67.0-90.3% identities with each other, and were only 68.4-73.0% identical to the 3 reported TTMDV isolates over the entire genome. TTMDV possessed a genomic organization with four open reading frames (ORF1-ORF4) with characteristic sequence motifs and stem and loop structures with high GC content, similar to TTV and TTMV. The total of 18 TTMDV genomes differed by up to 60.7% from each other in the amino acid sequence of ORF1 (658-677 amino acids), but segregated phylogenetically into the same cluster, which was distantly related to the TTVs and TTMVs. These results indicate that TTMDV with a circular DNA genome of 3.2 kb, has an extremely high degree of genomic variability, and is classifiable into a third group in the genus Anellovirus.
Simultaneous non-contiguous deletions using large synthetic DNA and site-specific recombinases
Krishnakumar, Radha; Grose, Carissa; Haft, Daniel H.; Zaveri, Jayshree; Alperovich, Nina; Gibson, Daniel G.; Merryman, Chuck; Glass, John I.
2014-01-01
Toward achieving rapid and large scale genome modification directly in a target organism, we have developed a new genome engineering strategy that uses a combination of bioinformatics aided design, large synthetic DNA and site-specific recombinases. Using Cre recombinase we swapped a target 126-kb segment of the Escherichia coli genome with a 72-kb synthetic DNA cassette, thereby effectively eliminating over 54 kb of genomic DNA from three non-contiguous regions in a single recombination event. We observed complete replacement of the native sequence with the modified synthetic sequence through the action of the Cre recombinase and no competition from homologous recombination. Because of the versatility and high-efficiency of the Cre-lox system, this method can be used in any organism where this system is functional as well as adapted to use with other highly precise genome engineering systems. Compared to present-day iterative approaches in genome engineering, we anticipate this method will greatly speed up the creation of reduced, modularized and optimized genomes through the integration of deletion analyses data, transcriptomics, synthetic biology and site-specific recombination. PMID:24914053
New Implications on Genomic Adaptation Derived from the Helicobacter pylori Genome Comparison
Lara-Ramírez, Edgar Eduardo; Segura-Cabrera, Aldo; Guo, Xianwu; Yu, Gongxin; García-Pérez, Carlos Armando; Rodríguez-Pérez, Mario A.
2011-01-01
Background Helicobacter pylori has a reduced genome and lives in a tough environment for long-term persistence. It evolved with its particular characteristics for biological adaptation. Because several H. pylori genome sequences are available, comparative analysis could help to better understand genomic adaptation of this particular bacterium. Principal Findings We analyzed nine H. pylori genomes with emphasis on microevolution from a different perspective. Inversion was an important factor in shaping the genome structure. Illegitimate recombination led not only to genomic inversion but also to inverted fragment duplication, both of which contributed to the creation of new genes and gene families; further, homologous recombination contributed to inversion events. Based on the information on genomic rearrangement, the first genome scaffold structure of the H. pylori last common ancestor was produced. The core genome consists of 1186 genes, of which 22 genes could be particularly adapted to the human stomach niche. H. pylori contains a high proportion of pseudogenes whose genesis was principally caused by homopolynucleotide (HPN) mutations. Such mutations are reversible and facilitate the control of gene expression through the change of DNA structure. The reversible mutations and a quasi-panmictic feature could allow such genes or gene fragments to be frequently transferred within or between populations. Hence, pseudogenes could be a reservoir of adaptation materials and the HPN mutations could be favorable to H. pylori adaptation, leading to HPN accumulation in the genomes, which corresponds to a special feature of Helicobacter species: extremely high HPN composition of the genome. Conclusion Our research demonstrated that both genome content and structure of H. pylori have been highly adapted to its particular lifestyle. PMID:21387011
Shi, Yan; Chu, Qing; Wei, Dan-Dan; Qiu, Yuan-Jian; Shang, Feng; Dou, Wei; Wang, Jin-Jun
2016-01-01
Bilateral animals are characterized by an extremely compact mitochondrial (mt) genome with 37 genes on a single circular chromosome. To date, the complete mt genome has only been determined for four species of Liposcelis, a genus with economic importance, including L. entomophila, L. decolor, L. bostrychophila, and L. paeta. They belong to the A, B, or D group of Liposcelis. Unlike most bilateral animals, L. bostrychophila, L. entomophila and L. paeta have a bipartite mt genome with genes on two chromosomes. However, the mt genome of L. decolor has the typical mt chromosome of bilateral animals. Here, we sequenced the mt genome of L. sculptilis, and identified 35 genes, which were on a single chromosome. The mt genome fragmentation is not shared by the D group of Liposcelis, and the single chromosome of L. sculptilis differed from those of known booklice in gene content and gene arrangement. We inferred that different evolutionary patterns and rates existed in Liposcelis. Further, we reconstructed the evolutionary history of 21 psocodean taxa with phylogenetic analyses, which suggested that Liposcelididae and Phthiraptera have evolved 134 Ma and the sucking lice diversified in the Late Cretaceous. PMID:27470659
Data Mining of Extremely Large Ad Hoc Data Sets to Produce Inverted Indices
2016-06-01
Naval Postgraduate School, Monterey, California. Master’s thesis. Approved for public release; distribution is unlimited.
Daverdin, Guillaume; Rouxel, Thierry; Gout, Lilian; Aubertot, Jean-Noël; Fudal, Isabelle; Meyer, Michel; Parlange, Francis; Carpezat, Julien; Balesdent, Marie-Hélène
2012-01-01
Modern agriculture favours the selection and spread of novel plant diseases. Furthermore, crop genetic resistance against pathogens is often rendered ineffective within a few years of its commercial deployment. Leptosphaeria maculans, the cause of phoma stem canker of oilseed rape, develops gene-for-gene interactions with its host plant, and has a high evolutionary potential to render ineffective novel sources of resistance in crops. Here, we established a four-year field experiment to monitor the evolution of populations confronted with the newly released Rlm7 resistance and to investigate the nature of the mutations responsible for virulence against Rlm7. A total of 2551 fungal isolates were collected from experimental crops of a Rlm7 cultivar or a cultivar without Rlm7. All isolates were phenotyped for virulence and a subset was genotyped with neutral genetic markers. Virulent isolates were investigated for molecular events at the AvrLm4-7 locus. Whilst virulent isolates were not found in neighbouring crops, their frequency had reached 36% in the experimental field after four years. An extreme diversity of independent molecular events leading to virulence was identified in populations, with large-scale Repeat Induced Point mutations or complete deletion of AvrLm4-7 being the most frequent. Our data suggest that increased mutability of fungal genes involved in the interactions with plants is directly related to their genomic environment and reproductive system. Thus, rapid allelic diversification of avirulence genes can be generated in L. maculans populations in a single field provided that large population sizes and sexual reproduction are favoured by agricultural practices. PMID:23144620
mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus.
Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel
2010-05-01
Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results, validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent), will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system.
Functional metagenomics to decipher food-microbe-host crosstalk.
Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël
2015-02-01
Recent developments in metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota, giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics opens a new window for the exploration of food-microbe-host crosstalk.
Identification of copy number variants in whole-genome data using Reference Coverage Profiles
Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy
2015-01-01
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
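The normalization step described in this abstract can be sketched as follows. This is a minimal illustration, assuming per-bin depth of coverage for one sample and a matching reference profile of expected diploid depth. It replaces the hidden Markov model segmentation named in the abstract with a plain ratio threshold and a minimum run length, and the function names, threshold values, and toy data are all hypothetical.

```python
def normalize_to_profile(sample_depth, reference_depth):
    """Scale the sample to the reference's mean depth, then divide bin by bin."""
    if len(sample_depth) != len(reference_depth):
        raise ValueError("sample and reference must cover the same bins")
    ref_mean = sum(reference_depth) / len(reference_depth)
    sample_mean = sum(sample_depth) / len(sample_depth)
    scale = ref_mean / sample_mean
    # Bins with no reference coverage carry no information: mark them NaN.
    return [s * scale / r if r > 0 else float("nan")
            for s, r in zip(sample_depth, reference_depth)]


def call_deletions(ratios, max_ratio=0.75, min_bins=3):
    """Return half-open (start, end) bin ranges whose ratio stays below max_ratio.

    A simple threshold stand-in for HMM segmentation (an assumption of
    this sketch, not the published method).
    """
    calls, start = [], None
    # The trailing sentinel closes a run that reaches the last bin.
    for i, ratio in enumerate(list(ratios) + [float("inf")]):
        low = ratio < max_ratio            # NaN compares False and ends a run
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_bins:
                calls.append((start, i))
            start = None
    return calls


# Toy data: ten bins, with bins 3-6 at half the expected depth
# (as a hemizygous deletion would appear).
reference = [40, 50, 30, 45, 40, 35, 50, 40, 45, 40]
sample = [40, 50, 30, 22.5, 20, 17.5, 25, 40, 45, 40]

ratios = normalize_to_profile(sample, reference)
print(call_deletions(ratios))
```

Dividing by a position-specific profile rather than a single genome-wide mean is what removes the sequence-specific fluctuations mentioned in the abstract; in this toy case the sample is simply the reference with four bins halved, so only the deleted run falls below the threshold.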
Cloud computing for genomic data analysis and collaboration.
Langmead, Ben; Nellore, Abhinav
2018-04-01
Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.
Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.
Pharmacogenomic agreement between two cancer cell line data sets.
2015-12-03
Large cancer cell line collections broadly capture the genomic diversity of human cancers and provide valuable insight into anti-cancer drug response. Here we show substantial agreement and biological consilience between drug sensitivity measurements and their associated genomic predictors from two publicly available large-scale pharmacogenomics resources: The Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer databases.
Bergsveinson, Jordyn; Ziola, Barry
2017-12-01
Beer-spoilage-related lactic acid bacteria (BSR LAB) belong to multiple genera and species; however, beer-spoilage capacity is isolate-specific and partially acquired via horizontal gene transfer within the brewing environment. Thus, the extent to which genus-, species-, or environment- (i.e., brewery-) level genetic variability influences beer-spoilage phenotype is unknown. Publicly available Lactobacillus brevis genomes were analyzed via BlAst Diagnostic Gene findEr (BADGE) for BSR genes and assessed for pangenomic relationships. Also analyzed were functional coding capacities of plasmids of LAB inhabiting extreme niche environments. Considerable genetic variation was observed in L. brevis isolated from clinical samples, whereas 16 candidate genes distinguish BSR and non-BSR L. brevis genomes. These genes are related to nutrient scavenging of gluconate or pentoses, mannose, and metabolism of pectin. BSR L. brevis isolates also have higher average nucleotide identity and stronger pangenome association with one another, though isolation source (i.e., specific brewery) also appears to influence the plasmid coding capacity of BSR LAB. Finally, it is shown that niche-specific adaptation and phenotype are plasmid-encoded for both BSR and non-BSR LAB. The ultimate combination of plasmid-encoded genes dictates the ability of L. brevis to survive in the most extreme beer environment, namely, gassed (i.e., pressurized) beer.
Gorkhali, Neena Amatya; Dong, Kunzhe; Yang, Min; Song, Shen; Kader, Adiljian; Shrestha, Bhola Shankar; He, Xiaohong; Zhao, Qianjun; Pu, Yabin; Li, Xiangchen; Kijas, James; Guan, Weijun; Han, Jianlin; Jiang, Lin; Ma, Yuehui
2016-07-22
Sheep have successfully adapted to the extreme high-altitude Himalayan region. To identify genes underlying this adaptation, we genotyped genome-wide single nucleotide polymorphisms (SNPs) in four major sheep breeds living at different altitudes in Nepal and downloaded SNP array data from additional Asian and Middle Eastern breeds. Using a di value-based genomic comparison between four high-altitude and eight lowland Asian breeds, we discovered the most differentiated variants at the locus of FGF-7 (Keratinocyte growth factor-7), which was previously reported as a good protective candidate for pulmonary injuries. We further found a SNP upstream of FGF-7 that appears to contribute to the divergence signature. First, the SNP occurred at an extremely conserved site. Second, the SNP showed an increasing allele frequency with increasing altitude in Nepalese sheep. Third, electrophoretic mobility shift assay (EMSA) analysis using human lung cancer cells revealed allele-specific DNA-protein interactions. We thus hypothesized that the FGF-7 gene potentially enhances lung function in high-altitude sheep by regulating its expression level through altered binding of specific transcription factors. Notably, the FGF-7 gene was not implicated in previous studies of other high-altitude species, suggesting a potentially novel mechanism of high-altitude adaptation in Himalayan sheep.
Librado, Pablo; Der Sarkissian, Clio; Ermini, Luca; Jónsson, Hákon; Albrechtsen, Anders; Fumagalli, Matteo; Yang, Melinda A.; Gamba, Cristina; Seguin-Orlando, Andaine; Mortensen, Cecilie D.; Petersen, Bent; Hoover, Cindi A.; Lorente-Galdos, Belen; Nedoluzhko, Artem; Boulygina, Eugenia; Tsygankova, Svetlana; Neuditschko, Markus; Jagannathan, Vidhya; Thèves, Catherine; Alfarhan, Ahmed H.; Alquraishi, Saleh A.; Al-Rasheid, Khaled A. S.; Popov, Ruslan; Grigoriev, Semyon; Alekseev, Anatoly N.; Rubin, Edward M.; McCue, Molly; Rieder, Stefan; Leeb, Tosso; Tikhonov, Alexei; Crubézy, Eric; Slatkin, Montgomery; Marques-Bonet, Tomas; Nielsen, Rasmus; Willerslev, Eske; Kantanen, Juha; Prokhortchouk, Egor; Orlando, Ludovic
2015-01-01
Yakutia, Sakha Republic, in the Siberian Far East, represents one of the coldest places on Earth, with winter record temperatures dropping below −70 °C. Nevertheless, Yakutian horses survive all year round in the open air due to striking phenotypic adaptations, including compact body conformations, extremely hairy winter coats, and acute seasonal differences in metabolic activities. The evolutionary origins of Yakutian horses and the genetic basis of their adaptations remain, however, contentious. Here, we present the complete genomes of nine present-day Yakutian horses and two ancient specimens dating from the early 19th century and ∼5,200 y ago. By comparing these genomes with the genomes of two Late Pleistocene, 27 domesticated, and three wild Przewalski’s horses, we find that contemporary Yakutian horses do not descend from the native horses that populated the region until the mid-Holocene, but were most likely introduced following the migration of the Yakut people a few centuries ago. Thus, they represent one of the fastest cases of adaptation to the extreme temperatures of the Arctic. We find cis-regulatory mutations to have contributed more than nonsynonymous changes to their adaptation, likely due to the comparatively limited standing variation within gene bodies at the time the population was founded. Genes involved in hair development, body size, and metabolic and hormone signaling pathways represent an essential part of the Yakutian horse adaptive genetic toolkit. Finally, we find evidence for convergent evolution with native human populations and woolly mammoths, suggesting that only a few evolutionary strategies are compatible with survival in extremely cold environments. PMID:26598656
Romero, H; Zavala, A; Musto, H
2000-01-25
It is widely accepted that compositional pressure is the only factor shaping codon usage in unicellular species with extremely biased genomic compositions. This seems to be the case in the prokaryotes Mycoplasma capricolum, Rickettsia prowazekii and Borrelia burgdorferi (GC-poor), and in Micrococcus luteus (GC-rich). However, in the GC-poor unicellular eukaryotes Dictyostelium discoideum and Plasmodium falciparum, there is evidence that selection, acting at the level of translation, influences codon choices. This finding is intriguing for two reasons: (1) the genomic GC levels of the above-mentioned eukaryotes are lower than the GC% of any studied bacteria, and (2) bacteria usually have larger effective population sizes than eukaryotes, and hence natural selection is expected to overcome the randomizing effects of genetic drift more efficiently among prokaryotes than among eukaryotes. To gain new insight into this problem, we analysed the patterns of codon preferences of the nuclear genes of Entamoeba histolytica, a unicellular eukaryote characterised by an extremely AT-rich genome (GC = 25%). The overall codon usage is strongly biased towards A and T in the third codon positions, and among the presumed highly expressed sequences, there is an increased relative usage of a subset of codons, many of which are C-ending. Since an increase in C in third codon positions is 'against' the compositional bias, we conclude that codon usage in E. histolytica, as in D. discoideum and P. falciparum, is the result of an equilibrium between compositional pressure and selection. These findings raise the question of why strongly compositionally biased eukaryotic cells may be more sensitive to the (presumed) slight differences among synonymous codons than compositionally biased bacteria.
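The kind of third-position measurement discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and the example sequence is made up rather than taken from E. histolytica.

```python
from collections import Counter

def codons(cds):
    """Split an in-frame coding sequence into complete codons
    (any trailing one or two bases are dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    third = [c[2] for c in codons(cds)]
    return sum(base in "GC" for base in third) / len(third)

def third_base_usage(cds):
    """Relative frequency of each base at third codon positions."""
    counts = Counter(c[2] for c in codons(cds))
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in "ACGT"}

# Made-up toy sequence: codons ATG AAA TTT GGC AAC.
toy = "ATGAAATTTGGCAAC"
toy_gc3 = gc3(toy)
toy_usage = third_base_usage(toy)
```

Comparing `third_base_usage` between a set of presumed highly expressed genes and the whole gene set is one simple way to look for the enrichment of C-ending codons that the abstract reports.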
[Genome editing of industrial microorganism].
Zhu, Linjiang; Li, Qi
2015-03-01
Genome editing is defined as the highly effective and precise modification of the cellular genome on a large scale. In recent years, genome-editing methods have developed rapidly in the field of industrial strain improvement. These fast-evolving methods have thoroughly changed the old, inefficient mode of genetic modification, namely "one modification, one selection marker, and one target site". Highly effective modification modes have been developed, including simultaneous modification of multiple genes; efficient insertion, replacement, and deletion of target genes at the genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing will certainly be applied widely, increase the efficiency of industrial strain improvement, and promote the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology, such as the production of biofuels and biomaterials. This review summarizes the technological principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.
Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)
Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana
2017-01-01
The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065
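The co-segregation idea behind this approach can be shown with a small sketch. It is not the SLICE model: it only counts how often two loci are detected in the same nuclear section and compares that with the frequency expected if they segregated independently. The function names and the presence/absence profiles are hypothetical.

```python
def cosegregation(profile_a, profile_b):
    """Fraction of nuclear sections in which both loci were detected.
    Each profile is a list of 0/1 flags, one per section."""
    both = sum(a and b for a, b in zip(profile_a, profile_b))
    return both / len(profile_a)

def linkage(profile_a, profile_b):
    """Observed co-segregation minus the value expected if the two loci
    appeared in sections independently of each other."""
    n = len(profile_a)
    freq_a = sum(profile_a) / n
    freq_b = sum(profile_b) / n
    return cosegregation(profile_a, profile_b) - freq_a * freq_b

# Made-up detection profiles across eight sections.
locus_a = [1, 1, 0, 0, 1, 0, 0, 0]
locus_b = [1, 1, 0, 0, 0, 0, 0, 1]  # often found together with locus_a
locus_c = [0, 0, 1, 0, 0, 1, 1, 0]  # never found together with locus_a

near = linkage(locus_a, locus_b)
far = linkage(locus_a, locus_c)
```

A positive value means the pair is captured in the same section more often than chance would predict, which is the raw signal a statistical model of co-segregation would then need to interpret.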
The genomes of four tapeworm species reveal adaptations to parasitism.
Tsai, Isheng J; Zarowiecki, Magdalena; Holroyd, Nancy; Garciarrubio, Alejandro; Sánchez-Flores, Alejandro; Brooks, Karen L; Tracey, Alan; Bobes, Raúl J; Fragoso, Gladis; Sciutto, Edda; Aslett, Martin; Beasley, Helen; Bennett, Hayley M; Cai, Xuepeng; Camicia, Federico; Clark, Richard; Cucher, Marcela; De Silva, Nishadi; Day, Tim A; Deplazes, Peter; Estrada, Karel; Fernández, Cecilia; Holland, Peter W H; Hou, Junling; Hu, Songnian; Huckvale, Thomas; Hung, Stacy S; Kamenetzky, Laura; Keane, Jacqueline A; Kiss, Ferenc; Koziol, Uriel; Lambert, Olivia; Liu, Kan; Luo, Xuenong; Luo, Yingfeng; Macchiaroli, Natalia; Nichol, Sarah; Paps, Jordi; Parkinson, John; Pouchkina-Stantcheva, Natasha; Riddiford, Nick; Rosenzvit, Mara; Salinas, Gustavo; Wasmuth, James D; Zamanian, Mostafa; Zheng, Yadong; Cai, Jianping; Soberón, Xavier; Olson, Peter D; Laclette, Juan P; Brehm, Klaus; Berriman, Matthew
2013-04-04
Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.